Welcome to SORTA
System for Ontology-based Re-coding and Technical Annotation
SORTA, a matching tool built in MOLGENIS, semi-automatically matches data values to standard codes such as ontology terms or local terminologies. For each data value, SORTA provides a list of the most relevant standard codes, ranked by lexical similarity expressed as a percentage. Users can then pick the correct matches from the suggested list.
Click here for a demo.
The demo version does not have full functionality: data are not saved in the database and are lost when the session expires. To get access to SORTA, please contact the administrator for login credentials. To try out the examples below, click one of the two example links to get match results directly.
SORTA is built on Lucene in combination with an N-gram string-matching algorithm to achieve high performance and accuracy. Lucene matching scores are too abstract for users to interpret, and they are not comparable with one another. We therefore use the N-gram algorithm to recalculate the similarity scores (as percentages) between data values and the concepts retrieved by Lucene. The new similarity scores are clearer and comparable, which enables us to explore a uniform cut-off value.
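The re-scoring step described above can be illustrated with a minimal sketch. It is not SORTA's actual implementation: the text does not specify the N-gram variant, so this example assumes character bigrams with word-boundary padding and a Dice-style overlap, and the function names (`ngrams`, `similarity`, `rank`) and the `cutoff` parameter are hypothetical. The candidate list stands in for the concepts that Lucene would retrieve.

```python
from collections import Counter


def ngrams(text, n=2):
    """Return a multiset of character n-grams for each word in the text."""
    grams = Counter()
    # Lowercase and split on whitespace so case and spacing do not matter.
    for word in text.lower().split():
        # Pad each word so that its first and last characters form n-grams too.
        padded = "^" + word + "$"
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def similarity(a, b, n=2):
    """Dice-style n-gram overlap between two strings, as a percentage (0-100)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return round(200.0 * shared / total, 1)


def rank(value, candidates, cutoff=0.0):
    """Score candidates against a data value and sort them best-first.

    `candidates` stands in for the concepts retrieved by the search index;
    `cutoff` is an assumed uniform threshold on the percentage score.
    """
    scored = [(c, similarity(value, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, score in rank("hearing loss",
                             ["Heart disease", "Vision loss", "Hearing loss"]):
        print(f"{score:5.1f}%  {label}")
```

Because every score falls on the same 0 to 100 scale regardless of the query, a single cut-off can be applied across all data values, which is the property the raw search scores lack.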