Data and Tools
All data and tools provided here are freely available for research purposes only; commercial use is not permitted.
Description of Training and Test Data
Training and test data will be provided in XML format (more specifically, InkML). A separate grammar will be provided to describe the structure of the XML data; see the data file format documentation for details.
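As a rough illustration of the InkML structure, the Python sketch below reads the ground-truth annotation and the pen traces of a file using only the standard library. It assumes the W3C InkML namespace, an `<annotation type="truth">` element directly under the root, and `<trace id="...">` elements holding comma-separated "x y" points. The function `read_inkml` and the sample document are hypothetical, and real CROHME files contain further elements (for example trace groups describing symbol segmentation) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# W3C InkML namespace; CROHME files are assumed to declare it on <ink>.
INK = "{http://www.w3.org/2003/InkML}"


def read_inkml(text):
    """Return (truth, traces) for an InkML document given as a string.

    truth  -- text of the top-level <annotation type="truth">, or None
    traces -- dict mapping each trace id to a list of (x, y) points
    """
    root = ET.fromstring(text)
    truth = None
    for ann in root.findall(INK + "annotation"):
        if ann.get("type") == "truth":
            truth = (ann.text or "").strip()
    traces = {}
    for trace in root.findall(INK + "trace"):
        points = []
        for point in (trace.text or "").split(","):
            coords = point.split()
            if len(coords) >= 2:
                # Keep x and y only; any further channels (e.g. time) are dropped.
                points.append((float(coords[0]), float(coords[1])))
        traces[trace.get("id")] = points
    return truth, traces


# Hypothetical minimal document, not taken from the CROHME package.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <annotation type="truth">$x^2$</annotation>
  <trace id="0">10 20, 11 22, 12 25</trace>
  <trace id="1">30 5, 31 6</trace>
</ink>"""

truth, traces = read_inkml(SAMPLE)
print(truth)           # $x^2$
print(sorted(traces))  # ['0', '1']
print(traces["1"])     # [(30.0, 5.0), (31.0, 6.0)]
```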
The CROHME package provides training and test data from the CROHME 2011, 2012 and 2013 competitions. In addition, with the participants' authorization, we are allowed to distribute the result files from most of the systems submitted in 2012. The data directories are described below:
- CROHME2011_data : all data from the CROHME 2011 competition:
  - CROHME_test : inkml test files without ground truth
  - CROHME_testGT : inkml test files with ground truth
  - CROHME_train : inkml train files with ground truth
  - gram : xml grammars and symbol lists for parts I and II
- CROHME2012_data : all data from the CROHME 2012 competition:
  - testData : inkml test files without ground truth
  - testDataGT : inkml test files with ground truth
  - trainData : inkml train files with ground truth
  - gram : xml grammars and symbol lists for parts I, II and III
  - lists : lists of inkml files and LaTeX expressions for parts I, II and III
- CROHME2013_data : all data from the CROHME 2013 competition:
  - TrainINKML : all training inkml files, sorted by origin
  - TestINKML : inkml test files without ground truth, used to run the participants' systems
  - TestINKMLGT : inkml test files with ground truth, used to evaluate the participants' systems with the evalInkml tool
  - Test_LG/Test2012LG, Test_LG/Test2013LG : label graph versions of the 2012 and 2013 test files, using inherited edges (so the graphs are DAGs)
  - Test_LG/Test2012LG_TREE, Test_LG/Test2013LG_TREE : label graph versions of the 2012 and 2013 test files, without inherited edges (so the graphs are trees)
- ParticipantsResults2012 : result files from the participants of the CROHME 2012 competition (note that the result files from VisionObjects are restricted to the participants):
  - ResultsTest : inkml results on the test part for each participant (parts I, II and III)
  - ResultsTrain : inkml results on the train part for each participant (part III only)
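The difference between the DAG and tree versions of the test label graphs can be illustrated with a small check: in a rooted tree every node except the root has exactly one parent, whereas adding inherited edges gives some nodes several parents. The Python sketch below is a hypothetical helper (`is_tree`) that works on a plain list of (parent, child, relation) triples rather than on the .lg files themselves. The extra edge in the example only stands in for an inherited edge; the actual inheritance rules are those implemented in the CROHME tools.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """Return True if the directed graph is a single rooted tree.

    nodes -- iterable of node ids (e.g. symbol ids)
    edges -- list of (parent, child, relation) triples
    A rooted tree has exactly one node without a parent, every other node
    has exactly one parent, and every node is reachable from the root.
    """
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child, _relation in edges:
        indegree[child] += 1
        children[parent].append(child)

    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False

    # Depth-first walk from the root; unreachable nodes mean a detached cycle.
    seen, stack = set(), [roots[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return len(seen) == len(nodes)


# Hypothetical symbols and relations for "x^2 + 1".
symbols = ["x", "2", "+", "1"]
tree_edges = [("x", "2", "Sup"), ("x", "+", "Right"), ("+", "1", "Right")]
# An extra edge standing in for an inherited edge: "1" now has two parents.
dag_edges = tree_edges + [("x", "1", "Right")]

print(is_tree(symbols, tree_edges))  # True
print(is_tree(symbols, dag_edges))   # False
```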
For CROHME 2014, the competition datasets will also include:
- Isolated symbols (Task 1): all symbols from the 2014 training dataset will be used as the training set; junk (mis-segmented) samples will be provided for the reject option. The test samples will be extracted from the new expressions created for the Task 2 test set.
- New expression test set (Task 2): will not be provided to participants during the competition; these new handwritten expressions will correspond to Grammar IV, as used for CROHME 2013.
- Matrix training/test sets (Task 3): This dataset will allow participants to address a new range of math expressions that include matrices. Ground-truth will be provided in inkml (with <mtable> elements) and LG file formats.
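To give an idea of what matrix markup in MathML looks like, the following Python sketch builds a parenthesized `<mtable>` from a list of rows with `xml.etree.ElementTree`. The helper `matrix_to_mathml` and its rule for choosing `<mn>` versus `<mi>` are assumptions made for illustration; the exact markup used in the CROHME 2014 ground truth may differ.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML as the default namespace (no "ns0:" prefixes).
ET.register_namespace("", MATHML_NS)


def _m(tag):
    """Qualify a tag name with the MathML namespace."""
    return f"{{{MATHML_NS}}}{tag}"


def matrix_to_mathml(rows):
    """Build <math><mrow>( <mtable> )</mrow></math> from a list of rows.

    Each cell string becomes <mn> if it looks like a plain number
    (digits with at most one decimal point) and <mi> otherwise.
    """
    math = ET.Element(_m("math"))
    mrow = ET.SubElement(math, _m("mrow"))
    ET.SubElement(mrow, _m("mo")).text = "("
    table = ET.SubElement(mrow, _m("mtable"))
    for row in rows:
        mtr = ET.SubElement(table, _m("mtr"))  # one <mtr> per matrix row
        for cell in row:
            mtd = ET.SubElement(mtr, _m("mtd"))  # one <mtd> per cell
            tag = "mn" if cell.replace(".", "", 1).isdigit() else "mi"
            ET.SubElement(mtd, _m(tag)).text = cell
    ET.SubElement(mrow, _m("mo")).text = ")"
    return math


m = matrix_to_mathml([["a", "b"], ["0", "1"]])
print(ET.tostring(m, encoding="unicode"))
```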
Tools
The CROHME organizers provide tools for math expression selection, running tests, evaluation and visualization. Please consult the corresponding README files for the specific requirements of each library. These libraries are updated frequently, so please refer to the Document and Pattern Recognition Lab (DPRL) Software page to download the latest versions. In particular, the tools will soon be updated to handle expressions with nested structures such as matrices.
- CROHMELib : library of scripts to:
  - Filter LaTeX expressions with regard to the xml grammars
  - Run a recognizer over a list of inkml files
  - Compare ground-truthed inkml files with inkml results (evalInkml_v1.7.pl in 2012 and evalInkml_v1.10.pl in 2013)
  - Convert CROHME inkml files to and from lg files (e.g. to visualize recognition results in the CROHME InkML viewer (see below))
- lg2dot : converts label graphs (.lg files) to .dot files, which can then be rendered as .pdf files that visualize the structure at the stroke or symbol level, as well as the differences between interpretations.
- lg2mml : converts label graphs to MathML.
- Scripts to evaluate recognizer output.
All data and tools are provided under a Creative Commons license for academic and research purposes, but not for commercial use.