CROHME 2014 Task 2:

Mathematical Expression Recognition

Task Description and System Evaluation

In this task, systems must segment, classify and parse symbols in individual handwritten expressions. All training and testing expressions will conform to a LaTeX grammar (the part IV grammar from CROHME 2013). XML and human-readable versions of these grammars are available to participants through the CROHMELib library. CROHMELib also provides a Java parser for testing whether a LaTeX expression is legal under the part IV grammar. The grammar includes vertical layout structures such as fractions, square roots, subscripts, superscripts, and limits above and/or below summations and integrals. Grid-based and tabular structures such as matrices, choice notation, and cases in function definitions are not included in this task (see Task 3).

Systems will be evaluated using the same metrics as CROHME 2013.

Input file format

The input file format is the CROHME InkML format used in previous competitions. These files may be visualized using the CROHME InkML Viewer. A description is provided here.
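As a rough illustration of reading stroke data from such a file, the sketch below parses the trace elements of an InkML document with the Python standard library. It assumes that each stroke is stored as a <trace> element in the W3C InkML namespace, with points separated by commas and coordinates separated by spaces. The sample document and the function name read_strokes are made up for this example, and real CROHME files contain additional annotation elements that are ignored here.

```python
import xml.etree.ElementTree as ET

# W3C InkML namespace (assumed to be the one used by the input files).
INKML_NS = "{http://www.w3.org/2003/InkML}"

def read_strokes(inkml_text):
    """Return {trace id: [(x, y), ...]} for every <trace> element."""
    root = ET.fromstring(inkml_text)
    strokes = {}
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        # Points are comma-separated; coordinates within a point are space-separated.
        for point in (trace.text or "").split(","):
            coords = point.split()
            if len(coords) >= 2:
                # Keep only x and y; any extra channels (e.g. time) are dropped.
                points.append((float(coords[0]), float(coords[1])))
        strokes[trace.get("id")] = points
    return strokes

# Hypothetical two-stroke document, for illustration only.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 20, 11 22, 12 25</trace>
  <trace id="1">30 20, 31 21</trace>
</ink>"""

strokes = read_strokes(SAMPLE)
```

The trace identifiers are kept as strings so that they can be reused directly as stroke identifiers in the output.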

Training data set

The training data set is the same one used for CROHME 2013 (part IV), available from the TC11 download page.

New expressions will be created for the test data set by a set of writers that may be distinct from those in the training data. All new expressions will conform to the part IV grammar.

System inputs / outputs

Systems will be called with two arguments: the name of the input InkML file and the name of the output label graph file.
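One possible way to handle this two-argument calling convention in a Python entry point is sketched below. The program name "recognizer" and the file names are placeholders, and the order of the arguments (input first, output second) follows the description above.

```python
def parse_args(argv):
    """Split an argument vector into (input InkML path, output label graph path)."""
    # argv[0] is the program name, followed by exactly two file names.
    if len(argv) != 3:
        raise SystemExit("usage: recognizer INPUT.inkml OUTPUT.lg")
    return argv[1], argv[2]

# Example call with a hypothetical argument vector; a real system would pass sys.argv.
in_path, out_path = parse_args(["recognizer", "expr.inkml", "expr.lg"])
```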

The output will be in a simple comma-separated value (CSV) format that represents the Label Graph (LG) of the expression. Label Graphs represent structure at the stroke level. An introduction to the .lg format is available here, and the labels to use for spatial relationships are described here. Each stroke is represented by a node labeled with the class of its associated symbol, and every pair of strokes has two labeled directed edges between them, one in each direction. Edge labels indicate whether two strokes are unrelated, belong to the same symbol, or belong to two symbols in a spatial relationship (relations: Right, Above, Below, Inside (square root), Superscript, or Subscript). The LgEval library will be used to compare label graph files.
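To make the stroke-level structure concrete, the following sketch builds a label graph for a hypothetical expression x^2 in which "x" is written with two strokes and "2" with one. Several details are assumptions made for illustration and should be checked against the .lg documentation: the record layout (N and E rows ending in a weight of 1.0), the placeholder labels "_" for unrelated strokes and "*" for strokes of the same symbol, the use of "Superscript" as the literal relation label, and the choice to label only the parent-to-child direction of a relationship.

```python
import csv
import io
from itertools import permutations

NO_RELATION = "_"   # assumed placeholder label for unrelated strokes
SAME_SYMBOL = "*"   # assumed placeholder label for strokes of one symbol

def build_label_graph(symbols, relations):
    """Build stroke-level node and edge labels.

    symbols:   {symbol id: (class label, [stroke ids])}
    relations: [(parent symbol id, child symbol id, relation label)]
    """
    node_labels, owner = {}, {}
    for sym_id, (label, stroke_ids) in symbols.items():
        for stroke in stroke_ids:
            node_labels[stroke] = label   # each stroke inherits its symbol's class
            owner[stroke] = sym_id
    rel = {(parent, child): label for parent, child, label in relations}
    edge_labels = {}
    # Every ordered pair of distinct strokes gets exactly one directed edge.
    for a, b in permutations(node_labels, 2):
        if owner[a] == owner[b]:
            edge_labels[(a, b)] = SAME_SYMBOL
        else:
            edge_labels[(a, b)] = rel.get((owner[a], owner[b]), NO_RELATION)
    return node_labels, edge_labels

def write_lg(node_labels, edge_labels, out):
    """Write node rows, then edge rows, as CSV (assumed record layout)."""
    writer = csv.writer(out, lineterminator="\n")
    for stroke, label in node_labels.items():
        writer.writerow(["N", stroke, label, 1.0])
    for (a, b), label in edge_labels.items():
        writer.writerow(["E", a, b, label, 1.0])

# Hypothetical segmentation and layout for x^2.
symbols = {"x_1": ("x", ["0", "1"]), "2_1": ("2", ["2"])}
relations = [("x_1", "2_1", "Superscript")]

node_labels, edge_labels = build_label_graph(symbols, relations)
buffer = io.StringIO()
write_lg(node_labels, edge_labels, buffer)
lg_text = buffer.getvalue()
```

With three strokes the graph has three nodes and six directed edges, which matches the requirement that every pair of strokes is connected in both directions.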

Remarks: