HANDWRITTEN / SCENE CHARACTER DATABASES OF INDIC SCRIPTS

 

Research on OCR systems for scanned printed documents in Indian scripts has been under way for quite some time. By contrast, little research on handwriting recognition or scene text recognition of Indian scripts is available in the literature. Unfortunately, printed-OCR technology cannot be extended to the recognition of such characters because of the enormous variability in their samples.

 

Devanagari is the most popular script of India, while Bangla is the second-most popular language and script of the Indian subcontinent and the fifth-most popular language of the world. Several other scripts, such as Tamil, Telugu, Kannada and Malayalam, are used by significant sections of the Indian population.

 

Most of the available work on handwriting or scene text recognition of Indian scripts is based on small or non-standard databases collected in laboratory environments. Recently, we developed a few large databases of handwritten or scene characters of major Indic scripts. These databases have either already been made available free of cost to academic researchers or are being prepared for release. They are the following.

 

1.      Online handwritten database

(a)   Bangla numerals

(b)   Bangla basic characters

2.      Offline handwritten database

(a)   Numerals

(i)     Devanagari

(ii)   Bangla

(iii) Oriya

(b)   Basic characters

(i)     Bangla

(ii)   Devanagari

(c)    Bangla vowel modifiers

(d)   Bangla compound characters

3.      Bangla segmented scene character database

Application form for obtaining "ISI Handwritten Character Databases"





 
