ISI Bangla Degraded Document Image Database (ISIDDI)



This database (ISIDDI) consists of images of 535 pages from 15 old printed Bangla books. The books were collected from three sources: (i) the Indian Statistical Institute library, (ii) the Old Book Market at College Street, Kolkata, India, and (iii) a Public Library of India. For the books from sources (i) and (ii), we first identified pages containing various types of degradation, then scanned those pages on a flatbed scanner at 300 dpi and stored them as color images in both uncompressed TIF and JPG formats. Similarly, from the archive of source (iii), we identified several pages of printed Bangla containing one or more types of degradation and downloaded them. Because these books were written over a long period, both the form of the Bangla language and the fonts of the printed script vary widely.


Ref: Chandan Biswas, Partha Sarathi Mukherjee, Koyel Ghosh, Ujjwal Bhattacharya, Swapan K. Parui, "A Hybrid Deep Architecture for Robust Recognition of Text Lines of Degraded Printed Documents," Proc. of ICPR, pp. 3174-3179, 2018.

