Since 2010, when the annual ImageNet competition began, in which research teams submit programs that classify and detect objects, machine learning has gained significant popularity. Today, machine learning, and deep learning in particular, is a powerful tool for making predictions from large amounts of available data. Machine learning has many applications in computer vision and pattern recognition, including document analysis and medical image analysis. To facilitate innovative collaboration and engagement between the document analysis community and other research communities such as computer vision and image analysis, we plan to organize a workshop on machine learning before the ICDAR conference.
The topics of interest of this workshop include, but are not limited to:
Relevance for ICDAR:
Machine learning is widely used in document analysis, so this workshop is highly relevant to ICDAR.
Abstract: Mathematical problem solving has attracted wide attention in the artificial intelligence community. Text problem parsing and solving have made great progress in recent years thanks to advances in deep learning and large language models. However, geometry problem solving, which requires fusing text and diagrams, remains a challenge. We entered this field by building on document image analysis and deep learning techniques. First, we proposed a powerful diagram parser based on deep learning and graph reasoning to generate structural descriptions from plane geometry diagrams, which are composed of geometric primitives and non-geometric primitives (embedded texts). For geometry problem solving, we convert diagrams into basic textual clauses that describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multimodal information. By combining structural and semantic pre-training, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, which promotes geometric understanding and reasoning. Experiments validate the superiority of our method over state-of-the-art neural solvers. Future work will combine neural solvers with knowledge reasoning and leverage large models.
Biography: Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition, Institute of Automation of Chinese Academy of Sciences, and is now the Director of the Laboratory. He is a vice president of the Institute of Automation and a vice dean of the School of Artificial Intelligence, University of Chinese Academy of Sciences. He received the PhD degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1995. He was a postdoctoral fellow in Korea and Japan from March 1996 to March 1999. From 1999 to 2004, he was a researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, machine learning and document image analysis. He has published over 400 technical papers in journals and conferences. He is an Associate Editor-in-Chief of Pattern Recognition Journal and Acta Automatica Sinica, and an Associate Editor of International Journal on Document Analysis and Recognition, Cognitive Computation, IEEE/CAA Journal of Automatica Sinica, Machine Intelligence Research, CAAI Trans. Intelligence Technology, CAAI Artificial Intelligence Research and Chinese Journal of Image and Graphics. He is a Fellow of the CAA, CAAI, the IAPR and the IEEE.
Abstract: Large-scale labeled data is the key to the success of most deep learning-based optical character recognition (OCR) methods in solving practical problems. However, acquiring a substantial amount of accurately labeled, high-quality data can be both time-consuming and costly. In this talk, I will focus on three important issues in the OCR field from a data perspective: 1) how to reduce the dependence of deep learning-based methods on high-quality labeled data, 2) how to solve OCR problems in cases of insufficient or weakly annotated data, and 3) how to effectively utilize large-scale unlabeled data. To address these issues, some recent progress and representative methods will be introduced. Additionally, I will discuss existing technical challenges in the OCR field and forecast future research trends.
Biography: Jin Lianwen is a professor at South China University of Technology. His research interests include optical character recognition, document image understanding, computer vision and artificial intelligence. He has published over 200 papers in important academic journals such as IEEE TPAMI/TIP/TNNLS/TMM/TIFS/TCSVT and Pattern Recognition, as well as in major international conferences such as ICDAR, ICFHR, CVPR, ECCV, IJCAI, and AAAI. His papers have been cited over 10,000 times on Google Scholar, and he has an H-index of 56. He has served as a PC member, SPC member or AC for international conferences such as ICDAR, ICFHR, CVPR, ICCV, and IJCAI. He received the New Century Excellent Talent Program of MOE Award and the Guangdong Pearl River Distinguished Professor Award in 2006 and 2011, respectively. Over the past six years, he has mentored students participating in academic competitions at well-known international conferences such as CVPR, ICDAR, and ICPR, winning first place more than 20 times.
Papers should be submitted via CMT at the following link:
https://cmt3.research.microsoft.com/ICDARWML2023
WML 2023 will follow a double-blind review process. Authors should not include their names or affiliations anywhere in the manuscript,
and should ensure that their identity is not revealed indirectly, for example by citing their own previous work in the third person.
We invite you to submit your research work to this workshop.
Paper length and publication of proceedings:
Papers submitted to ICDAR-WML 2023 are subject to the same policies and
conditions as ICDAR 2023 main conference papers, and the ICDAR-WML 2023
proceedings will be published in the Springer Lecture Notes in Computer
Science (LNCS) series. Papers may be up to 15 pages long
in the proceedings, including references. Papers should be formatted (in LaTeX
or Word) according to the instructions and style files provided by Springer,
available at https://www.springer.com/gp/computer-science/lncs/conference-proceedings-guidelines
Name | Affiliation |
---|---|
Alaei Alireza | Southern Cross University, Australia |
Bhattacharya Saumik | IITK, India |
Britto Alceu | PUCPR, Brazil |
Chan Chee Seng | University of Malaya, Malaysia |
Chanda Sukalpa | Centre For Image Analysis, Department of Information Technology, Uppsala University, Sweden |
Chen Shanxiong | Southwest University, China |
De Ishita | Barrackpore Surendranath College, India |
Gao Liangcai | Peking University, China |
Harit Gaurav | IIT Rajasthan, India |
Impedovo Donato | Dipartimento di Informatica - UNIBA, Italy |
Iwana Brian Kenji | Kyushu University, Japan |
Lian Zhouhui | Peking University, China |
Luqman Muhammad Muzzamil | L3i Laboratory, University of La Rochelle, France |
Pal Umapada | Indian Statistical Institute, India |
Pal Srikanta | Department of Computer Science, Faculty of MIEC, Maynooth University, Ireland |
Palaiahnakote Shivakumara | University of Malaya, Malaysia |
Raghavendra R. | Norwegian Biometric Laboratory, Norway |
Roy Partha Pratim | Indian Institute of Technology, India |
Roy Kaushik | West Bengal State University, India |
Roy Swalpa Kumar | North Bengal Engg College, India |
Saini Rajkumar | LTU, Sweden |
Sun Jun | Jiangnan University, China |
Sundaram Suresh | Indian Institute of Technology Guwahati, India |
Zhang Heng | Institute of Automation, Chinese Academy of Sciences, China |