Rethinking Numerical Table Recognition: A Transparent Algorithmic Solution for Specific OCR Problems |
Content: | http://hdl.handle.net/10890/58907 |
---|---|
Archive: | Műegyetem Digitális Archívum |
Collection: |
1. Scientific papers, publications
Conference collections Workshop on Intelligent Infocommunication Networks, Systems and Services 3rd Workshop on Intelligent Infocommunication Networks, Systems and Services, 2025 |
Title: |
Rethinking Numerical Table Recognition: A Transparent Algorithmic Solution for Specific OCR Problems
|
Creator: |
Gedeon, Máté
Marozsán, Patrik
|
Date: |
2025-02-20T13:51:18Z
2025
|
Description: |
Optical Character Recognition (OCR) is a well-established technology for the recognition of printed and handwritten text and numbers. However, its application to tabular data remains limited, with existing solutions often being costly and/or underperforming, especially when applied to archival data. These challenges stem from the fact that many OCR models are not optimized to handle the unique structural and stylistic characteristics of historical tables. For instance, nineteenth- and twentieth-century Hungarian price tables frequently feature unconventional formatting, such as midline decimal points, irregular separators, and the absence of dividing lines between cells, all of which hinder the performance of existing OCR solutions. To overcome these limitations, we present a transparent and customizable solution tailored to tables for which existing software is inefficient. The algorithm processes table images by dividing them into cells, even when explicit dividing lines are absent, and accurately identifies decimal points, separators, and numerical values. Evaluation on a dataset consisting of historical price tables demonstrated the efficacy of our approach. Our custom digit recognition network achieved a test accuracy of 99.3%, while the complete system delivered a cell-level accuracy of 97.5% across 40 test images. These results confirm the reliability of our solution for handling tabular data, even with unique properties. Our method not only addresses the challenges of processing archival tables, but also provides a transparent and adaptable framework for broader applications. It has significant potential for practical applications in archives and libraries, and could also inspire advancements in other fields where available solutions struggle.
|
Language: |
English
|
Type: |
Book chapter
|
Format: |
application/pdf
|
Identifier: |