Walker et al., 2012 - Google Patents
A synthetic document image dataset for developing and evaluating historical document processing methodsWalker et al., 2012
View PDF- Document ID
- 603692129620672570
- Author
- Walker D
- Lund W
- Ringger E
- Publication year
- Publication venue
- Document Recognition and Retrieval XIX
External Links
Snippet
Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the …
- 230000015556 catabolic process 0 abstract description 17
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2288—Version control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K15/00—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers
- G06K15/02—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers using printers
- G06K15/18—Conditioning data for presenting it to the physical printing elements
- G06K15/1848—Generation of the printable image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K15/00—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers
- G06K15/02—Arrangements for producing a permanent visual presentation of the output data, e.g. computer output printers using printers
- G06K15/18—Conditioning data for presenting it to the physical printing elements
- G06K15/1801—Input data handling means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/03—Detection or correction of errors, e.g. by rescanning the pattern
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00442—Document analysis and understanding; Document recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Drobac et al. | Optical character recognition with neural networks and post-correction with finite state methods | |
| Hochberg et al. | Script and language identification for handwritten document images | |
| Holley | How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs | |
| US7627177B2 (en) | Adaptive OCR for books | |
| US8401293B2 (en) | Word recognition of text undergoing an OCR process | |
| US8340425B2 (en) | Optical character recognition with two-pass zoning | |
| Drobac et al. | OCR and post-correction of historical Finnish texts | |
| US20250231983A1 (en) | System and method for meta-data extraction from documents | |
| JP2021039694A (en) | Data processing system, learning device, image processing device, method, and program | |
| Reul et al. | Mixed model OCR training on historical Latin script for out-of-the-box recognition and finetuning | |
| CN102637256A (en) | Shape clustering in post optical character recognition processing | |
| Zhang et al. | Post-OCR correction with OpenAI's GPT models on challenging English prosody texts | |
| Walker et al. | A synthetic document image dataset for developing and evaluating historical document processing methods | |
| CN114510925A (en) | Chinese text error correction method, system, terminal equipment and storage medium | |
| US20230137350A1 (en) | Image processing apparatus, image processing method, and storage medium | |
| Hocking et al. | Optical character recognition for South African languages | |
| Lin | Header and footer extraction by page association | |
| RU2166207C2 (en) | Method for using auxiliary data arrays in conversion and/or verification of character-expressed computer codes and respective subpictures | |
| Lund | Ensemble Methods for Historical Machine-Printed Document Recognition | |
| Krawczyk et al. | Visual GQM approach to quality driven development of electronic documents. | |
| Summers | Document image improvment for OCR as a classification problem | |
| O’Brien et al. | Optical character recognition | |
| Piotrowski | Acquiring Historical Texts | |
| Löfgren | New tools for old news | |
| Lacerda et al. | Character Identification in Images Extracted from Portuguese Manuscript Historical Documents |