[go: up one dir, main page]

Mitchell et al., 2000 - Google Patents

Document page segmentation based on pattern spread analysis

Mitchell et al., 2000

Document ID
5870545890001355799
Author
Mitchell P
Yan H
Publication year
Publication venue
Optical Engineering

External Links

Snippet

This paper introduces an algorithm designed to segment black-and-white documents for the purpose of compression. A single document is segmented into two documents suitable for pattern-based and run-length-based compression. With some modification the same …
Continue reading at www.spiedigitallibrary.org (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/32Aligning or centering of the image pick-up or image-field
    • G06K9/3233Determination of region of interest
    • G06K9/325Detection of text region in scene imagery, real life image or Web pages, e.g. licenses plates, captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/32Aligning or centering of the image pick-up or image-field
    • G06K9/3275Inclination (skew) detection or correction of characters or of image to be recognised
    • G06K9/3283Inclination (skew) detection or correction of characters or of image to be recognised of characters or characters lines
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • G06K9/4604Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes, intersections
    • G06K9/4609Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes, intersections by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • G06K9/4642Extraction of features or characteristics of the image by performing operations within image blocks or by using histograms
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6201Matching; Proximity measures
    • G06K9/6202Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46Extraction of features or characteristics of the image
    • G06K9/52Extraction of features or characteristics of the image by deriving mathematical or geometrical properties from the whole image
    • G06K9/527Scale-space domain transformation, e.g. with wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00442Document analysis and understanding; Document recognition
    • G06K9/00463Document analysis by extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics, paragraphs, words or letters
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/2054Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification
    • G06K9/2072Selective acquisition/locating/processing of specific regions, e.g. highlighted text, fiducial marks, predetermined fields, document type identification based on positionally close symbols, e.g. amount sign or URL-specific characters
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/20Image acquisition
    • G06K9/34Segmentation of touching or overlapping patterns in the image field
    • G06K9/342Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00442Document analysis and understanding; Document recognition
    • G06K9/00456Classification of image contents, e.g. text, photographs, tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/68Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00442Document analysis and understanding; Document recognition
    • G06K9/00449Layout structured with printed lines or input boxes, e.g. business forms, tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/38Quantising the analogue image signal, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K2209/00Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30244Information retrieval; Database structures therefor; File system structures therefor in image databases
    • G06F17/30247Information retrieval; Database structures therefor; File system structures therefor in image databases based on features automatically derived from the image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Similar Documents

Publication Publication Date Title
Antonacopoulos Page segmentation using the description of the background
US5452374A (en) Skew detection and correction of a document image representation
Kise et al. Segmentation of page images using the area Voronoi diagram
US5048099A (en) Polygon-based method for automatic extraction of selected text in a digitized document
US6674900B1 (en) Method for extracting titles from digital images
US5410611A (en) Method for identifying word bounding boxes in text
Baird et al. Image segmentation by shape-directed covers
US5335290A (en) Segmentation of text, picture and lines of a document image
US6731788B1 (en) Symbol Classification with shape features applied to neural network
US6614930B1 (en) Video stream classifiable symbol isolation method and system
EP0854433B1 (en) Caption and photo extraction from scanned document images
JP3278471B2 (en) Area division method
Drivas et al. Page segmentation and classification utilising a bottom-up approach
US5033104A (en) Method for detecting character strings
JP2002133426A (en) Ruled line extraction device for extracting ruled lines from multi-valued images
Mitchell et al. Newspaper document analysis featuring connected line segmentation
Dori et al. Segmentation and recognition of dimensioning text from engineering drawings
US6289120B1 (en) Method and system for processing images of forms which have irregular construction and/or determining whether characters are interior to a form
Grover et al. Text extraction from document images using edge information
Payne et al. Document segmentation using texture analysis
Mitchell et al. Document page segmentation based on pattern spread analysis
Bloomberg Multiresolution morphological analysis of document images
Strouthopoulos et al. Identification of text-only areas in mixed-type documents
Amin et al. Page segmentation and classification utilizing bottom-up approach
Hesham et al. A zone classification approach for arabic documents using hybrid features