WO1992001998A1 - Method and apparatus for automatic text separation using an automatic electronic filtering technique for multiple drop-out colors for optical character recognition of preprinted documents - Google Patents
- Publication number
- WO1992001998A1 (international application PCT/US1991/005040)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- color
- pixel
- grey
- scale
- processing
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/46—Colour picture communication systems
- H04N1/56—Processing of colour picture signals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
Definitions
- The invention relates to the automatic selection and detection of a drop-out color using a color electronic scanner and, more particularly, to allowing the Optical Character Recognition (OCR) system to adjust its filtering parameters automatically based on the form itself, rather than matching the form to the optical filter.
- OCR: Optical Character Recognition
- The first step of the OCR process is electronically scanning the document and converting all of its information to a digital bitmap.
- The information to be read is then separated from the background information: boxes and guide text must be ignored, and the filled-out text should be read.
- Finally, the electronic image of the text is processed by the OCR algorithm, which converts the characters of interest to ASCII data.
- OCR algorithms that process business forms employ the technique of a "drop-out color". By printing documents in a predetermined color (usually a pastel color) and employing an optical filter of the same color in the electronic scanner, the filled-out text on the document can be separated from the printed form.
- The color filter causes the scanner to ignore information printed in that color: to the electronic scanner, the form color appears equivalent to the white background of the paper.
- The filled-out text is typically typed or printed in black (or another dark color), so the scanner captures it as black.
- As a result, the pre-printed form is converted to a white background, and the filled-out text can readily be processed by an OCR algorithm.
- Use of the optical filter works well in this application, but it limits the customer to a very specific form color (one that precisely matches the characteristics of the optical filter installed in the scanner). Additional drop-out colors can be supported by adding more optical filters to the scanner. Consequently, processing a particular form would require selecting the proper optical filter and mechanically inserting it before the form is processed.
- The scanner would separate all images into the three primary colors: red, green, and blue.
- A black-and-white rendition of the image can be produced simply by adding the three color components.
- By processing the red, green, and blue signals independently, it is possible to segregate color information from the common black-and-white information, so that the apparatus filters out all colors, leaving only the high-contrast text for OCR reading.
- The three digital channels are multiplied by appropriate coefficients to ensure uniform color and amplitude response among all pixels.
- Once the three signals have been corrected for uniformity, they are processed as independent video signals to create three binary representations of the image.
- In this way, an "all-color" filter is created that can separate "black" text from any pre-printed color information.
- The three outputs represent document images using all possible combinations of drop-out colors in color space.
- Figure 1 illustrates the configuration of a solid-state charge-coupled device (CCD) that can be used for color scanning.
- Figure 2 illustrates a block diagram of the circuit used for electronic color filtering in accordance with the invention.
- Figures 3A and 3B illustrate a flow chart used in conjunction with white calibration.
- Figure 1 illustrates the type of electronic scanner used to generate a programmable drop-out color.
- This scanner would separate all images into the three primary colors: red, green, and blue.
- A black-and-white rendition of the image (as a typical electronic scanner would produce today) can be produced simply by adding the three color components.
- The electronic scanner intended for use in the present apparatus is based on a "contact type" charge-coupled device (CCD) 10, currently available as Model TCD126C, made by Toshiba.
- The CCD actually consists of several CCD arrays on a single substrate; it has a horizontal resolution of 1200 pixels/inch and spans 12 inches. Because most OCR algorithms can read accurately at scan resolutions of 200 to 400 pixels/inch, the added resolution can be used for color detection.
- Color detection is accomplished by masking adjacent pixels with red, green, and blue optical filters whose spectral content is based on the spectral characteristics of the CCD device itself.
- Three adjacent cells 12, 14, and 16 form a single "super-pixel" 18, with cells 12, 14, and 16 masked by red, green, and blue optical filters 20, 22, and 24, respectively. If each cell corresponds to 1/1200 inch, the effective resolution of the CCD device would be 400 pixels/inch.
- The output of this scanner is therefore a three-channel output of red 26, green 28, and blue 30 video signals.
- Figure 2 illustrates a block diagram for use in automatic text separation for OCR reading as well as full image capture.
- Color scanner 10 outputs three video signals per pixel (Red 26, Green 28, and Blue 30) in a segmented fashion for each scan line.
- The R, G, and B signals are converted to a grey-scale digital representation by A/D converters 32, 34, and 36, respectively.
- Each pixel's red, green, and blue components are then fed to multipliers 38, 40, and 42, respectively.
- Microprocessor and RAM Storage Subsystem 52 monitors each pixel within a scan line to ensure proper correlation between the pixel video data and calibration coefficients 46, 48, and 50, which are sent to the corresponding multipliers 38, 40, and 42 for their respective color channels.
- The outputs of these multipliers form a segmented bit stream of calibrated Red 56, Green 57, and Blue 58 pixels, which can be used as a grey-scale color image.
- This calibrated color information is also fed to summing junction 59, which adds the three color components of each pixel to output a grey-scale black-and-white image.
- The calibrated Red 56, Green 57, and Blue 58 video data is also fed back to Microprocessor and RAM Storage Subsystem 52 for diagnostic purposes.
- The calibrated Red 56, Green 57, and Blue 58 video data is also processed by Threshold Circuits 41, 43, and 45, respectively, which create 1 bit/pixel video data for red, green, and blue.
- Threshold Circuits 41, 43, and 45 may be as simple as a comparator or as elaborate as an M x N convolution filter with adaptive thresholding.
- The output of each threshold circuit 41, 43, and 45 is binary: a "1" corresponds to a "dark" pixel, and a "0" corresponds to a "light" pixel.
- The output of AND gate 63 can be considered a "text" output for typical forms employing a pastel drop-out color.
- Color background information is filtered out, and only typed text information is passed on to an OCR algorithm.
- For example, a form printed with non-carbon red ink and filled out on a typewriter with a carbon-based ribbon could easily be processed using this invention.
- The system of the present invention would produce an image of only the typed text, ignoring the pre-printed red.
- The present invention can filter out any non-carbon ink, thereby providing greater flexibility: the user could intermix documents of different colors without worrying about changing filters.
- Any drop-out colors used on a particular form can be automatically determined and suppressed by making some assumptions about the spectral content of the filled-out text.
- Most text used to fill out business forms can be categorized as "carbon based". This category includes most typewriter ribbons, pens, and pencils. Such text would pass as black regardless of any color filter employed, so it can be separated from any pre-printed color information by applying an "all-color filter".
- White calibration can be used to optimize scanner performance by compensating for any spectral anomalies or sensitivity variations on a pixel-by-pixel basis.
- The white calibration method discussed here is the preferred method for assuring uniform response from the scanner, since the compensation can be performed just prior to running, thereby compensating for differences due to age or wear.
- Feeding a white (blank) sheet of paper through the color scanner exercises all three color signals simultaneously. Because a white sheet of paper has a known and predictable spectral curve, the color gain coefficients can be programmed in such a manner as to allow the scanner to mimic this ideal response.
- Figures 3A and 3B show a flow chart for implementing white calibration. Step 80 requires microprocessor and RAM storage subsystem 52 (Fig. …
- In step 84, an operator feeds a white sheet of paper through the color scanner in order to calibrate its response.
- In step 86, the beginning of the page is detected and the calibration process begins.
- Color scanner 10 outputs a sequential three-color data stream (R, G, B) as it scans each horizontal line of the white document. This information is digitized by A/D converters 32, 34, and 36, one for each color channel. The digitized signals are sent to multipliers 38, 40, and 42, respectively.
- In step 96, microprocessor 52 calculates the average red, green, and blue values for each pixel by dividing each accumulator value by the line count (the number of lines captured). This information corresponds to the average color response of each horizontal pixel.
- Red, green, and blue coefficients can then be calculated for each pixel in step 98. This is done to "normalize" the response, which ensures that each pixel responds in a similar fashion given a similar input.
- The gain coefficients are calculated by dividing the ideal or optimum R, G, B response by the average R, G, B response of each pixel. The optimum response is based on the ideal R, G, B values for a "white" input.
- The apparatus is therefore capable of compensating for any color or gain anomalies by multiplying each pixel's red, green, and blue video values by an image-compensating coefficient.
- Color scanner 10 outputs red, green, and blue signals for each horizontal pixel sequentially, and each color signal is digitized by A/D converters 32, 34, and 36.
- The digital grey-scale color information for each pixel is then sent to multiplier circuits 38, 40, and 42, respectively.
- Microprocessor and RAM storage subsystem 52 recalls the unique R, G, B gain coefficients for each pixel in the horizontal scan and simultaneously presents them to the three multipliers, so that each pixel's red, green, and blue values are multiplied by their corresponding gain coefficients.
- The outputs of these multipliers represent the normalized red, green, and blue values for each pixel.
- The output of color scanner 10 is thus balanced for a correct and uniform spectral response.
- The present invention is useful for processing business forms in conjunction with optical character recognition systems, as a way of separating text information on forms by automatically filtering color information.
- This scanner system would separate all images into three primary colors: red, green, and blue.
- The invention is advantageous in eliminating the drop-out color variability problem associated with mechanical filter insertion. This variability can be caused by the color of the ink used on the forms varying from one printing batch to another, so that the mechanical filter becomes ineffective at removing the printed text on forms printed with out-of-tolerance ink.
- The present invention allows documents of different colors to be intermixed within a batch, as well as single documents having several drop-out colors (for example, a form with red and blue preprinted information). Without the present invention, this would be impossible to accomplish using mechanical filter insertion, as practiced in the prior art.
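The super-pixel arrangement of Figure 1 can be sketched in Python as below. This is a minimal illustration, not the patent's circuitry: it assumes one scan line arrives as a flat list of cell values in repeating red, green, blue order (the ordering of cells 12, 14, and 16 under filters 20, 22, and 24), and the helper name `split_super_pixels` is hypothetical.

```python
from typing import List, Tuple

CELLS_PER_INCH = 1200  # horizontal CCD resolution given in the description
CELLS_PER_SUPER_PIXEL = 3  # one red, one green, one blue cell
EFFECTIVE_PIXELS_PER_INCH = CELLS_PER_INCH // CELLS_PER_SUPER_PIXEL  # 400


def split_super_pixels(scan_line: List[int]) -> List[Tuple[int, int, int]]:
    """Group a flat scan line into (red, green, blue) super-pixels.

    Hypothetical helper: assumes the cells repeat in R, G, B order.
    """
    if len(scan_line) % CELLS_PER_SUPER_PIXEL != 0:
        raise ValueError("scan line length must be a multiple of 3")
    return [
        (scan_line[i], scan_line[i + 1], scan_line[i + 2])
        for i in range(0, len(scan_line), CELLS_PER_SUPER_PIXEL)
    ]


if __name__ == "__main__":
    line = [250, 240, 245, 30, 25, 20]  # a light pixel, then a dark pixel
    print(split_super_pixels(line))
    print(EFFECTIVE_PIXELS_PER_INCH)
```

At these figures, a full 12-inch line of 14,400 cells yields 4,800 super-pixels, which is still within the 200 to 400 pixels/inch range the description cites for accurate OCR.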
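The white-calibration arithmetic (averaging in step 96, gain coefficients in step 98) and the per-pixel multiplication applied afterwards can be sketched as follows. This is a software illustration under stated assumptions, not the patent's hardware: the target `IDEAL_WHITE = (255.0, 255.0, 255.0)` assumes 8-bit A/D converters and a neutral white target, the zero-response guard is an added assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]

# Assumed ideal response to a blank white sheet (8-bit full scale).
IDEAL_WHITE: RGB = (255.0, 255.0, 255.0)


def average_response(lines: Sequence[Sequence[RGB]]) -> List[RGB]:
    """Average each pixel's R, G, B over all captured lines (step 96)."""
    line_count = len(lines)
    width = len(lines[0])
    sums = [[0.0, 0.0, 0.0] for _ in range(width)]
    for line in lines:
        for x, pixel in enumerate(line):
            for c in range(3):
                sums[x][c] += pixel[c]
    return [(s[0] / line_count, s[1] / line_count, s[2] / line_count) for s in sums]


def gain_coefficients(averages: Sequence[RGB], ideal: RGB = IDEAL_WHITE) -> List[RGB]:
    """Gain = ideal response / measured average, per pixel and channel (step 98)."""
    gains = []
    for avg in averages:
        if min(avg) <= 0:
            raise ValueError("pixel with zero response cannot be calibrated")
        gains.append((ideal[0] / avg[0], ideal[1] / avg[1], ideal[2] / avg[2]))
    return gains


def apply_gains(line: Sequence[RGB], gains: Sequence[RGB]) -> List[RGB]:
    """Multiply each pixel's R, G, B by its own coefficients (multipliers 38, 40, 42)."""
    return [(p[0] * g[0], p[1] * g[1], p[2] * g[2]) for p, g in zip(line, gains)]


if __name__ == "__main__":
    # Two scan lines of a white sheet; pixel 0 under-responds, pixel 1 is ideal.
    white_lines = [
        [(200.0, 100.0, 250.0), (255.0, 255.0, 255.0)],
        [(200.0, 100.0, 250.0), (255.0, 255.0, 255.0)],
    ]
    gains = gain_coefficients(average_response(white_lines))
    print(apply_gains(white_lines[0], gains))  # both pixels now read about 255 per channel
```

Because the coefficients are recomputed each time a white sheet is fed, differences due to age or wear are absorbed at calibration time, which is the motivation the description gives for this method.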
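The "all-color filter" can be illustrated with the simplest threshold circuit the description allows (a comparator per channel) followed by an AND of the three binary outputs. The description does not state which signals feed AND gate 63, so combining all three channel thresholds is an assumption here, chosen to match the idea that carbon-based text reads as dark in every channel. The threshold level of 128 (mid-scale for assumed 8-bit data) and the function names are hypothetical; `grey_level` mimics summing junction 59.

```python
from typing import Tuple

RGB = Tuple[float, float, float]

# Hypothetical comparator level for 8-bit calibrated data.
THRESHOLD_LEVEL = 128.0


def is_dark(value: float, level: float = THRESHOLD_LEVEL) -> int:
    """Comparator-style threshold: 1 for a dark pixel, 0 for a light pixel."""
    return 1 if value < level else 0


def text_bit(pixel: RGB, level: float = THRESHOLD_LEVEL) -> int:
    """AND of the red, green, and blue threshold outputs.

    A pixel survives only if it is dark in all three channels, so any
    color that reflects strongly in at least one channel drops out.
    """
    r, g, b = pixel
    return is_dark(r, level) & is_dark(g, level) & is_dark(b, level)


def grey_level(pixel: RGB) -> float:
    """Summing-junction output: R + G + B for a black-and-white rendition."""
    return pixel[0] + pixel[1] + pixel[2]


if __name__ == "__main__":
    samples = {
        "carbon-black typing": (20.0, 22.0, 25.0),
        "non-carbon red ink": (210.0, 40.0, 45.0),
        "pastel blue form": (170.0, 200.0, 245.0),
    }
    for name, px in samples.items():
        print(name, text_bit(px), grey_level(px))
```

Red ink drops out here because it reads as light in the red channel, the same effect a red optical filter would produce, but no mechanical filter change is needed when the form color changes.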
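Where a plain comparator is not enough, the description mentions an M x N convolution filter with adaptive thresholding but gives no details. One common form of that idea, swapped in here purely as an illustration, compares each pixel with the mean of its M x N neighborhood (a box-filter convolution). The local-mean rule, the default 3 x 3 window, the `offset` parameter, and the function name are all assumptions, not details from the patent.

```python
from typing import List

Image = List[List[float]]


def adaptive_threshold(img: Image, m: int = 3, n: int = 3, offset: float = 10.0) -> List[List[int]]:
    """Mark a pixel dark (1) when it is more than `offset` below the mean
    of its M x N neighborhood (M rows, N columns); otherwise light (0).

    The window is clipped at the image borders.
    """
    rows, cols = len(img), len(img[0])
    half_m, half_n = m // 2, n // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            count = 0
            for yy in range(max(0, y - half_m), min(rows, y + half_m + 1)):
                for xx in range(max(0, x - half_n), min(cols, x + half_n + 1)):
                    total += img[yy][xx]
                    count += 1
            local_mean = total / count
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


if __name__ == "__main__":
    # A single dark stroke pixel on a light, slightly uneven background.
    page = [[200.0 + x for x in range(5)] for _ in range(5)]
    page[2][2] = 60.0
    for row in adaptive_threshold(page):
        print(row)
```

A known limitation of a pure local-mean rule is that inside a large, uniformly dark area each pixel equals its local mean, so the area reads as light; the window size therefore needs to be large relative to typical stroke width.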
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Battery Mounting, Suspending (AREA)
Abstract
The described method and apparatus for separating text contained in a pre-printed document are based on the assumption that the text used to fill out business forms can be classified as text "written in a carbon-based color", such text then being treated as written in black regardless of the color filters used. This text can then be separated from all pre-printed color information by using an "all-color" filter.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US55729490A | 1990-07-24 | 1990-07-24 | |
US557,294 | 1990-07-24 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1992001998A1 (fr) | 1992-02-06 |
Family
ID=24224826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1991/005040 WO1992001998A1 (fr) | 1990-07-24 | 1991-07-18 | Method and apparatus for automatic text separation using an automatic electronic filtering technique for multiple drop-out colors for optical character recognition of preprinted documents |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0491923A1 (fr) |
JP (1) | JPH05501778A (fr) |
WO (1) | WO1992001998A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7936488B2 (en) | 2008-02-15 | 2011-05-03 | Mitsubishi Electric Corporation | Image reading apparatus |
WO2013009530A1 (fr) * | 2011-07-08 | 2013-01-17 | Qualcomm Incorporated | Parallel processing method and apparatus for determining text information from an image |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5287442B2 (ja) * | 2009-04-07 | 2013-09-11 | Mitsubishi Electric Corporation | Image reading apparatus |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0375090A2 (fr) * | 1988-12-21 | 1990-06-27 | Recognition International Inc. | Document processing system |
-
1991
- 1991-07-18 EP EP91913232A patent/EP0491923A1/fr not_active Withdrawn
- 1991-07-18 WO PCT/US1991/005040 patent/WO1992001998A1/fr not_active Application Discontinuation
- 1991-07-18 JP JP3512695A patent/JPH05501778A/ja active Pending
Non-Patent Citations (4)
Title |
---|
PATENT ABSTRACTS OF JAPAN vol. 14, no. 437 (E-098), 19 September 1990 & JP,A,2 170 674 (MINOLTA CAMERA CO LTD), 2 July 1990, see abstract * |
PATENT ABSTRACTS OF JAPAN vol. 6, no. 245 (P-159), 3 December 1982 & JP,A,57 143 683 (TOKYO SHIBAURA DENKI KK), 4 September 1982, see abstract * |
PATENT ABSTRACTS OF JAPAN vol. 9, no. 9 (P-327)(1732), 16 January 1985 & JP,A,59 158 481 (NIPPON DENKI KK), 7 September 1984, see abstract * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7936488B2 (en) | 2008-02-15 | 2011-05-03 | Mitsubishi Electric Corporation | Image reading apparatus |
WO2013009530A1 (fr) * | 2011-07-08 | 2013-01-17 | Qualcomm Incorporated | Parallel processing method and apparatus for determining text information from an image |
US9202127B2 (en) | 2011-07-08 | 2015-12-01 | Qualcomm Incorporated | Parallel processing method and apparatus for determining text information from an image |
Also Published As
Publication number | Publication date |
---|---|
EP0491923A1 (fr) | 1992-07-01 |
JPH05501778A (ja) | 1993-04-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US5014328A (en) | Automatic detection and selection of a drop-out color used in conjunction with optical character recognition of preprinted forms | |
US5014329A (en) | Automatic detection and selection of a drop-out color using zone calibration in conjunction with optical character recognition of preprinted forms | |
EP0070161B1 (fr) | Device and method for adaptive determination of a threshold level | |
DE69325527T2 (de) | Apparatus and method for image processing | |
US4414581A (en) | Image signal processing method and apparatus therefor | |
DE3629195C2 (fr) | ||
US7580569B2 (en) | Method and system for generating contone encoded binary print data streams | |
US7436994B2 (en) | System of using neural network to distinguish text and picture in images and method thereof | |
US4825296A (en) | Method of and apparatus for copying originals in which an image to be printed is evaluated by observing a corresponding low-resolution video image | |
DE69631812T2 (de) | System and method for a high-addressability printing system | |
US6775031B1 (en) | Apparatus and method for processing images, image reading and image forming apparatuses equipped with the apparatus, and storage medium carrying programmed-data for processing images | |
EP0732842A2 (fr) | Image processing device capable of correctly determining the density of a background portion | |
US6718059B1 (en) | Block selection-based image processing | |
US5892596A (en) | Image processing apparatus capable of reforming marker editing | |
DE19744501A1 (de) | Device and method for compensating images during acquisition | |
WO1992001998A1 (fr) | Method and apparatus for automatic text separation using an automatic electronic filtering technique for multiple drop-out colors for optical character recognition of preprinted documents | |
US20060165292A1 (en) | Noise resistant edge detection | |
US6693731B1 (en) | Image processing apparatus and method | |
US6317802B1 (en) | System for converting raster image data contained in print data into raster image data having a resolution with which a stencil printer is capable of printing | |
US6307651B1 (en) | Image processing apparatus and method | |
EP0769868A2 (fr) | Image processing system | |
US5245446A (en) | Image processing system | |
JP2861089B2 (ja) | Line-adding device | |
EP3565232A1 (fr) | Method for generating output images of an image reading device, and image reading device | |
JPH06178111A (ja) | Image processing apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): JP |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |
| | WWE | WIPO information: entry into national phase | Ref document number: 1991913232. Country of ref document: EP |
| | WWP | WIPO information: published in national office | Ref document number: 1991913232. Country of ref document: EP |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 1991913232. Country of ref document: EP |