JPH07220023A

JPH07220023A - Table recognition method and apparatus thereof

Info

Publication number: JPH07220023A
Application number: JP6027443A
Authority: JP
Inventors: Takuya Okamoto; 卓哉岡本; Masatoshi Hino; 匡利樋野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-01-31
Filing date: 1994-01-31
Publication date: 1995-08-18

Abstract

(57) [Abstract] (Amended)

[Purpose] To improve a method and apparatus for recognizing the fields of a table from image data of the table. A further aim is to provide a method and apparatus that, even when the field positions in the image data have shifted, correct those positions, recognize them properly, and recognize the characters in the fields.

[Structure] The outer frame lines of the table are extracted from the image input in step 101. Using these outer frame lines as a reference, step 103 reads format data one after another from the format information database 31 and associates the four corners of the format with the four corners of the outer frame, so that the ruled lines represented by the ruled line information 32 are coordinate-converted onto the image and their degree of match with the image is evaluated. In step 106, it is determined from the evaluation results which of the plural formats the image has. In step 108, the positions of the ruled lines are corrected and the ruled lines are erased from the image. In step 109, the field positions are set from the ruled line information and the field information 33. In step 110, the characters in the fields are recognized and the result is output to the printer 70.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a table recognition method and apparatus for recognizing the fields and characters of a table from image data of the table. For example, it relates to an automatic tabular-data input system in which tabular data written on paper is read into a computer as image data by an optical reader such as a scanner, and the table structure and the characters in the table are recognized.

[0002]

[Prior Art] Form OCR is the common conventional technique for entering data written on paper. It uses forms whose entry boxes are printed in a color that drops out when the form is scanned. During scanning the entry boxes drop out and only the data written inside them is extracted. The position of each entry is determined relative to marks printed around the edge of the form.

[0003] With the recent spread of fax machines, systems have been developed that input such forms by fax. With fax, however, it is often not possible to drop out only a specific color. Moreover, because such a system receives many kinds of image data from an unspecified number of fax machines, the dropout technique cannot be applied to designated forms only.

[0004] Accordingly, the approach taken is to assume that the entry boxes are present at predetermined positions in the input image, erase them from the image, and extract the data in the fields.

[0005] Where forms of several formats are input together, there are methods that analyze the format of the image and determine which format it is. One example is the form recognition apparatus disclosed in Japanese Patent Application No. Hei 1-49800.

[0006] These methods extract elements of ruled lines or field entry boxes from the image, assemble them to estimate the format, and check which of the specified formats it matches.

[0007] However, input images contain a great deal of noise, and broken ruled lines are not uncommon. The format therefore cannot always be estimated well, and the processing is also time-consuming.

[0008]

[Problems to Be Solved by the Invention] A current problem with such systems is that unevenness in the paper feed speed and distortion cause the positions of the entered data to shift non-linearly.

[0009] To solve this problem, the data entry positions given by the format must be corrected while the non-linear displacement of the fields is detected. This problem can arise whatever method is used to read the image optically into the computer.

[0010] One method of correcting such field displacement detects the position of a reference point of the format, sets the approximate position of each field from the displacement of that point, and then corrects the field position from the image surrounding the field.

[0011] With such a method, however, a character such as the digit 1 written close to a ruled line of a field can cause the field position to be misjudged, which lowers the recognition rate.

[0012] An object of the present invention is to improve a method and apparatus for recognizing the fields of a table from image data of the table. A further object is to provide a table recognition method and apparatus that, even when the field positions in the input table image data have shifted non-linearly, for example because of uneven paper feed speed or distortion, can correct the field positions, recognize them properly and with high accuracy, and recognize the characters in the fields.

[0013]

[Means for Solving the Problems] In the present invention, format information describing the format of the table is prepared in advance. The ruled lines given by the format information are converted into coordinates on the image data, and the converted ruled lines and the image data are used to detect the non-linear positional displacement of the image described above. The positions of the ruled lines are then corrected on the basis of the detected displacement. For the coordinate conversion of the format information, reference lines for setting the approximate position of the table format are extracted from the image, and the conversion into image coordinates is based on them.

[0014] After the approximate format has been fitted to the image, the frame-line elements in the image are extracted and associated with the format. The format is also corrected to match the frame lines, and the frame lines are erased, which prevents noise from being mixed into the character patterns.

[0015] The format is specified in relative coordinates based on two vertical and two horizontal reference lines, so the format information can be defined independently of both the paper size and the scanning line density of the scanner.

[0016] In addition, correction is performed per ruled line rather than per field, which prevents field positions from being extracted incorrectly because of characters in the fields.

[0017] When several formats are to be distinguished by the above method, format information is needed for each format. Many formats may therefore have to be prepared merely because one part of the format differs. In that case, holding only the differing part of the format in a separate database prevents the number of formats from growing.

[0018]

[Operation] With the present invention, the data entry positions on a form can be extracted with high accuracy, and errors in field extraction (recognition of field positions) caused by non-linear distortion of the image can be reduced.

[0019] Even when several formats are mixed, the format can be determined quickly and accurately.

[0020] Also, even when formats differ only in part, there is no need to prepare many formats. The amount of processing for format matching decreases, and processing becomes faster.

[0021]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0022] FIG. 1 shows the system configuration of a table recognition system according to a first embodiment of the present invention. The system of this embodiment comprises a CPU 10, a memory 20, a hard disk 30, a hard disk controller 40, a terminal 50, a terminal controller 51, a scanner 60, a disk device 61, an image input controller 62, a printer 70, a printer controller 71, and a system bus 80.

[0023] The memory 20 holds a table recognition processing program 21 and a data storage area 22. The CPU 10 manages and controls the system as a whole and executes the table recognition processing program 21. The hard disk 30 stores the format information of the tables to be recognized as a format information database 31. The hard disk controller 40 controls the hard disk 30 and, under the control of the CPU 10, performs data input and output for the hard disk 30.

[0024] From the terminal 50, an operator issues processing instructions, checks displayed results, and so on. The terminal 50 is controlled by the terminal controller 51, which handles input from the keyboard and output to the display. Images are input from the scanner 60, the disk device 61, and the like through the image input controller 62. The input image data is stored in the data storage area 22 of the memory 20.

[0025] Data is sent to the printer 70 through the printer controller 71, and the recognition result is output. The system bus 80 carries the various data communications.

[0026] FIG. 2 is a flowchart showing the procedure of the table recognition processing. The overall flow of processing is described first.

[0027] Image data of a table is read from the scanner 60 or from a disk in the disk device 61 and stored in the data storage area 22 in the memory 20 (step 101). Next, the image data that was read is searched and the outer frame lines of the table are extracted (step 102). In this embodiment, this outer frame of the table serves as the reference lines for format conversion. Step 102 is described in detail later with reference to FIG. 7.

[0028] Next, the degree of match between the format ruled lines and the input image is evaluated (step 103), as follows. First, one item of format information is read from the format information database 31. One item of format information consists of ruled line information 32 and field information 33 describing the format of one table. The details are given later with reference to FIGS. 3 to 6. The ruled lines obtained from the ruled line information 32 of the format information that was read are matched against the ruled lines in the input image, the degree of match is computed from the proportion of all the ruled lines for which a corresponding line was found, and the result is stored in the memory 20. Step 103 is described in detail later with reference to FIG. 11.

[0029] Next, it is determined whether step 103 has been executed for all the format information stored in the database 31 (step 104). If so, processing proceeds to step 106. Otherwise it returns to step 103, reads the next item of format information, and continues evaluating the degree of match with the image.

[0030] Next, from the evaluation results (one evaluation value per format), it is determined which of the plural formats the input image (the table in it) has (step 106). The determination in step 106 is described in detail later with reference to FIG. 15.

[0031] It is then determined whether the determination in step 106 found a matching format (step 107). If there is no matching format, processing ends without performing the recognition processing of step 108 onward. If there is a matching format, the recognition processing of step 108 onward is performed using that format.

[0032] First, the ruled line information 32 of the matching format is read and matched against the ruled lines in the input image data, the positions of the ruled lines are corrected, and the ruled-line portions are erased from the image data (step 108). Step 108 is described in detail later with reference to FIG. 16.

[0033] Next, the field positions are set from the corrected ruled lines and the field information 33 of the matching format (step 109). Step 109 is described in detail later with reference to FIG. 18.

[0034] Then characters are recognized from the image within each field to be recognized, and the recognition result is output to the printer 70 (step 110). Step 110 is described in detail later with reference to FIG. 20.

[0035] The above is an outline of the table recognition processing.
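The selection made in steps 103 to 107 can be sketched as follows. This is only an illustration: the decision rule of step 106 is described with FIG. 15 and is not given in this part of the text, so the sketch assumes the simplest possible rule, namely taking the format with the highest degree of match and rejecting it if it falls below a threshold. The function name `select_format`, the threshold `min_score`, and the format names are hypothetical.

```python
def select_format(scores, min_score):
    """Pick the best-matching format, or None if nothing matches well enough.

    scores:    dict mapping a format name to its degree of match in [0, 1]
               (one value per format, as produced by repeating step 103).
    min_score: assumed acceptance threshold; not taken from the patent text.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Step 107: with no acceptable format, recognition is skipped.
    return best if scores[best] >= min_score else None


scores = {"order_form": 0.95, "invoice": 0.62}
print(select_format(scores, min_score=0.8))   # order_form
print(select_format(scores, min_score=0.99))  # None
```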

[0036] FIGS. 3 and 4 show examples of table format patterns recognized by the table recognition system of this embodiment.

[0037] These formats are stored in the format information database 31, each format being divided into ruled line information 32 and field information 33. The ruled line information 32 gives the positions of the ruled lines that make up the table. Positions are expressed in an orthogonal (xy) coordinate system in which the upper left of the table area is the origin (0, 0) and the lower right is (10000, 10000). The field information 33 consists of the ruled lines above, below, left, and right of each field (a region enclosed by ruled lines), together with information such as the type of the region.

[0038] FIG. 5 shows the contents of the ruled line information 32.

[0039] The ruled line information 32 is stored in two parts: vertical line information 32-a and horizontal line information 32-b.

[0040] Both the vertical line information 32-a and the horizontal line information 32-b hold, for each ruled line, the number of the ruled line (501-a, 501-b), the coordinates of its start point (502-a, 502-b), and the coordinates of its end point (503-a, 503-b). The vertical line information 32-a additionally holds the number 504-a of the horizontal line that the vertical line meets at its start point and the number 505-a of the horizontal line that it meets at its end point. Likewise, the horizontal line information 32-b holds the number 504-b of the vertical line that the horizontal line meets at its start point and the number 505-b of the vertical line that it meets at its end point.

[0041] The numbers of vertical and horizontal lines are stored in the vertical line count information 506-a and the horizontal line count information 506-b, respectively.

[0042] FIG. 6 shows the contents of the field information 33.

[0043] The field information 33 stores a field number 601 that identifies the field, the number 602 of the field's top line (the horizontal line above the field), the number 603 of its bottom line (the horizontal line below the field), the number 604 of its left line (the vertical line to the left of the field), and the number 605 of its right line (the vertical line to the right of the field). The numbers 602 to 605 are expressed using the ruled line numbers 501-a and 501-b of the ruled line information 32 in FIG. 5. In other words, the position of a field is identified from the ruled line information of the four ruled lines that enclose it.

[0044] The field information 33 also stores a field type 606, a field title 607, and a field count 608.

[0045] The field type 606 defines the content of the field. It can be used to classify recognition results when they are stored in a database and to define attributes of the field (for example, that it contains only digits, or that only a limited set of characters can appear). The field title 607 is a title used when the recognition result for the field is output, so that a person can tell what the field contains.
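As a rough illustration of the format information described for FIGS. 5 and 6, the records could be modeled as below. The class and attribute names, the example table, and the field type value "digits" are all assumptions made for this sketch. The reference numerals in the comments point back to the items named in the text. The line counts (506-a, 506-b) and the field count (608) are implied here by the sizes of the containers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[int, int]


@dataclass
class RuledLine:
    number: int        # ruled line number (501-a / 501-b)
    start: Point       # start point (502-a / 502-b)
    end: Point         # end point (503-a / 503-b)
    start_cross: int   # number of the line met at the start point (504)
    end_cross: int     # number of the line met at the end point (505)


@dataclass
class Field:
    number: int   # field number (601)
    top: int      # horizontal line above the field (602)
    bottom: int   # horizontal line below the field (603)
    left: int     # vertical line left of the field (604)
    right: int    # vertical line right of the field (605)
    kind: str     # field type (606); the value used below is made up
    title: str    # field title (607)


def field_box(field: Field,
              vertical: Dict[int, RuledLine],
              horizontal: Dict[int, RuledLine]) -> Tuple[int, int, int, int]:
    """Return (left x, top y, right x, bottom y) in format coordinates.

    Assumes the format's ruled lines are exactly vertical or horizontal.
    """
    return (vertical[field.left].start[0],
            horizontal[field.top].start[1],
            vertical[field.right].start[0],
            horizontal[field.bottom].start[1])


# A hypothetical table: a frame split into two cells by vertical line 2.
vertical = {
    1: RuledLine(1, (0, 0), (0, 10000), 1, 2),
    2: RuledLine(2, (4000, 0), (4000, 10000), 1, 2),
    3: RuledLine(3, (10000, 0), (10000, 10000), 1, 2),
}
horizontal = {
    1: RuledLine(1, (0, 0), (10000, 0), 1, 3),
    2: RuledLine(2, (0, 10000), (10000, 10000), 1, 3),
}
amount = Field(2, top=1, bottom=2, left=2, right=3, kind="digits", title="Amount")
print(field_box(amount, vertical, horizontal))  # (4000, 0, 10000, 10000)
```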

[0046] FIG. 7 is a detailed flowchart of the table outer frame extraction processing in step 102 of FIG. 2. The processing is described below step by step.

[0047] Step 701: To extract the positions of vertical lines from the image, the black pixels in the image data are projected in the vertical direction (the y-axis direction). The projection is obtained by summing, for every x coordinate, the number of black pixels along the y-axis direction.
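The projection of step 701 amounts to a per-column count of black pixels. A minimal sketch follows, assuming the image is held as a list of rows of 0 (white) and 1 (black). That representation and the function name `project_columns` are choices made for this example only.

```python
def project_columns(image):
    """Count the black pixels in each column (projection along the y axis).

    image: list of rows, each row a list of 0 (white) or 1 (black).
    Returns one count per x coordinate.
    """
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


image = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
]
print(project_columns(image))  # [0, 3, 0, 1, 3]
```

Columns 1 and 4 contain full-height vertical strokes and therefore receive the largest counts. The horizontal projection of step 706 is the same computation with the roles of rows and columns exchanged.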

[0048] Step 702: All peak points (x coordinates) at which the projection value obtained in step 701 is at or above a predetermined threshold are found. A vertical line exists in the neighborhood of each such peak point. Of these peak points, the one with the smallest x coordinate is found. The left line of the table's outer frame lies near it. Similarly, the peak point with the largest x coordinate is found. The right line of the table's outer frame lies near it.

[0049] However, because there may be noise around the edges of the image, peak points are extracted only from positions at least a fixed distance away from both edges of the image. This distance is set to a suitable value according to the image input device, the size of the table in the input data, and so on.

[0050] Step 703: For each of the left and right peak points (x coordinates) found in step 702 for the outer frame lines, a range extending a fixed amount to the left and right of the peak point is taken as the range (of x coordinates) in which the vertical line, that is, the left or right outer frame vertical line, exists.
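Steps 702 and 703 can be sketched as follows. The sketch simplifies the notion of a peak point by treating every position whose projection value reaches the threshold as a peak. The parameter names `threshold`, `margin` (the fixed distance kept clear at both image edges), and `half_width` (the fixed search range around a peak) are hypothetical, and their values would have to be tuned to the input device as the text notes.

```python
def find_peaks(projection, threshold, margin):
    """Positions whose projection value is >= threshold, skipping
    `margin` positions at each end of the image (possible edge noise)."""
    return [
        x for x in range(margin, len(projection) - margin)
        if projection[x] >= threshold
    ]


def outer_line_bands(projection, threshold, margin, half_width):
    """Search ranges for the two outermost lines, or None if no peak exists.

    Returns ((low, high) for the first peak, (low, high) for the last peak),
    each range clipped to the image.
    """
    peaks = find_peaks(projection, threshold, margin)
    if not peaks:
        return None
    last = len(projection) - 1

    def band(center):
        return (max(0, center - half_width), min(last, center + half_width))

    return band(peaks[0]), band(peaks[-1])


projection = [9, 0, 8, 1, 0, 7, 0, 9, 0, 9]
print(find_peaks(projection, threshold=6, margin=1))              # [2, 5, 7]
print(outer_line_bands(projection, 6, margin=1, half_width=1))    # ((1, 3), (6, 8))
```

The large values at positions 0 and 9 are ignored because they fall inside the edge margin, which is the effect the text asks for.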

[0051] Step 704: For every y coordinate, the image is searched from the left (from smaller x) within the range set in step 703 for the left outer frame vertical line, and the coordinates of the first change point from a white pixel to a black pixel are recorded. The right outer frame vertical line is handled in the same way: within the range set in step 703 for the right outer frame vertical line, the image is searched from the right (from larger x), and the coordinates of the first change point from a white pixel to a black pixel are recorded.
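A sketch of the search in step 704 is given below, again assuming a list-of-rows image with 0 for white and 1 for black. It treats the pixel just outside the search range as white, so the first white-to-black change is simply the first black pixel met. That boundary handling, and the function names, are assumptions of this example.

```python
def left_edge_points(image, x_from, x_to):
    """Scan each row from x_from rightward to x_to and record the first
    black pixel as (x, y). Rows with no black pixel in range are skipped."""
    points = []
    for y, row in enumerate(image):
        for x in range(x_from, x_to + 1):
            if row[x] == 1:
                points.append((x, y))
                break
    return points


def right_edge_points(image, x_from, x_to):
    """Same search for the right frame line, scanning from x_to leftward."""
    points = []
    for y, row in enumerate(image):
        for x in range(x_to, x_from - 1, -1):
            if row[x] == 1:
                points.append((x, y))
                break
    return points


image = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(left_edge_points(image, 0, 3))   # [(2, 0), (1, 1), (2, 3)]
print(right_edge_points(image, 0, 3))  # [(3, 0), (2, 1), (2, 3)]
```

Row 2 contributes no point, which models a break in the ruled line. The fit in the next step uses only the points that were found.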

[0052] Step 705: For each of the left and right outer frame vertical lines, a straight line approximating all the change-point coordinates obtained in step 704 is found.

[0053] Let the change-point coordinates be (x[i], y[i]) (0 ≤ i < N, where N is the number of change points). The approximating line is then given by Equation 1 below.

[0054]

[Equation 1]
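Equation 1 itself is not reproduced in this text. One plausible reading, assumed here and not confirmed by the source, is an ordinary least-squares fit. Because the frame line is nearly vertical, the sketch fits x as a linear function of y, x = a*y + b, over the change points (x[i], y[i]). The function name and the (a, b) representation are choices made for this example.

```python
def fit_vertical_line(points):
    """Least-squares fit of x = a*y + b to a list of (x, y) points.

    Returns (a, b). This is a standard fit assumed for illustration,
    not necessarily the formula given as Equation 1.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_yy = sum(y * y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    denom = n * sum_yy - sum_y ** 2
    if denom == 0:
        raise ValueError("need at least two points with different y values")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_x - a * sum_y) / n
    return a, b


# Change points of a frame line that drifts right by 1 pixel every 10 rows.
points = [(10, 0), (11, 10), (12, 20), (13, 30)]
a, b = fit_vertical_line(points)
print(round(a, 6), round(b, 6))
```

For the horizontal frame lines of steps 706 to 710 the same fit would be applied with x and y exchanged, giving y as a linear function of x.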

[0055] Steps 701 to 705 above yield the left and right outer frame vertical lines of the table in the input image.

[0056] Step 706: To extract the positions of horizontal lines from the image, the black pixels in the image data are projected in the horizontal direction (the x-axis direction). The projection is obtained by summing, for every y coordinate, the number of black pixels along the x-axis direction.

[0057] Step 707: All peak points (y coordinates) at which the projection value obtained in step 706 is at or above a predetermined threshold are found. A horizontal line exists in the neighborhood of each such peak point. Of these peak points, the one with the smallest y coordinate is found. The top line of the table's outer frame lies near it. Similarly, the peak point with the largest y coordinate is found. The bottom line of the table's outer frame lies near it.

[0058] However, because there may be noise around the edges of the image, peak points are extracted only from positions at least a fixed distance away from both edges of the image. This distance is set to a suitable value according to the image input device, the size of the table in the input data, and so on.

[0059] Step 708: For each of the top and bottom peak points (y coordinates) found in step 707 for the outer frame lines, a range extending a fixed amount above and below the peak point is taken as the range (of y coordinates) in which the horizontal line, that is, the top or bottom outer frame horizontal line, exists.

[0060] Step 709: For every x coordinate, the image is searched from the top (from smaller y) within the range set in step 708 for the top outer frame horizontal line, and the coordinates of the first change point from a white pixel to a black pixel are recorded. The bottom outer frame horizontal line is handled in the same way: within the range set in step 708 for the bottom outer frame horizontal line, the image is searched from the bottom (from larger y), and the coordinates of the first change point from a white pixel to a black pixel are recorded.

[0061] Step 710: For each of the top and bottom outer frame horizontal lines, a straight line approximating all the change-point coordinates obtained in step 709 is found.

[0062] Let the change-point coordinates be (x[i], y[i]) (0 ≤ i < number of change points). The approximating line is then given by Equation 2 below.

[0063]

[Equation 2]

[0064] Steps 706 to 710 above yield the top and bottom outer frame horizontal lines of the table in the input image.

[0065] FIGS. 8 and 9 illustrate how the outer frame vertical lines are extracted from the input image by steps 701 to 705 of FIG. 7. The horizontal lines are extracted by the same procedure as the vertical lines, so their description is omitted.

[0066] FIG. 8 shows the result of finding, from the image data, the range in which a vertical line of the table's outer frame exists.

[0067] Table data is stored in the image data 801. The image 801 has an orthogonal coordinate system with its origin at the upper left, the x axis pointing right, and the y axis pointing down. 802 is the result of projecting the black pixels of this image in step 701 of FIG. 7. 803 is the threshold used in step 702 to decide peak points. 804 is the leftmost peak position. The range from dotted line 805 to dotted line 806, extending left and right from peak point 804, is the range in which the left line of the table's outer frame exists. This range is set in step 703. It is searched to find the left line of the table.

[0068] FIG. 9 shows how the left line of the table's outer frame is found.

[0069] Within the range between dotted lines 805 and 806 of FIG. 8, each line of pixels (each y coordinate) is traced rightward, pixel by pixel, starting from the position of dotted line 805. The first change point found from a white pixel to a black pixel is taken as the position of the outer frame. This position is found in step 704 of FIG. 7. Approximating the points found on all the lines by the straight line given by Equation 1 above yields the left line 901 of the table's outer frame.

[0070] FIG. 10 shows the result of finding the four lines (top, bottom, left, and right) of the table's outer frame from the image 801 of FIG. 8.

[0071] Now that the top, bottom, left, and right lines of the outer frame have been obtained, the outer frame of the table is represented by the intersections of the two horizontal lines (top and bottom) with the two vertical lines (left and right). Let the coordinates of the upper-left intersection 1001 be (xlu, ylu), those of the upper-right intersection 1002 be (xru, yru), those of the lower-left intersection 1003 be (xld, yld), and those of the lower-right intersection 1004 be (xrd, yrd).

【００７２】FIG. 11 is a detailed flowchart of the format matching evaluation in step 103 of FIG. 2. The format matching degree is obtained by matching each ruled line of the format's table against the input image in turn, checking whether it exists in the image, and computing the ratio of the number of ruled lines found to the total number of ruled lines in the format. Each step is described below.

【００７３】Step 1101: Counters CRT and ERR are cleared to 0. CRT counts the ruled lines of ruled line information 32 that are found in the input image; ERR counts the ruled lines that are not found.

【００７４】Step 1102: One ruled line is read from ruled line information 32 of the format being matched. As shown in FIG. 5, ruled line information 32 holds the start and end coordinates of each ruled line in a coordinate system in which the upper-left corner of the table is (0, 0) and the lower-right corner is (10000, 10000). Step 102 of FIG. 2 (FIG. 7), on the other hand, has obtained the outer frame of the table in a coordinate system whose origin (0, 0) is the upper-left corner of image 801, as shown in FIG. 10. The ruled line read from ruled line information 32 is therefore coordinate-converted on the basis of this outer frame.

【００７５】Converting the start and end coordinates of a ruled line given in ruled line information 32 into image coordinates, based on the image coordinates of the table's four corner points, gives Equation 3 below.

【００７６】[0076]

【数３】 [Equation 3]
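
Equation 3 itself is not reproduced in this text. As a purely illustrative sketch, one plausible way to map a format coordinate (u, v) in the 0 to 10000 system onto the image from the four corner points is bilinear interpolation. The function name, the argument layout, and the choice of bilinear interpolation are all assumptions and may differ from the patent's actual formula.

```python
def format_to_image(u, v, lu, ru, ld, rd, scale=10000):
    """Map format coordinates (u, v) to image coordinates.

    lu, ru, ld, rd are the image coordinates (x, y) of the upper-left,
    upper-right, lower-left, and lower-right corners of the outer frame.
    Bilinear interpolation is an assumed stand-in for Equation 3.
    """
    s, t = u / scale, v / scale
    # Interpolate along the top and bottom edges, then between them.
    top = (lu[0] + s * (ru[0] - lu[0]), lu[1] + s * (ru[1] - lu[1]))
    bottom = (ld[0] + s * (rd[0] - ld[0]), ld[1] + s * (rd[1] - ld[1]))
    x = top[0] + t * (bottom[0] - top[0])
    y = top[1] + t * (bottom[1] - top[1])
    return x, y
```

With this mapping the format corners (0, 0) and (10000, 10000) land exactly on the detected upper-left and lower-right corners, so a skewed or scaled table still aligns with its format.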

【００７７】Step 1104: The coordinate-converted ruled line is matched against the input image to obtain an evaluation value. The matching process is described in detail later with reference to FIG. 12.

【００７８】Step 1105: If step 1104 determined that the ruled line exists, the process proceeds to step 1106. Otherwise, the ruled line is treated as absent and the process proceeds to step 1107.

【００７９】Step 1106: Because the ruled line from ruled line information 32 was found in the image, 1 is added to counter CRT, which counts the ruled lines found.

【００８０】Step 1107: Because the ruled line from ruled line information 32 was not found in the image, 1 is added to counter ERR, which counts the ruled lines not found.

【００８１】Step 1108: It is determined whether matching has finished for all ruled lines in ruled line information 32. If not, the process returns to step 1102 and handles the next ruled line in the same way. When all ruled lines have been matched, the process proceeds to step 1109.

【００８２】Step 1109: The ratio of the number of ruled lines found in the image to the total number of ruled lines in the format is taken as the format matching evaluation value.
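
The loop of steps 1101 to 1109 can be sketched as below. The function name and the `rule_exists` callback (standing in for the matching of step 1104) are hypothetical, and returning 0.0 for an empty format is an assumption not stated in the text.

```python
def format_match_score(format_rules, rule_exists):
    """Return the fraction of a format's ruled lines found in the image."""
    crt = 0  # ruled lines found in the image (counter CRT)
    err = 0  # ruled lines not found in the image (counter ERR)
    for rule in format_rules:
        if rule_exists(rule):
            crt += 1
        else:
            err += 1
    total = crt + err
    return crt / total if total else 0.0
```

For example, a format with four ruled lines of which three are found in the image scores 0.75.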

【００８３】FIG. 12 is a detailed flowchart of the matching of a ruled line against the image in step 1104 of FIG. 11. The matching extracts short black runs perpendicular to the ruled line in its vicinity on the image, uses these black runs to locate the ruled line in the image, and decides whether the ruled line exists according to whether the number of black runs making up the ruled line is at least a threshold. Each step is described below.

【００８４】Step 1201: It is determined whether the ruled line to be matched is a vertical ruled line. If so, the process proceeds to step 1202. Otherwise it is a horizontal ruled line, and the process proceeds to step 1212.

【００８５】The processing of steps 1202 to 1211 for a vertical ruled line is described below with reference to FIG. 13 and FIG. 14. FIG. 13 shows part of image 801 around a ruled line. 1304 and 1305 are black-pixel regions in the image: 1304 is a ruled line and 1305 is a character. FIG. 14 is an enlarged view of portion 1306 of FIG. 13.

【００８６】Step 1202: The ruled line of the format being matched (the ruled line read from ruled line information 32, hereinafter called the format ruled line) is placed on the image. 1301 in FIG. 13 is the format ruled line placed on the image. A region of fixed width, bounded by dotted lines 1302 and 1303 and centered on format ruled line 1301, is then set. The ruled line in the image is searched for within this range.

【００８７】Step 1203: From the horizontal black runs that make up ruled line 1304 and character 1305 in the image of FIG. 13, the black runs of length at most threshold γ that lie within the range set in step 1202 (between dotted lines 1302 and 1303) are extracted.

【００８８】In FIG. 14, the black runs drawn as solid black bands are those extracted as having length at most threshold γ. Black runs 1304-2 and 1304-3 were not extracted because their length exceeded threshold γ.

【００８９】Step 1204: The centroid (1402-1, 1402-2, etc.) of every black run extracted in step 1203 (1304-1, 1305-1, etc.) is computed, and for each centroid (one per black run) the offset from format ruled line 1301 is computed. The offset is 0 if the centroid lies on format ruled line 1301, the distance in pixels to format ruled line 1301 if the centroid lies to its right, and that distance multiplied by -1 if the centroid lies to its left.

【００９０】Step 1205: From the black-run offsets obtained in step 1204, histogram 1403 of the number of black runs at each offset value is built.

【００９１】Step 1206: The offset 1404 at which the histogram built in step 1205 takes its maximum value is found.

【００９２】Step 1207: A fixed-width range 1407, bounded by dotted lines 1405 and 1406, is set around the position obtained by shifting the format ruled line by the offset found in step 1206.

【００９３】Step 1208: Of the black runs of length at most threshold γ extracted in step 1203, only those whose centroid lies within the range set in step 1207 are extracted.

【００９４】Step 1209: If the number of black runs extracted in step 1208 is greater than a constant δ (< 1) times the length of the format ruled line in pixels, the process proceeds to step 1210. Otherwise, it proceeds to step 1211.

【００９５】Step 1210: It is determined that a ruled line corresponding to the format ruled line exists in the image.

【００９６】Step 1211: It is determined that no ruled line corresponding to the format ruled line exists in the image.

【００９７】If the ruled line to be matched was found to be horizontal in step 1201, steps 1212 to 1218 are executed instead of steps 1202 to 1208. Steps 1212 to 1218 can be realized in the same way by carrying out the processing of steps 1202 to 1208 in a coordinate system with the x and y coordinates swapped, so their description is omitted.
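
Steps 1203 to 1211 for a vertical ruled line can be sketched as follows. This is a simplified illustration under several assumptions: the image is a list of rows of 0/1 pixels, the format ruled line is exactly vertical at column `x_line` spanning rows `y0` to `y1`, centroids are rounded down to whole pixels, and the parameter names (`half_width`, `tol`) are hypothetical stand-ins for the two fixed-width ranges in the text.

```python
from collections import Counter


def black_runs(row):
    """Yield (start, end) column indices (inclusive) of each horizontal black run."""
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            yield start, x - 1
            start = None
    if start is not None:
        yield start, len(row) - 1


def match_vertical_rule(image, x_line, y0, y1, half_width, gamma, delta, tol=1):
    """Return (found, offset, runs) for a format ruled line at column x_line."""
    # Step 1203: short runs lying entirely inside the search band.
    candidates = []
    for y in range(y0, y1 + 1):
        for start, end in black_runs(image[y]):
            inside = x_line - half_width <= start and end <= x_line + half_width
            if inside and end - start + 1 <= gamma:
                candidates.append((y, start, end))
    if not candidates:
        return False, 0, []

    # Step 1204: signed offset of each run's centroid (negative = left).
    def offset(run):
        return (run[1] + run[2]) // 2 - x_line

    # Steps 1205-1206: histogram of offsets and its peak.
    peak = Counter(offset(r) for r in candidates).most_common(1)[0][0]
    # Steps 1207-1208: keep runs whose centroid is near the shifted line.
    kept = [r for r in candidates if abs(offset(r) - peak) <= tol]
    # Steps 1209-1211: enough runs relative to the line length in pixels?
    found = len(kept) > delta * (y1 - y0 + 1)
    return found, peak, kept
```

Because the histogram peak follows the dominant offset, short runs belonging to a nearby character (such as 1305-1) fall outside the second range and do not count toward the ruled line. For a horizontal ruled line the same routine could be applied to the transposed image.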

【００９８】FIG. 15 is a detailed flowchart of the format determination in step 106 of FIG. 2. The format with the largest matching evaluation value is taken as the format corresponding to the image. If several formats obtain the same evaluation value, the one with more ruled lines is chosen. However, if the evaluation value is smaller than threshold β, no format is considered applicable and an error results. Each step is described below.

【００９９】Step 1501: The format numbers are sorted in descending order of their number of ruled lines.

【０１００】Step 1502: The corresponding format number is initialized to fmt_no = 0 and the comparison format number to i = 1.

【０１０１】Step 1503: The matching evaluation value of the i-th format is compared with that of the fmt_no-th format. If the i-th evaluation value is larger than the fmt_no-th evaluation value, the process proceeds to step 1504. Otherwise, it proceeds to step 1505.

【０１０２】Step 1504: The corresponding format number is set to fmt_no = i.

【０１０３】Step 1505: 1 is added to the value of i.

【０１０４】Step 1506: It is checked whether the value of i has reached the total number of formats. If so, all formats have been compared and the process proceeds to step 1507. Otherwise, it returns to step 1503 to compare with the next format.

【０１０５】Step 1507: It is determined whether the evaluation value of the fmt_no-th format is at least threshold β. If so, the process proceeds to step 1508. Otherwise, it proceeds to step 1509.

【０１０６】Step 1508: A corresponding format is deemed to exist, and the format number fmt_no is set.

【０１０７】Step 1509: No corresponding format is deemed to exist, and fmt_no = -1 is set to indicate a matching error.
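
The selection logic of steps 1501 to 1509 can be sketched as follows. The function name and argument layout are hypothetical; unlike the flowchart, this sketch returns the format's original index rather than its position in the sorted list, and it returns -1 for an empty list of formats, which is an assumption.

```python
def select_format(scores, rule_counts, beta):
    """Return the index of the best-matching format, or -1 if none qualifies.

    scores[k] is the matching evaluation value of format k and
    rule_counts[k] is its number of ruled lines.
    """
    if not scores:
        return -1
    # Step 1501: visit formats in descending order of ruled line count.
    order = sorted(range(len(scores)), key=lambda k: rule_counts[k], reverse=True)
    best = order[0]
    for k in order[1:]:
        # Step 1503: strictly greater, so on a tie the earlier format
        # (the one with more ruled lines) is kept.
        if scores[k] > scores[best]:
            best = k
    # Steps 1507-1509: reject the winner if it falls below the threshold.
    return best if scores[best] >= beta else -1
```

Sorting first and comparing with a strict inequality is what makes the tie-break favor the format with more ruled lines without any extra bookkeeping.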

【０１０８】FIG. 16 is a detailed flowchart of the ruled line correction and the erasure of ruled lines from the image in step 108 of FIG. 2. For each ruled line, the black runs extracted in steps 1208 and 1218 of the flowchart in FIG. 12 are erased, and the position of the ruled line is corrected to fit the image by finding the straight line that approximates their centroids. Each step is described below.

【０１０９】Step 1601: The ruled line number counter i is initialized to i = 0.

【０１１０】Step 1602: The i-th format ruled line is matched against the image. As in step 1104 of FIG. 11, this is done by invoking the format ruled line and image matching process of FIG. 12.

【０１１１】Step 1603: If the matching shows that a corresponding ruled line exists in the image, the process proceeds to step 1604. Otherwise, it proceeds to step 1606.

【０１１２】Step 1604: FIG. 17 shows how the position of a ruled line is corrected. In step 1604, as shown in FIG. 17, an approximating straight line 1701 through the black runs extracted by matching the format ruled line against the image is obtained. With the centroids of all extracted black runs denoted (x[i], y[i]) (0 ≦ i < number of black runs), approximating line 1701 is obtained from Equation 1 above for a vertical line and from Equation 2 above for a horizontal line.

【０１１３】Step 1605: The black runs extracted by matching the ruled line against the image in step 1602 are erased from the image, which erases the ruled line from the image.

【０１１４】Step 1606: 1 is added to ruled line number i.

【０１１５】Step 1607: If ruled line number i is smaller than the total number of ruled lines, the process returns to step 1602 to correct the next ruled line and erase it from the image. Otherwise, it proceeds to step 1608.

【０１１６】Step 1608: For every ruled line, the intersections with the ruled lines that cross it at its start and end points are computed, and the coordinates of its start and end points are updated to give the corrected ruled line.

【０１１７】FIG. 18 is a detailed flowchart of the field position correction in step 109 of FIG. 2. The position of each field is obtained from the equations of the ruled lines above, below, left, and right of the field. Each step is described below.

【０１１８】Step 1801: The data of each field is read in turn from field information 33 of the format corresponding to the image, stored in format information database 31.

【０１１９】Step 1802: As shown in FIG. 6, field information 33 consists of the four ruled lines (top, bottom, left, and right) that surround each field, so the ruled line information corrected in step 108 is read out according to the ruled line numbers. The intersections of the field's two horizontal lines (top and bottom) with its two vertical lines (left and right) are computed and used as the field position information.

【０１２０】Step 1803: The coordinates of the four corners of the field (upper left, upper right, lower left, lower right) obtained in step 1802 are output as field position information 35.

【０１２１】FIG. 19 shows the format of the field position information. 1901 stores the field number that identifies the field. 1902 stores the field's upper-left coordinates on the image, 1903 its upper-right coordinates, 1904 its lower-left coordinates, and 1905 its lower-right coordinates.

【０１２２】Step 1804: If any field remains whose position has not yet been obtained, the process returns to step 1801 and handles the next field. Otherwise, the field position setting ends.
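
The corner computation of steps 1802 and 1803 can be sketched as follows. Equations 1 and 2 are not reproduced in this section, so the line forms used here (x = a*y + b for vertical lines, y = c*x + d for horizontal lines) and all names are assumptions for illustration.

```python
def intersect(vertical, horizontal):
    """Intersect a vertical line x = a*y + b with a horizontal line y = c*x + d."""
    a, b = vertical
    c, d = horizontal
    denom = 1 - a * c
    if denom == 0:
        raise ValueError("lines are parallel")
    # Substituting y = c*x + d into x = a*y + b gives x * (1 - a*c) = a*d + b.
    x = (a * d + b) / denom
    return x, c * x + d


def field_corners(top, bottom, left, right):
    """Return the four corner coordinates of a field from its bounding lines."""
    return {
        "upper_left": intersect(left, top),
        "upper_right": intersect(right, top),
        "lower_left": intersect(left, bottom),
        "lower_right": intersect(right, bottom),
    }
```

Because the corners come from the corrected line equations rather than from the format's nominal coordinates, a skewed or distorted table still yields field positions that follow the ruled lines actually present in the image.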

【０１２３】FIG. 20 is a detailed flowchart of the in-field character recognition in step 110 of FIG. 2. The character patterns (images) contained in the fields obtained in step 109 are extracted and recognized. Each step is described below.

【０１２４】Step 2001: Connected components of black pixels are extracted from the image. Various methods of extracting connected components of black pixels are publicly known, so their description is omitted.

【０１２５】Step 2002: The field number counter i is initialized to i = 0.

【０１２６】Step 2003: The field position of the i-th field is read from field position information 35 (FIG. 19).

【０１２７】Step 2004: From the black-pixel connected components extracted in step 2001, only those contained in the field are extracted. These connected components are then merged to create the circumscribing rectangle of a character, and the character pattern is extracted.

【０１２８】Step 2005: Character recognition is performed on the character pattern extracted in step 2004.

【０１２９】Step 2006: The recognition result is output to printer 70.

【０１３０】Step 2007: 1 is added to the field number counter i.

【０１３１】Step 2008: If the value of the field number counter i is smaller than the total number of fields, the process returns to step 2003 to recognize the characters of the next field. Otherwise, the character recognition process ends.
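
Step 2004 can be sketched as follows, assuming that each connected component is already summarized by an axis-aligned bounding box (x0, y0, x1, y1) and that the field is approximated by an axis-aligned rectangle. Both are simplifications, the function names are hypothetical, and merging every component in the field into one rectangle assumes the field holds a single character pattern.

```python
def components_in_field(boxes, field):
    """Keep only the component boxes that lie entirely inside the field."""
    fx0, fy0, fx1, fy1 = field
    return [b for b in boxes
            if fx0 <= b[0] and fy0 <= b[1] and b[2] <= fx1 and b[3] <= fy1]


def merge_boxes(boxes):
    """Return the rectangle circumscribing all boxes, or None if there are none."""
    if not boxes:
        return None
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```

Merging is needed because a single character, particularly a kanji, often consists of several disconnected strokes that each form a separate connected component.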

【０１３２】Next, as a second embodiment, a table analysis method is described for the case in which some fields of the analyzed table are further divided by ruled lines.

【０１３３】FIG. 21 illustrates the flow of processing when one field in a format is further divided by ruled lines.

【０１３４】2101 is the same table format as format example 1 of FIG. 3. Within format 2101, field 2102 has a format that divides the field more finely (hereinafter called a field division format).

【０１３５】2103 and 2104 are the top and bottom lines of field 2102 and are assumed to be ruled lines number 5 and number 6 of horizontal line information 32-b, respectively. Similarly, 2105 and 2106 are the left and right lines of field 2102 and are assumed to be ruled lines number 8 and number 9 of vertical line information 32-a, respectively.

【０１３６】2107 is the field division format corresponding to field 2102. Each ruled line making up field division format 2107 has a line number within that format, and the field information is created from these line numbers. When field division format 2107 is applied to field 2102, ruled line 2108 corresponds to the field's top line 2103. Likewise, ruled line 2109 corresponds to the field's bottom line 2104, ruled line 2110 to its left line 2105, and ruled line 2111 to its right line 2106.

【０１３７】2112, 2113, and 2114 are added as new ruled lines. As a result, as shown in the combined result, two horizontal lines 2116 and 2117 and one vertical line 2118 are added as ruled lines. Adding these ruled lines increases the number of fields.

【０１３８】FIG. 22 shows the changes to the flowchart of FIG. 2 needed to perform the above processing. Step 2201, which determines the field division format and modifies the format information, is added between field position setting (step 109) and in-field character recognition (step 110).

【０１３９】In step 2201, for every format, field division format data (for example, 2107 in FIG. 21) is read from field division format database 36, the presence of field division is checked, and the field division format is added to, and used to modify, the format information of the whole table. As a result, field position information 35 also reflects the fields of the division format. Then, in step 110, character recognition is performed on each field on the basis of the format information as updated by this addition and modification.

【０１４０】FIG. 23 is a detailed flowchart of the field division determination and of the processing that adds the field division format to, and modifies, the format information of the whole table.

【０１４１】FIG. 24 shows examples of field division formats. Like the format information of the whole table, these field division formats are expressed as format information in a coordinate system whose upper-left corner is (0, 0) and whose lower-right corner is (10000, 10000).

【０１４２】FIG. 25 shows the contents of these field division formats. Field division format information 37 stores the type 37-1 of the field to which the field division format applies, the ruled line information 37-2 of the field division format, and the field information 37-3 of the field division format.

【０１４３】Field type 37-1 is expressed in terms of field type 606 of field information 33 in FIG. 6. That is, if field type 606 of some field in field information 33 of FIG. 6 is p, and field division format information 37 of FIG. 25 contains an entry whose field type 37-1 is p, then that field is divided according to the corresponding field division format information. Ruled line information 37-2 has the same contents as ruled line information 32 shown in FIG. 5. Field information 37-3 has the same contents as field information 33 shown in FIG. 6.

【０１４４】The processing is described below with reference to FIG. 23.

【０１４５】Step 2301: To check the field division formats in turn, the counter i that designates the number of the field to be checked is initialized to i = 0.

【０１４６】Step 2302: The field type (606 in FIG. 6) of the field with field number i is taken out, and the field division format information 37 whose field type 37-1 has the same value is extracted from field division format database 36.

【０１４７】Step 2303: If the database contains a field division format corresponding to the field type of the field with field number i, the process proceeds to step 2304. Otherwise the field is not divided, so no field division processing is performed and the process proceeds to step 2310.

【０１４８】Step 2304: If a corresponding format exists, format determination is performed on all the field division formats extracted in step 2303, by the same procedure as step 106 (FIG. 15).

【０１４９】Step 2305: If the format determination finds an applicable format, the process proceeds to step 2306. Otherwise, no field division processing is performed and the process proceeds to step 2310.

【０１５０】Step 2306: By the same procedure as the ruled line position correction and ruled line erasure shown in the flowchart of FIG. 16, the positions of the ruled lines in the field that correspond to the field division format are corrected and those ruled lines are erased from the image.

【０１５１】Step 2307: The ruled line information of the field division format is added to ruled line information 32 of the format information of the whole table.

【０１５２】FIG. 26 shows ruled line information 32-1 and 32-2 after the ruled line information of a field division format has been added. Here, the ruled line information of the field division format shown in FIG. 21 is assumed to have been added. The new ruled line information 2601 is appended to the end of the vertical ruled line information 32-1, and the new ruled line information 2602 is appended to the end of the horizontal ruled line information 32-2. As a result, vertical line number 0 of the field division format (2112 in FIG. 21) becomes number 37 in the format of the whole table (2118 in FIG. 21), and horizontal lines number 0 and number 1 of the field division format (2112 and 2113 in FIG. 21) become numbers 18 and 19 in the format of the whole table (2116 and 2117 in FIG. 21).

【０１５３】Step 2308: The field information of the field division format is added to field information 33 of the whole table, which is modified accordingly. That is, the field information of the field being divided is replaced with the field information of the field division format. This creates new field information 33 that includes the divided fields.

【０１５４】FIG. 27 shows how the field information is modified. In the field information 2701 of the whole table, the field 2702 to be divided is replaced with the field information 2703 of the field division format. In doing so, the top, bottom, left, and right ruled line numbers used in the field division format are converted into the ruled line numbers of the whole table shown in FIG. 26 (2704).

【０１５５】Step 2309: By the same procedure as the field position setting of FIG. 18, and based on the field information 33 newly created in step 2308, field position information 35 is set as the coordinates of the four corners given by the intersections of the top, bottom, left, and right ruled lines surrounding each field.

【０１５６】Step 2310: 1 is added to the field number counter i.

【０１５７】Step 2311: If the field number counter i is smaller than the number of fields, the process returns to step 2302 to handle the next field. Otherwise, the field division processing ends.
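
The renumbering that results from appending ruled lines in step 2307 (FIG. 26) can be sketched as follows. The function name is hypothetical, and the sketch assumes that the new ruled lines of a field division format are numbered from 0 and appended in order; how the format's frame lines (2108 to 2111) are tied to the field's existing lines is not covered here.

```python
def append_rules(overall_rules, new_rules):
    """Append new ruled lines and return {local number: whole-table number}."""
    first = len(overall_rules)  # the next free number in the whole-table list
    overall_rules.extend(new_rules)
    return {local: first + local for local in range(len(new_rules))}
```

With 37 existing vertical lines, local vertical line 0 maps to number 37, and with 18 existing horizontal lines, local horizontal lines 0 and 1 map to 18 and 19, which agrees with the example given for FIG. 26. The returned mapping is what the conversion of ruled line numbers in FIG. 27 (2704) would consult.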

【０１５８】In both the first and second embodiments described above, format determination uses all the ruled lines of each format. The determination can be made faster by using only the parts that differ from the other formats.

【０１５９】FIG. 28 is a flowchart of a method for creating the format determination ruled line information used for such a determination.

【０１６０】Suppose there are three formats A, B, and C. First, for format A and format B, every ruled line in each format is compared with the ruled lines of the other format, and only the ruled lines that have no counterpart of similar length and position are extracted. This yields the ruled lines that distinguish A from B. Performing the same processing for the combinations A and C, and B and C, yields the ruled lines needed to identify each format. These are obtained for each format as format determination ruled line information 34.

[0161] By using the format determination ruled line information 34 in place of the ruled line information 32 during format determination, the format determination process can be made faster.

[0162] Each step of FIG. 28 is described below.

[0163] Step 2801: Create every combination of two formats selected from the plurality of formats.

[0164] Step 2802: For one of the format combinations created in step 2801, read the ruled line information of the two formats.

[0165] Step 2803: Compare all the ruled lines of the two formats, and exclude from each format the ruled lines whose length and position are similar to a ruled line of the other format.

[0166] Step 2804: The ruled lines not removed in step 2803 are stored in the format determination ruled line information 34. For example, when formats A and B are compared, the ruled lines of format A that have no counterpart in format B are stored in 34-1, and the ruled lines of format B that have no counterpart in format A are stored in 34-2.

[0167] Step 2805: Steps 2802 to 2804 are also executed for the other format combinations obtained in step 2801, and the format determination ruled lines produced by each combination are obtained.

[0168] The above processing creates the format determination ruled line information.
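Steps 2801 to 2805 can be sketched as follows. This is a simplified illustration under stated assumptions: a ruled line is assumed to be a tuple `(x1, y1, x2, y2)` with a consistent endpoint order, similarity is assumed to mean that all four coordinates agree within a hypothetical tolerance `tol`, and the results of the individual pairs are merged into one list per format. The names `similar` and `discriminating_lines` are illustrative and do not come from the specification.

```python
from itertools import combinations


def similar(a, b, tol=5):
    """Assumed similarity test: all endpoint coordinates agree within tol."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def discriminating_lines(formats, tol=5):
    """formats: dict mapping a format name to its list of ruled lines.

    Returns a dict mapping each format name to the ruled lines that had no
    similar counterpart in at least one other format.
    """
    result = {name: [] for name in formats}
    for a, b in combinations(formats, 2):                # step 2801
        lines_a, lines_b = formats[a], formats[b]        # step 2802
        only_a = [l for l in lines_a                     # step 2803
                  if not any(similar(l, m, tol) for m in lines_b)]
        only_b = [l for l in lines_b
                  if not any(similar(l, m, tol) for m in lines_a)]
        for name, lines in ((a, only_a), (b, only_b)):   # step 2804
            for line in lines:
                if line not in result[name]:
                    result[name].append(line)
    return result                                        # step 2805: all pairs done


formats = {
    "A": [(0, 0, 100, 0), (0, 50, 100, 50)],
    "B": [(0, 0, 100, 0), (0, 70, 100, 70)],
    "C": [(0, 0, 100, 0), (0, 50, 100, 50), (50, 0, 50, 50)],
}
info_34 = discriminating_lines(formats)
```

In this example the top ruled line shared by all three formats is dropped everywhere, so only the lines that actually separate the formats remain to be matched against the image.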

[0169]

[Effects of the Invention] According to the present invention, nonlinear displacement of the image is corrected using the ruled lines surrounding each field, so fields can be extracted with high accuracy. Fields can also be extracted correctly even when a ruled line is partly broken in the image data.

[0170] By describing a format not as a set of fields but as a set of ruled lines forming the table together with the fields enclosed by those ruled lines, problems that were difficult to handle with field-by-field position correction (such as the influence of a character like "1" lying close to a ruled line) can be eliminated.

[0171] If a long ruled line spanning several fields is treated as a single ruled line in the ruled line information, it may not be possible to fully accommodate nonlinear distortion of the image. In the present invention, however, a single ruled line can be split partway along its length and handled as multiple ruled line entries. Long ruled lines can therefore also be handled adequately through piecewise linear approximation.
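The idea of splitting a long ruled line and correcting each part separately can be illustrated with a short sketch. It assumes a horizontal ruled line stored as `(x1, y, x2)`, split positions given as x coordinates (for example, where vertical ruled lines cross it), and per-segment vertical offsets that have already been detected by some other means. The function names are hypothetical.

```python
def split_horizontal(line, cut_xs):
    """Split a horizontal ruled line (x1, y, x2) at the given x positions."""
    x1, y, x2 = line
    xs = [x1] + sorted(x for x in cut_xs if x1 < x < x2) + [x2]
    return [(a, y, b) for a, b in zip(xs, xs[1:])]


def correct_segments(segments, offsets):
    """Apply a separately detected vertical offset to each segment."""
    return [(a, y + dy, b) for (a, y, b), dy in zip(segments, offsets)]


segments = split_horizontal((0, 100, 300), [100, 200])
corrected = correct_segments(segments, [0, 2, 5])
```

Each segment follows the local displacement of the image, so a line that bends slightly across the page is approximated by several short straight pieces.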

[0172] By determining whether the line elements corresponding to a format are present, one of a plurality of formats can be identified without additional information such as an ID. In addition, partial variations of a format (for example, a field that contains a further internal structure) can be handled simply by holding a format for the varying portion, so that table recognition remains possible.
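Format determination from the presence or absence of ruled lines can be sketched as below, using the ratio of matched ruled lines to the total number of ruled lines as the degree of coincidence and rejecting the image when the best ratio falls below a threshold, as the claims describe. The function name `choose_format`, the input structure, and the threshold value 0.8 are assumptions made for illustration.

```python
def choose_format(match_counts, threshold=0.8):
    """Pick the format with the highest degree of coincidence.

    match_counts maps a format name to (matched_lines, total_lines).
    Returns the best format name, or None when the image is rejected.
    The threshold value is a hypothetical example.
    """
    scores = {
        name: matched / total
        for name, (matched, total) in match_counts.items()
        if total > 0
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


accepted = choose_format({"A": (18, 20), "B": (9, 20)})
rejected = choose_format({"A": (10, 20), "B": (9, 20)})
```

Returning `None` for a weak best match corresponds to skipping the subsequent recognition steps rather than forcing the image into the nearest format.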

[Brief description of drawings]

FIG. 1 is a system configuration diagram of a first embodiment of the present invention.

FIG. 2 is a schematic flowchart of the entire table recognition process.

FIG. 3 is a diagram showing an example (part 1) of format data.

FIG. 4 is a diagram showing an example (part 2) of format data.

FIG. 5 is a diagram showing the contents of the ruled line information in the format information.

FIG. 6 is a diagram showing the contents of the field information in the format information.

FIG. 7 is a flowchart of the table outer frame extraction process.

FIG. 8 is a diagram showing how the approximate position of the table outer frame is determined in the table outer frame extraction process.

FIG. 9 is a diagram showing how the table outer frame is corrected in the table outer frame extraction process.

FIG. 10 is a diagram showing the result of extracting the entire table outer frame in the table outer frame extraction process.

FIG. 11 is a flowchart of the process of evaluating the degree of coincidence between the image and a format.

FIG. 12 is a flowchart of the process of matching ruled lines against the image.

FIG. 13 is a diagram showing how the search range for a ruled line is set in the ruled line and image matching process.

FIG. 14 is a diagram showing how the range in which a ruled line exists is determined in the ruled line and image matching process.

FIG. 15 is a flowchart of the process of identifying the format of the image data.

FIG. 16 is a flowchart of the ruled line position correction process and the process of erasing ruled lines from the image.

FIG. 17 is a diagram showing how ruled line positions are corrected.

FIG. 18 is a flowchart of the recognition field position setting process.

FIG. 19 is a diagram showing the contents of the field position information.

FIG. 20 is a flowchart of the in-field character recognition process.

FIG. 21 is a diagram showing an outline of the field division process.

FIG. 22 is a diagram showing the steps added to the flowchart of FIG. 2 for the field division process.

FIG. 23 is a flowchart of the field division determination and division process.

FIG. 24 is a diagram showing an example of a field division format.

FIG. 25 is a diagram showing the contents of the field division format information.

FIG. 26 is a diagram showing the addition of the ruled line information of a field division format to the ruled line information of the entire table.

FIG. 27 is a diagram showing the addition of a field division format to the field information of the entire table.

FIG. 28 is a flowchart of the process of creating the format determination ruled line information.

[Explanation of symbols]

10 CPU
20 Memory
30 Hard disk
40 Hard disk controller
50 Input/output terminal
51 Terminal controller
60 Image scanner
61 Disk
62 Image input controller
70 Printer
71 Printer controller
80 System bus
31 Format information database
32 Format ruled line information
33 Format field information
34 Format determination line segment information
35 Field position information
36 Partial format database

Claims

[Claims]

1. A table recognition method for reading a tabular document composed of ruled lines of a predetermined format as image data using image input means and recognizing the table structure from the image data, the method comprising: a reference line extraction step of extracting a reference line of the table from the image data; a coordinate conversion step of converting the ruled lines of the predetermined format into coordinates on the image data with reference to the extracted reference line; and a position correction step of detecting a positional deviation between the coordinate-converted ruled lines and the image data and correcting the positions of the ruled lines based on the detected positional deviation.

2. A table recognition method in which a tabular document composed of ruled lines of a predetermined format is read as image data using image input means and the table structure is recognized from the image data, the method comprising: a reference line extraction step of extracting a reference line of the table from the image data; a matching degree evaluation step of reading ruled line information representing the ruled lines of one format from a plurality of format information prepared in advance and obtaining the degree of coincidence between the read ruled lines and the image data; a step of performing the matching degree evaluation on all of the plurality of format information to obtain a degree of coincidence for each of the plurality of formats; a step of determining the format with the highest degree of coincidence to be the corresponding format for the image data; a step of detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; and a step of obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions.

3. A table recognition method for reading a tabular document composed of ruled lines of a predetermined format as image data using image input means and recognizing the table structure from the image data, the method comprising: a reference line extraction step of extracting a reference line of the table from the image data; a coordinate conversion step of reading ruled line information representing the ruled lines of one format from a plurality of format information prepared in advance and converting the read ruled lines into coordinates on the image data with reference to the extracted reference line; a ruled line search step of matching each coordinate-converted ruled line against the image data and determining whether the image data contains a ruled line corresponding to the coordinate-converted ruled line; a step of performing the coordinate conversion step and the ruled line search step for all ruled lines of the plurality of formats, obtaining for each format the ratio of matched ruled lines to the total number of ruled lines, and taking that ratio as the degree of coincidence between the format and the image data; a step of determining the format with the highest degree of coincidence to be the corresponding format for the image data; a step of detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; and a step of obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions.

4. The table recognition method according to claim 2, wherein, when the highest degree of coincidence does not reach a predetermined threshold value, the image data is rejected and the subsequent table recognition processing is not performed.

5. A ruled line forming a table is divided into partial units forming one field, and the position is corrected for each divided ruled line partial unit.
Table recognition method described in.

6. A ruled line forming a table is divided into unit parts forming one field, a plurality of continuous parts of the divided ruled line parts are combined, and the position is set in units of the combined part. 4. The table recognition method according to claim 1, wherein the correction is performed.

7. The table recognition method according to claim 1, wherein the position is corrected for each ruled line forming the table.

8. The table recognition method according to claim 2 or 3, wherein, for a table having a format in which some of the fields in the table may be divided into a plurality of fields, the partial format within such a field is prepared as field division format information separate from the format information, the field division format within the field is determined using the field division format information, the determined field division format is added to the corresponding format, and the added fields are recognized in the same manner as the other fields.

9. The table recognition method according to claim 2, wherein characteristic ruled lines that do not appear in the other formats are extracted in advance from the ruled lines forming each format and prepared as format determination ruled line information, and the format is determined by obtaining the degree of coincidence between the format determination ruled lines and the image.

10. A table recognition device for reading a tabular document composed of ruled lines of a predetermined format as image data using image input means and recognizing the table structure from the image data, the device comprising: reference line extraction means for extracting a reference line of the table from the image data; coordinate conversion means for converting the ruled lines of the predetermined format into coordinates on the image data with reference to the extracted reference line; and position correction means for detecting a positional deviation between the coordinate-converted ruled lines and the image data and correcting the positions of the ruled lines based on the detected positional deviation.

11. A table recognition device for reading a tabular document composed of ruled lines of a predetermined format as image data using image input means and recognizing the table structure from the image data, the device comprising: storage means for storing format information defining a plurality of table formats; reference line extraction means for extracting a reference line of the table from the image data; matching degree evaluation means for reading ruled line information representing the ruled lines of one format from the plurality of format information in the storage means and obtaining the degree of coincidence between the read ruled lines and the image data; means for performing the matching degree evaluation on all of the plurality of format information to obtain a degree of coincidence for each of the plurality of formats; means for determining the format with the highest degree of coincidence to be the corresponding format for the image data; means for detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; and means for obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions.

12. A table recognition device for reading a tabular document composed of ruled lines of a predetermined format as image data using image input means and recognizing the table structure from the image data, the device comprising: storage means for storing format information defining a plurality of table formats; reference line extraction means for extracting a reference line of the table from the image data; coordinate conversion means for reading ruled line information representing the ruled lines of one format from the plurality of format information in the storage means and converting the read ruled lines into coordinates on the image data with reference to the extracted reference line; ruled line search means for matching each coordinate-converted ruled line against the image data and determining whether the image data contains a ruled line corresponding to the coordinate-converted ruled line; means for performing the processing of the coordinate conversion means and the ruled line search means for all ruled lines of the plurality of formats, obtaining for each format the ratio of matched ruled lines to the total number of ruled lines, and taking that ratio as the degree of coincidence between the format and the image data; means for determining the format with the highest degree of coincidence to be the corresponding format for the image data; means for detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; and means for obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions.

13. An automatic character reading method for reading, as image data using image input means, a tabular document composed of ruled lines of a predetermined format and characters written in fields surrounded by the ruled lines, and recognizing the table structure and the characters in the fields from the image data, the method comprising: a reference line extraction step of extracting a reference line of the table from the image data; a matching degree evaluation step of reading ruled line information representing the ruled lines of one format from a plurality of format information prepared in advance and obtaining the degree of coincidence between the read ruled lines and the image data; a step of performing the matching degree evaluation on all of the plurality of format information to obtain a degree of coincidence for each of the plurality of formats; a step of determining the format with the highest degree of coincidence to be the corresponding format for the image data; a step of detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; a step of obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions; and a step of extracting the character patterns contained in the position-corrected fields and performing character recognition.

14. An automatic character reading device for reading, as image data using image input means, a tabular document composed of ruled lines of a predetermined format and characters written in fields surrounded by the ruled lines, and recognizing the table structure and the characters in the fields from the image data, the device comprising: storage means for storing format information defining a plurality of table formats; reference line extraction means for extracting a reference line of the table from the image data; matching degree evaluation means for reading ruled line information representing the ruled lines of one format from the plurality of format information in the storage means and obtaining the degree of coincidence between the read ruled lines and the image data; means for performing the matching degree evaluation on all of the plurality of format information to obtain a degree of coincidence for each of the plurality of formats; means for determining the format with the highest degree of coincidence to be the corresponding format for the image data; means for detecting a positional deviation between the ruled lines of the corresponding format and the image data and correcting the positions of the ruled lines based on the detected positional deviation; means for obtaining the positions of the fields surrounded by the ruled lines based on the corrected ruled line positions; and means for extracting the character patterns contained in the position-corrected fields and performing character recognition.