JPH0743718B2 - Multimedia document structuring method

Info

Publication number: JPH0743718B2
Application number: JP1264919A
Authority: JP
Inventors: 寛屋代; 達也村上; 好博嶋; 浩道藤澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1989-10-13
Filing date: 1989-10-13
Publication date: 1995-05-15
Anticipated expiration: 2010-05-15
Also published as: JPH03127169A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to document processing, and in particular to a multimedia document structuring method suited to extracting the logical structure of a document (chapters, sections, and so on) from a multimedia document image and storing it in a file.

[Conventional technology]

Systems that perform keyword search over document images have previously been proposed.

・Yuzuru Tanaka and Hideyuki Horii: "Transmedia Machine and Its Keyword Search over Image Texts", RIAO '88, 1988

In this system, keyword search is implemented by matching standard image patterns of the keyword against the character patterns in the document image. As a result, the information held in a document image could not be used by other systems, such as word processors, that represent information in existing character codes.

In addition, recent document processing is required to handle not only character text but also non-text information such as figures and tables in a unified way as document content. A document in which text and non-text information are mixed is called a multimedia document. A multimedia document contains several media, and a structure exists among them (character regions, photograph regions, figure regions, and so on). This structure comprises a layout structure, which determines spatial allocation such as pages, character spacing, and line spacing, and a logical structure, which determines semantic organization such as chapters and sections.

At present, the following formats exist for representing the logical structure of a document on a computer.

・Formatters (D. E. Knuth, "The TeXbook", Addison-Wesley, 1984)

Well-known formatters such as TeX provide commands for expressing the logical structure of a document, such as chapters and sections. These commands make it possible to set chapter titles in emphasized type and to generate a table of contents.

・ODA/ODIF, T.73

These are data formats for interchanging documents and exist mainly as standards. The former, ODA/ODIF ("Office Document Architecture (ODA) and Interchange Format", ISO 8613, 1988), is an ISO OSI standard. The latter, T.73 ("Recommendation T.73: Document Interchange Protocol for the Telematic Services", CCITT, 1984), is a CCITT standard. ODA/ODIF was developed in a form that includes compatibility with T.73. The distinguishing feature of these formats is that a document can be represented by both a logical structure and a layout structure.

In all of the formats above, the logical structure is used to obtain formatted output. That is, given the content of a document and its logical structure, a formatted document is produced automatically.

The layout structure of a document (割付け構造, literally "allocation structure") represents the printed arrangement on the page. It is sometimes called the "format" (書式) or the "layout structure" (レイアウト構造). Below, the two structure terms are treated as synonyms and only the former is used. A "format" is taken to mean the layout structure common to a given document class.

When people read a document, they grasp its content more accurately by inferring the logical structure from the layout structure they see. Put the other way around, the layout structure exists to make the logic of the document easier to follow. Also, as noted above, the representation of logical structure on a computer is used for formatting documents, and formatting a document is nothing other than adding a layout structure to it.

It follows that the logical structure and the layout structure of a document are closely related. The layout structure is therefore a useful means of understanding a document's logical structure.

For image filing systems of the kind mentioned above, several methods have been proposed for deriving the structure of a document from stored document images.

・Tsujimoto et al.: "Layout Understanding of English Documents", Proceedings of the 1988 IEICE Spring National Convention, D-477, 1988

・Nishimura et al.: "A Study on Page Identification by Layout Structure", IEICE Technical Report PRU87-120, 1987

Both of these methods build a layout structure bottom-up from rectangular regions extracted character by character. The generated structure makes it possible to judge the similarity of structures. However, because the methods have no knowledge of semantic structure, they cannot identify which document element each extracted region is, and so cannot extract bibliographic items.

・Higashino et al.: "A Knowledge Representation Language FDL Based on Set Representation of Rectangular Regions and Its Application to Document Image Understanding", IEICE Technical Report PRU86-31, 1986

This method uses a form definition language to extract bibliographic items such as the title and author names top-down, but it cannot extract hierarchical structures in the text, such as chapters and sections, and convert them into a data structure. It can also reject an input document image that does not satisfy the defined format. The method is effective when semantic information, such as the title on the title page of a paper, is always laid out in the same place. It cannot handle the extraction of chapters and sections, whose position and number vary from document to document. Because the method only partitions the document, it can extract the document's elements but not the relationships among them as a data structure.

Furthermore, among the semantic structures of a document, it was possible to extract elements that have only a single level of hierarchy and correspond directly to elements of the layout structure, such as bibliographic items and figures and tables. It was not possible to extract the elements that form a hierarchical structure, such as the chapters, sections, and subsections mentioned above.

[Problems to be solved by the invention]

The problems of the prior art can be summarized as follows.

In the prior art, the processing target is a single page of a document image. Relationships among regions within a page could be extracted, but relationships among regions spanning multiple pages could not.

The prior art separates and extracts the structural elements of a document, such as character strings, photographs, and figures, from the document image as rectangular regions. It then analyzes the layout structure of the document bottom-up or top-down by examining the positional relationships among the rectangles, using their absolute and relative coordinates. This makes it possible to extract bibliographic items and to judge the similarity of pages by their layout structure. However, the prior art had no means of writing the relational information obtained by the analysis to secondary storage such as a file, so the analysis had to be repeated for every search.

[Means for Solving the Problems]

To solve the problems of the prior art described above, the present invention is characterized by the following means.

First, a procedure is defined that can represent the logical structure of a document hierarchically. A means is provided for estimating the logical structure of a document from its layout structure. This makes it possible to extract the logical structure of a document from a document image.

Next, a means is provided for representing the logical structure of a document class, together with the specific logical structure of an individual document belonging to that class. Here, a document class means a set of documents that share a common layout structure and logical structure. A means is provided for representing the logical structure of the document class hierarchically. A further means associates the elements of the logical structure estimated by the estimating means above with the elements of the logical structure of the document class. With these means, a logical structure specific to the input document can be generated.

[Action]

With the method of the present invention, the layout structure and logical structure common to a set of documents, and the relationship between the two, are described in advance. This makes it possible to extract the layout structure and logical structure specific to a multimedia document input with a scanner or similar device. For example, academic papers share a common format in their layout structure, their logical structure, and the relationship between the two. By describing this common part beforehand, the invention can extract the layout structure and logical structure of each individual paper.

The extracted logical structure is combined with the image, or with the character text obtained by performing character recognition on the image, and stored in a file in structured form. Using the stored structured multimedia document data, documents can be searched with logical structure information taken into account. For example, it becomes possible to search only chapter titles or to consult a table of contents.
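As a rough illustration of why storing the logical structure helps retrieval, the sketch below walks a stored structure and lists chapter and section titles. The nested-dictionary layout, the key names (`type`, `title`, `children`), and the sample content are all hypothetical; the patent does not specify the file format.

```python
def table_of_contents(node, depth=0):
    """Yield (depth, title) for every chapter or section under node."""
    for child in node.get("children", []):
        if child["type"] in ("chapter", "section"):
            yield depth, child["title"]
            yield from table_of_contents(child, depth + 1)


# Hypothetical structured document; titles and texts are placeholders.
paper = {
    "type": "paper",
    "children": [
        {"type": "title", "text": "Sample paper"},
        {"type": "chapter", "title": "1. Introduction", "children": [
            {"type": "paragraph", "text": "..."},
        ]},
        {"type": "chapter", "title": "2. Method", "children": [
            {"type": "section", "title": "2.1 Overview", "children": [
                {"type": "paragraph", "text": "..."},
            ]},
            {"type": "figure", "image": "fig1.bin"},
        ]},
    ],
}

toc = list(table_of_contents(paper))
chapter_titles = [title for depth, title in toc if depth == 0]
```

Because the hierarchy is stored explicitly, a query such as "chapter titles only" reduces to filtering on depth and needs no new layout analysis of the page images.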

[Example]

Figure 1 is a block diagram showing one embodiment of the multimedia document structuring method of the present invention. The method is outlined below with reference to Figure 1.

The color multimedia document, denoted 100 in the figure, is a document in which text, color photographs, and figures and tables are mixed. Storing such a document in a computer's storage device by means of a color scanner or the like requires a large capacity. For example, reading an A4 document at a resolution of 8 dots/mm with 256 levels for each of the RGB (red, green, blue) colors requires approximately 12 MB per page.
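The 12 MB figure can be checked with a short calculation. The sketch below assumes the ISO 216 A4 size of 210 mm x 297 mm and one byte per color channel (256 levels); the function and constant names are illustrative only.

```python
A4_WIDTH_MM, A4_HEIGHT_MM = 210, 297   # ISO 216 A4 page size (assumed)
DOTS_PER_MM = 8                        # scanning resolution from the text
BYTES_PER_PIXEL = 3                    # one byte (256 levels) each for R, G, B


def page_bytes(width_mm, height_mm, dots_per_mm, bytes_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    width_px = width_mm * dots_per_mm
    height_px = height_mm * dots_per_mm
    return width_px * height_px * bytes_per_pixel


size = page_bytes(A4_WIDTH_MM, A4_HEIGHT_MM, DOTS_PER_MM, BYTES_PER_PIXEL)
print(size, "bytes, about", round(size / 1e6, 1), "MB")
```

The page is 1680 x 2376 pixels, which gives 11,975,040 bytes, consistent with the roughly 12 MB stated above.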

In the multimedia structuring method of the present invention, the color region extraction unit 110 first separates the input color multimedia document image into parts that can be represented in a single color and parts that cannot. The parts that can be represented in a single color are text and figures or tables drawn in a single color. The parts that cannot are color photographs, referred to below as full-color regions. The color region extraction unit 110 extracts the circumscribed rectangle of each full-color region in the input multimedia document image 100. Single-color regions are likewise extracted as circumscribed rectangles. The full-color regions extracted by the color region extraction unit 110 pass through the color correction unit 111 and are then compressed by the color image compression unit 112.

Next, binarization 120 is performed on the single-color image data extracted by the color region extraction unit 110. This narrows the data to be processed from this point on to single-color data and reduces the data volume to one third.

The image data obtained from the binarization unit 120 is sent to the bibliographic item extraction unit 130. Bibliographic items are the title, author names, UDC classification number, page numbers, running heads, and so on, found on the title page of a paper. They are laid out in a fixed format for each type of paper. This format information is described in advance for each type of paper. The bibliographic items can then be extracted with the method described in Higashino et al., "A Knowledge Representation Language FDL Based on Set Operations on Rectangular Regions and Its Application to Document Image Understanding", IEICE Technical Report PRU86-31, 1986.

The figure/table region extraction unit 140 extracts figure and table regions. The index information extraction unit 141 extracts information that can serve as keywords from the regions extracted by unit 140, and the line drawing recognition unit 142 converts information represented as an image into vector data.

The region that remains after removing the color photograph regions, bibliographic regions, and figure/table regions obtained by processes 110, 130, and 140 is the body text region. Process 150 extracts the body text region and divides it into lines.

Process 151 performs character recognition on the character patterns in the bibliographic item regions, index information regions, and body text regions obtained in processes 130, 141, and 150. This yields the character codes and font information needed to represent the characters. Process 152 performs dictionary matching to determine whether the characters extracted by the character recognition process 151 were recognized correctly.

Process 160 extracts the elements of the logical structure using the coordinates of the line regions obtained in process 150 and the character codes and font information obtained in process 151.

The logical structure generation unit 170 generates the data that represents, within the computer, the logical structure extracted in process 160. In processes 112, 142, and 152, color images, line drawings, and characters are respectively separated, extracted, and converted into a representation suited to each medium. The structure generated by the logical structure generation unit ties these separated pieces of document content together.

Through the above processing, the multimedia structured file 180 is obtained from the color multimedia document 100.

Before describing an embodiment of the logical structure extraction unit 160, the principle by which the logical structure is extracted is explained.

The structure of a document comprises a logical structure and a layout structure. The logical structure is the semantic organization of the document, such as chapters and sections. The layout structure represents the printed arrangement on the page. This section describes how the logical structure of a document is extracted from a document image by using the document's layout structure.

A reference work (Ministry of Education, Higher Education and Science Bureau (ed.), "Documentation Handbook", Tokyo Denki University Press, pp. 22-25, 1970) states that the logical structure of a document, and of a paper in particular, should consist of (i) title, (ii) author names, (iii) abstract, (iv) table of contents, (v) list of symbols and special notation used, (vi) preface, (vii) main body, (viii) conclusion, (ix) acknowledgments, (x) cited references, and (xi) discussion and replies.

The main body (vii) is further subdivided into chapters, sections, and paragraphs. A multimedia document additionally contains media other than text, such as figures and tables. The reference states that these logical structures are necessary for smooth communication through documents.

The ODA/ODIF standard mentioned above uses two structures to describe the logical structure of a document: a common logical structure (called the "generic logical structure" in ODA terminology) and a specific logical structure. The common logical structure represents the logical structure shared by a document class. A document class here is a superordinate concept of actual documents, for example "the set of papers published in the Transactions of the Information Processing Society of Japan". The specific logical structure represents the logical structure of one particular document.

Figure 2 takes the journal "HITACHI REVIEW" as the document class and represents the common logical structure of the papers published in it using ODA/ODIF. SEQ, denoted 210 in the figure, expresses an ordering relation: the set of components below it is ordered. "Paper" 200 consists of "UDC" 210, "title" 220, "author names" 230, "summary" 240, "body" 250, and "reference list" 260, in that order. REP, denoted 211 in the figure, expresses a repeating structure: the set of components below it may occur multiple times. "Body" 250 consists of multiple "chapters". SEL, denoted 212 in the figure, means that any one of the components below it is selected. "Figure/table" 2513 is optional.
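The SEQ, REP, and SEL constructors behave much like concatenation, repetition, and alternation in a regular expression. The sketch below uses that analogy to check whether a flat sequence of element labels conforms to a class structure. It is not ODA/ODIF syntax: the `Node` type, the label names, and in particular the inner structure of a chapter (a heading followed by one or more paragraphs or figures/tables) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "LEAF", "SEQ", "REP" or "SEL"
    name: str = ""                 # label, used only for leaves
    children: list = field(default_factory=list)


def to_regex(node):
    """Translate a class structure into a regex over space-terminated labels."""
    if node.kind == "LEAF":
        return re.escape(node.name) + " "
    parts = [to_regex(child) for child in node.children]
    if node.kind == "SEQ":         # components in a fixed order
        return "".join(parts)
    if node.kind == "REP":         # components repeated one or more times
        return "(?:" + "".join(parts) + ")+"
    if node.kind == "SEL":         # exactly one of the components
        return "(?:" + "|".join(parts) + ")"
    raise ValueError(f"unknown node kind: {node.kind}")


def conforms(class_structure, labels):
    """True if the label sequence matches the class structure."""
    text = "".join(label + " " for label in labels)
    return re.fullmatch(to_regex(class_structure), text) is not None


def leaf(name):
    return Node("LEAF", name)


# Assumed chapter structure: heading, then paragraphs and/or figures/tables.
chapter = Node("SEQ", children=[
    leaf("chapter_title"),
    Node("REP", children=[
        Node("SEL", children=[leaf("paragraph"), leaf("figure_or_table")]),
    ]),
])
paper = Node("SEQ", children=[
    leaf("UDC"), leaf("title"), leaf("author"), leaf("summary"),
    Node("REP", children=[chapter]),           # "body": repeated chapters
    leaf("reference_list"),
])

good = ["UDC", "title", "author", "summary",
        "chapter_title", "paragraph", "figure_or_table",
        "chapter_title", "paragraph", "reference_list"]
bad = ["UDC", "author", "summary", "chapter_title", "paragraph",
       "reference_list"]               # title is missing
print(conforms(paper, good), conforms(paper, bad))
```

A specific logical structure can then be thought of as one particular label sequence (or tree) accepted by the class structure.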

Taking "HITACHI REVIEW" above as the example document class, the formatting conventions needed to extract its logical structure are as follows.

(1) Extraction of chapter and section headings: The line spacing around chapter and section headings is wider than the line spacing within the body text. In addition, the font used for headings differs from the one used in the body text.

(2) Extraction of chapters and sections: A chapter or section follows directly beneath its heading.

(3) Extraction of paragraphs: The first line of a paragraph is indented.

(4) Extraction of the reference list: The list can be extracted in the same way as the chapters and sections in (2) above. Note, however, that the header "REFERENCES" introducing the reference list is centered, whereas chapter and section headings are left-aligned.

(5) Extraction of individual references: The reference list can be viewed as a kind of chapter or section, in which case each reference corresponds to a paragraph. The difference from paragraphs in ordinary chapters and sections is that each reference is a hanging paragraph: its first line is flush left and the remaining lines are indented.

Chapters and sections are extracted from a multimedia document on the basis of the knowledge about logical structure described above. The procedure used is as follows.

(1) Separate the multimedia document into text regions and non-text regions. Non-text regions contain figures, tables, photographs, and the like. This step is the preprocessing stage of logical structure extraction for a multimedia document.

(2) Focusing first on the layout structure, separate the page into the components of the logical structure. For example, cut out the body part of the page and, if the body is laid out in multiple columns, separate it into columns.

(3) Once the page has been separated into columns, extract the logical structure using the lines and words within each column as elements. This works because, as described above, the logical structure relating to chapters and sections can be obtained from formatting at the line and word level.

In this embodiment, the form definition language FDL (Form Definition Language) is used as the means of extracting the logical structure of a document. FDL can define the format of a document and decompose a given document into the components of that format.

Figure 3 illustrates the format associated with chapter and section headings. The lines of the document are shown as rectangles. Each 301 is a body text line, and 302 is a chapter or section heading line. 303, 304, 305, and 306 each indicate a Y coordinate of a rectangular region. As noted above, the heading region 302 has wider line spacing than the body text regions 301. In the figure, this appears as the distance between 303 and 304 and the distance between 305 and 306 being larger than the spacing between the other rectangles.

Figure 4 shows an example of the format for chapter and section headings expressed in FDL. The first three lines describe the space between rectangular regions. SPACE at 401 is a predicate for describing a blank between rectangular regions, and ?Y0 and ?Y1 are variables that store the coordinates of the blank found under the conditions that follow. 402 specifies that blanks are examined starting from the smallest Y coordinate. 403 specifies that the blank must be at least 2.5 mm in size. This value was derived from the knowledge that the line spacing is three times the line height. These steps yield ?Y0 and ?Y1, the coordinates of the blank above the rectangular region of the heading. The next four lines pick out the blank below the heading rectangle. The first three of them have the same meaning as described for 401, 402, and 403. The part indicated by 404 narrows the region in which the blank is searched, so that the blank already found by the preceding three lines is not picked up again.
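The two-stage blank search described for Figure 4 can be imitated in ordinary code. The following is a minimal sketch in Python rather than FDL, whose syntax is not reproduced here. It assumes each text line is given as a `(top, bottom)` pair of Y coordinates in millimetres. The function names and the sample coordinates are hypothetical.

```python
MIN_GAP_MM = 2.5   # minimum blank height, the value given for 403


def find_gap(lines, start_y, min_gap=MIN_GAP_MM):
    """Return (gap_top, gap_bottom) of the first blank of at least min_gap,
    scanning from small Y and considering only lines whose top is >= start_y."""
    candidates = [line for line in sorted(lines) if line[0] >= start_y]
    for (_, bottom_a), (top_b, _) in zip(candidates, candidates[1:]):
        if top_b - bottom_a >= min_gap:
            return bottom_a, top_b
    return None


def find_heading(lines):
    """Return (top, bottom) of the region between two wide blanks, or None."""
    if not lines:
        return None
    upper = find_gap(lines, start_y=min(top for top, _ in lines))
    if upper is None:
        return None
    _, y1 = upper                       # blank above the heading ends at y1
    lower = find_gap(lines, start_y=y1)  # narrowed search, as with 404
    if lower is None:
        return None
    y2, _ = lower                       # blank below the heading starts at y2
    return y1, y2


# Hypothetical page: body lines 3 mm high with 1 mm gaps, and one heading
# line separated by 3 mm blanks above and below.
lines = [(10, 13), (14, 17), (18, 21), (24, 27), (30, 33), (34, 37)]
heading = find_heading(lines)
```

Restricting the second search to lines at or below `y1` plays the role attributed to 404: it keeps the second search from returning the blank that was already found.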

Figure 5 is a functional block diagram of a filing apparatus that adopts the multimedia document structuring method, as one embodiment of the present invention. The overall configuration and operation of the apparatus are described below.

The apparatus consists of the following units:

・an image scanning unit 501, which reads the document to be structured as a digital image;
・a display 503, which shows the original image and the text and partial images of the structured document;
・a page image storage unit 502, which stores the scanned document images page by page;
・a format data storage unit 504, which stores the format information needed to structure documents;
・a document storage unit 505, which accumulates the structured documents;
・a document structure analysis unit 506, which analyzes the structure of the scanned document image;
・a character recognition unit 507, which recognizes characters from digital images of character patterns;
・an image processing unit 508, which performs image processing such as dilation, erosion, and decimation.

処理の流れの概要を次に説明する。入力すべき文書510
を画像走査部501にセットする。画像走査部501で書類51
0を光学的に走査して画像データとして入力し、頁イメ
ージ記憶部502に格納する。画像データ110は画像の濃淡
画像の濃淡データに対してある閾値を定めて２値化した
り２値画像データからなる。次に、頁イメージ記憶部50
3に格納された入力画像データを構造化するため、入力
文書の書式を記述したデータを書式データ格納部504か
ら読み込む。文書構造解析部506は、書式データ記憶部5
04から読出された書式データをもとに、頁イメージ記憶
部502に格納されている文書画像データの構造解析を行
なう。なお、この文書構造解析部506では、書式データ
記憶部504に格納されている書式データに応じて、適
宜、文字認識部507、および画像処理部508を呼び出す。The processing flow is outlined below. The document 510 to be input is set in the image scanning unit 501. The image scanning unit 501 optically scans the document 510, inputs it as image data, and stores it in the page image storage unit 502. The image data 110 consists either of grayscale data or of binary image data obtained by thresholding the grayscale data. Next, in order to structure the input image data held in the page image storage unit 502, data describing the format of the input document is read from the format data storage unit 504. Based on that format data, the document structure analysis unit 506 analyzes the structure of the document image data stored in the page image storage unit 502, calling the character recognition unit 507 and the image processing unit 508 as needed, depending on the format data.
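
As a rough illustration of the thresholding step mentioned above, the sketch below converts grayscale data into binary image data. The function name `binarize`, the default threshold of 128, and the convention that 1 means a black pixel are assumptions made for this example; the specification does not fix any of them.

```python
def binarize(gray, threshold=128):
    """Return a binary image from grayscale rows.

    A pixel becomes 1 (black) when its gray level is below the
    threshold and 0 (white) otherwise. The threshold value is an
    assumption for illustration only.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 2 x 4 grayscale patch (0 = black, 255 = white).
page = [
    [255, 250, 30, 20],
    [240, 10, 15, 245],
]
binary = binarize(page)
```

The later sketches in this description assume binary data of this form, with 1 for black pixels.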

次に、本発明方式の一部である論理構造抽出方式につい
て説明する。第６図は本発明の方式を用いた場合のPAD
図（Program Analysis Diagram）である。本方式におけ
るマルチメディア文書画像の構造解析では、まず、文字
列領域の抽出600を行なう。Next, the logical structure extraction method, which forms part of the method of the present invention, is described. FIG. 6 is a PAD (Program Analysis Diagram) of the method of the present invention. In the structure analysis of a multimedia document image, the method first performs extraction 600 of character string areas.

次に、ページ番号や柱部分などを除いた文字列領域の抽
出601を行なう。文字列領域の抽出では、図・表の非文
字列領域やページ番号・柱部分を除いた領域を、本文文
字列本文領域とする。この文字列本文領域に対して、カ
ラム単位の領域分離602、行単位の領域分離603、単語単
位の領域分離604を行なう。これらの領域分離は、前も
って定義してある書式情報をもとにして行なう。このと
きに用いる書式情報はカラム間の空白領域の大きさ，行
間スペース，単語間スペースの値である。Next, extraction 601 of the character string areas excluding page numbers and running heads is performed. Here, the area that remains after excluding non-character areas such as figures and tables, as well as page numbers and running heads, is treated as the body text area. This body text area is then separated into columns (602), into lines (603), and into words (604). These separations are based on format information defined in advance: the width of the blank area between columns, the line spacing, and the word spacing.
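
The column, line, and word separations above all split an area wherever a blank gap of a known minimum size occurs. A minimal sketch of that idea, assuming a projection profile (the count of black pixels at each position along one axis) has already been computed, could look like this. The name `split_on_gaps` and the profile representation are assumptions for illustration, not part of the specification.

```python
def split_on_gaps(profile, min_gap):
    """Split a projection profile into (start, end) runs of ink.

    profile[i] is the number of black pixels at position i. A run is
    closed when at least min_gap consecutive positions are empty.
    The end index is exclusive. Gaps shorter than min_gap stay
    inside the run.
    """
    runs, start, gap = [], None, 0
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs
```

The same function could be applied three times with different `min_gap` values taken from the format information: the inter-column blank width on a vertical profile, the line spacing on a horizontal profile, and the word spacing within each line.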

テキスト・非テキストの分離のためのアルゴリズムの処
理について説明する。テキスト・非テキストの分離で
は、図や表の領域と比較した場合に文字の領域は行方向
に広がる傾向を持つという知識を用いる。例えば、横書
きの文書であれば、横方向は黒画素が詰まっているが、
縦方向は行間ごとに白画素の領域が表れる。しかし、図
や表は、領域のほぼ前面にわたって画素が分布してい
る。この知識を利用し、前処理に画像処理の手法を適用
する。The algorithm for separating text from non-text is described next. The separation relies on the knowledge that, compared with figure and table areas, character areas tend to extend along the line direction. For example, in a horizontally written document, black pixels are densely packed in the horizontal direction, while in the vertical direction a band of white pixels appears at every line gap. In figures and tables, by contrast, pixels are distributed over almost the entire area. Using this knowledge, image processing techniques are applied as preprocessing.
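
One simple way to exploit this observation, offered only as a sketch, is to measure how many pixel rows of a region are completely white: horizontally written text should show white rows at every line gap, whereas a figure or table should show few or none. The function name `looks_like_text` and the ratio 0.15 are assumptions chosen for the example; the specification does not give a concrete criterion.

```python
def looks_like_text(region, min_blank_ratio=0.15):
    """Guess whether a binary region is horizontally written text.

    region is a list of pixel rows containing 0 (white) or 1 (black).
    The region is classified as text when the share of all-white rows
    reaches min_blank_ratio (an assumed, tunable value).
    """
    if not region:
        return False
    blank_rows = sum(1 for row in region if not any(row))
    return blank_rows / len(region) >= min_blank_ratio
```

For vertically written documents the same test would be applied to pixel columns instead of rows.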

第７図は、論理構造の抽出を詳細に説明したPAD図であ
る。論理構造の抽出処理は、カラムの領域について行単
位に処理を行なう。まず、処理対象行として一番上の行
を選択する処理701を行なう。次に、論理構造の処理対
象とする行の両側の行間を求める処理702を行なう。論
理構造の抽出処理を行なう前に行単位の外接矩形領域の
座標値を求めておけば、画像処理などの複雑な処理を行
なわずとも効率的に行間を調べることができる。次に、
ステップ703で、両側の行間が本文中の行間よりも広い
場合には、この行は章題・節題であるとみなす。FIG. 7 is a PAD diagram describing the extraction of the logical structure in detail. The extraction processes a column area line by line. First, process 701 selects the topmost line as the line to be processed. Next, process 702 obtains the line gaps on both sides of that line. If the coordinates of the circumscribing rectangle of each line are computed before the logical structure is extracted, the line gaps can be examined efficiently without complex operations such as image processing. Then, in step 703, if the gaps on both sides are wider than the line spacing of the body text, the line is regarded as a chapter or section title.
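
Processes 701 to 703 can be sketched from the line bounding boxes alone. The example below assumes each line is given as a `(top, bottom)` pair sorted from top to bottom, that a missing neighbor at the start or end of a column counts as a wide gap, and that "wider" means exceeding the body line spacing by a factor `tolerance`. All three are assumptions made for illustration.

```python
def find_title_lines(boxes, body_gap, tolerance=1.5):
    """Return indexes of lines whose gaps above and below are both wide.

    boxes: list of (top, bottom) per line, sorted top to bottom.
    body_gap: line spacing of the body text, from the format data.
    """
    titles = []
    for i, (top, bottom) in enumerate(boxes):
        # Gap to the previous line; no previous line counts as wide.
        above = top - boxes[i - 1][1] if i > 0 else float("inf")
        # Gap to the next line; no next line counts as wide.
        below = boxes[i + 1][0] - bottom if i + 1 < len(boxes) else float("inf")
        if above > body_gap * tolerance and below > body_gap * tolerance:
            titles.append(i)
    return titles
```

Because only rectangle coordinates are compared, no pixel-level image processing is needed at this stage.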

次に、章・節の構造化処理を行なう。章・節の構造化処
理とは、本文を章単位、または節単位に分離する処理の
ことである。Next, the chapter and section structuring process is performed. This process separates the body text into chapter units or section units.

第８図は章・節の構造化処理の説明図である。章・節の
構造化処理では、前述の処理703で章題・節題であると
判定された処理対象行に章・節であることを示すインデ
スクを付加する処理801を行なう。また、次に、章題・
節題の前の行をインデクスが章・節の終わりであること
を示すインデクスを付加する処理80を行なう。以上の処
理を用いることによって、章・節単位の分離をすること
ができる。前述の２つの処理801,802で付加したインデ
クスによって、章・節の領域を取り出すことができる。
この領域についてパラグラフの分離処理を行なう。FIG. 8 is an explanatory diagram of the chapter and section structuring process. In this process, process 801 adds an index marking a chapter or section to each line that process 703 judged to be a chapter or section title. Next, process 802 adds an index marking the end of a chapter or section to the line preceding the title. These steps make it possible to separate the text into chapters and sections. The indexes added by the two processes 801 and 802 allow each chapter or section area to be extracted, and paragraph separation is then performed on that area.
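
The effect of the two indexes can be sketched as follows: a title line opens a chapter or section, and the line before the next title closes it. The function name `section_ranges` is hypothetical, and the choice to let the last line of the text close the final section and to ignore any lines before the first title are assumptions of this example.

```python
def section_ranges(num_lines, title_lines):
    """Return inclusive (first_line, last_line) ranges, one per title.

    title_lines: indexes of lines judged to be chapter/section titles.
    """
    starts = sorted(title_lines)                          # process 801: start index
    ends = [t - 1 for t in starts[1:]] + [num_lines - 1]  # process 802: end index
    return list(zip(starts, ends))
```

Each returned range is the area on which paragraph separation would then be run.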

パラグラフの分離処理では、インデンテーション（字下
げ）の情報を用いる。字下げは、パラグラフに対する一
般的な書式情報である。The paragraph separation process uses indentation information. Indentation is a common formatting convention for paragraphs.

第９図は、パラグラフの分離処理を詳細に述べたもので
ある。パラグラフの分離処理では、まず、ステップ901
で、処理対象行を章・節として分離した領域の最初の行
とする。処理対象行は、次の一連の処理が終了したら、
次の行に切り替える。FIG. 9 describes the paragraph separation process in detail. First, in step 901, the line to be processed is set to the first line of an area separated as a chapter or section. Each time the following series of steps finishes, the line to be processed is switched to the next line.

ステップ902では、章として抽出した領域に対して、処
理対象行の左側の空間を調べる。この左側の空間が字下
げの情報となる。判定処理903では、もし、左側の空間
が章として抽出した領域よりも下がっているかどうかの
判定を行う。もし、ここで字下げが行われていることが
確認されれば、判定処理904を行う。判定処理904では、
この処理対象行が章・節での第１行目であれば、処理対
象行を第１パラグラフの先頭行とし、そうでなければ、
処理対象行の前の行までを第１パラグラフとするインデ
ックスを付加する。第１パラグラフ目の第１行目は、イ
ンデンテーションが行われないことがあるために、この
処理を行う必要がある。In step 902, the space to the left of the line being processed is measured relative to the area extracted as the chapter. This left-hand space is the indentation information. Decision 903 determines whether the left edge of the line is set in from the left edge of that area. If indentation is confirmed, decision 904 is carried out. In decision 904, if the line being processed is the first line of the chapter or section, it is taken as the first line of the first paragraph; otherwise, an index is added marking the lines up to the one before the line being processed as the first paragraph. This step is needed because the first line of the first paragraph is sometimes not indented.
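
A minimal sketch of steps 901 to 904, assuming the left x-coordinate of every line in a chapter or section area is already known, is shown below. The name `split_paragraphs` and the parameter `min_indent` are hypothetical. Following the remark above, the first line always opens the first paragraph whether or not it is indented.

```python
def split_paragraphs(lefts, region_left, min_indent=1):
    """Return inclusive (first_line, last_line) ranges of paragraphs.

    lefts: left x-coordinate of each line in the area.
    region_left: left edge of the chapter/section area.
    min_indent: smallest offset treated as indentation (assumed value).
    """
    if not lefts:
        return []
    starts = [0]  # the first line starts paragraph 1 even without indentation
    for i in range(1, len(lefts)):
        if lefts[i] - region_left >= min_indent:  # decision 903
            starts.append(i)
    ends = [s - 1 for s in starts[1:]] + [len(lefts) - 1]
    return list(zip(starts, ends))
```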

上述の処理を上から順番に行ない、１つのカラムに対し
て処理を行なったら、次のカラム（右のカラム）に対し
ても同様な処理を行なう。このとき、章・節を示すため
のインデックスやフラグは、前のカラムの状態のまま保
持する。The above processing is carried out from top to bottom. Once one column has been processed, the same processing is applied to the next column (the column to its right). At that point, the indexes and flags that mark chapters and sections are carried over as they were at the end of the previous column.

このように、カラム単位に処理を行なっていくため、論
文、雑誌など一つの文書が複数ページにまたがっていて
も論理構造の抽出が可能である。Because processing proceeds column by column in this way, the logical structure can be extracted even when a single document, such as a paper or a magazine article, spans multiple pages.

また、行単位の抽出が行われた時点で、処理対象行につ
いて文字認識の処理も行なう。これは、行間といった大
局的な書式情報だけでは章題・節題を抽出できない場合
があるためである。このような場合に対処する方法につ
いて第10図を用いて説明する。第10図は、行間・字間と
いった大局的な書式情報だけでは論理構造を分離抽出で
きない文書の例である。第10図（ａ）に示す矩形領域10
01はパラグラフの最後行を示す領域、矩形領域1002はタ
イトル行を示す領域、矩形領域1003は次の章の最初のパ
ラグラフの先頭行を示す領域、矩形領域1004は矩形領域
1003で示した行に続く行を示す領域である。第10図
（ｂ）に示す矩形領域1012はパラグラフの最後行を示す
領域、矩形領域1011は矩形領域1012で示した行の前の行
を示す領域、矩形領域1013は矩形領域1011,1012で示し
た行と同じ章に含まれる次のパラグラフの先頭行を示す
領域、矩形領域1014は矩形領域1013を示した行に続く行
を示す領域である。第10図（ａ）の矩形領域1001の文字
列は章題を示しており、第10図（ｂ）の矩形領域1002は
パラグラフの最後を示している。しかし、各矩形領域の
幾何学的な位置関係は、図10（ａ），図10（ｂ）ともに
同じである。このため、行間・字間といった幾何学的な
情報を用いて、章題・節題を抽出することは難しい。一
方、章題・節題に用いる文字フォントは、本文中の文字
フォントよりも大きったり、あるいは、種類が違ってい
たりする。文字のフォント情報を用いて前述の構造解析
処理を行えば、より強力な抽出能力を持たせることが可
能となる。In addition, once lines have been extracted, character recognition is also performed on the line being processed. This is because chapter and section titles sometimes cannot be extracted from global format information such as line spacing alone. A way of handling such cases is described with reference to FIG. 10, which shows an example of a document whose logical structure cannot be separated using only global format information such as line and character spacing. In FIG. 10(a), rectangular area 1001 is the last line of a paragraph, rectangular area 1002 is a title line, rectangular area 1003 is the first line of the first paragraph of the next chapter, and rectangular area 1004 is the line following the one in area 1003. In FIG. 10(b), rectangular area 1012 is the last line of a paragraph, rectangular area 1011 is the line before the one in area 1012, rectangular area 1013 is the first line of the next paragraph in the same chapter as the lines in areas 1011 and 1012, and rectangular area 1014 is the line following the one in area 1013. The character string in rectangular area 1002 of FIG. 10(a) is a chapter title, whereas rectangular area 1012 of FIG. 10(b) is the last line of a paragraph. Yet the geometric arrangement of the rectangular areas is the same in FIG. 10(a) and FIG. 10(b), so it is difficult to extract chapter and section titles from geometric information such as line and character spacing. On the other hand, the fonts used for chapter and section titles are often larger than, or of a different type from, the fonts used in the body text. Using character font information in the structure analysis described above therefore gives it a stronger extraction capability.
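
As one possible illustration of the font cue, the sketch below flags lines whose character height is clearly larger than the typical body height. Using the bounding-box height as a stand-in for font size, the median as the body reference, and the factor 1.2 are all assumptions of this example; the specification only states that font size or font type may differ and does not say how the difference is measured.

```python
from statistics import median


def larger_font_lines(heights, ratio=1.2):
    """Return indexes of lines that are taller than the body text.

    heights: character height of each line, e.g. from its bounding box.
    ratio: how much taller than the median a line must be (assumed).
    """
    if not heights:
        return []
    body_height = median(heights)
    return [i for i, h in enumerate(heights) if h > body_height * ratio]
```

Such a test could be combined with the line-gap test so that a short last line of a paragraph, as in FIG. 10(b), is not mistaken for a title.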

以上の処理を行うことによって、論理構造の要素である
章や節などの領域を抽出することが可能となる。これを
ファイルに格納するためには、取り出した領域の階層関
係を取り出すことが必要である。この関係を取り出し、
木構造データとしてファイルに格納する処理は論理構造
生成部で行われる。The above processing makes it possible to extract areas such as chapters and sections, which are the elements of the logical structure. To store them in a file, the hierarchical relationships among the extracted areas must also be obtained. The logical structure generation unit obtains these relationships and stores them in a file as tree-structured data.

次に、論理構造生成部の詳細について述べる。Next, details of the logical structure generation unit will be described.

第11図は典型的な文書の形式を示しており、通常、第11
図（ａ），（ｂ）のように２ページにまたがっている。
ここで、1101は章、1102はその章に含まれる章題・同様
に、1103は章1101に含まれるパラグラフを示している。
また、1104は章題、1105,1107,1108はパラグラフを示
し、1106,1109はそれぞれ、図を示している。従来の技
術では、これらの領域を分割するために例えば公知のFD
Lという文法手段を用いて、（defform章ブロック・・・・・・（form章題ブロック（...））・・・・・・（form章題ブロック（...））・・・・・・）として、章を章題及びパラグラフの包含関係を前述すれ
ば、割付け構造と論理構造の両方の関係を記述したこと
になる。しかし、章題1104とパラグラフ1105,1107,1108
はページにまたがっているためにFDLでは章として記述
することができない。なぜならば、FDLでは文書の物理
的な配置だけしか記述できないからである。従って、ペ
ージやカラムなどの物理的に離れた領域にまたがった論
理構造の要素を連結する必要がある。FIG. 11 shows a typical document format, which usually spans two pages as in FIG. 11(a) and (b). Here, 1101 is a chapter, 1102 is the chapter title contained in that chapter, and 1103 is a paragraph contained in chapter 1101. Likewise, 1104 is a chapter title, 1105, 1107, and 1108 are paragraphs, and 1106 and 1109 are figures. In the prior art, these areas are divided using, for example, the known grammar FDL: writing (defform chapter-block ... (form chapter-title-block (...)) ... (form chapter-title-block (...)) ...) to state that a chapter contains a chapter title and paragraphs describes both the layout structure and the logical structure. However, because chapter title 1104 and paragraphs 1105, 1107, and 1108 extend across pages, FDL cannot describe them as a single chapter, since FDL can describe only the physical arrangement of a document. It is therefore necessary to link logical structure elements that extend across physically separate areas such as pages and columns.

論理構造抽出部では、ページ画像を入力し、本文テキス
ト部分，段，章題，章，節という順番に領域を分割して
いく。ここで、論理構造として必要な部分は、章，節で
あり、パージやカラムは割り付け構造の要素である。こ
のときに抽出した領域を図11に示す。The logical structure extraction unit takes a page image as input and divides it into areas in the order body text, column, chapter title, chapter, and section. Of these, the chapters and sections are the parts needed for the logical structure, while pages and columns are elements of the layout structure. The areas extracted in this way are shown in FIG. 11.

第12図は、文書クラスの論理構造の一例を示したもので
ある。この図では、「本文1201は章1202、参考文献1206
から構成される」、「章1202は章題1203、節1204、説明
文付図1206から構成される」、さらに「説明文付図1206
は図1207と説明文1208から構成される」という構造の階
層的な上下関係を表している。この構造の各要素は、文
書に固有の概念ではなく、「章」「節」など複数の文書
に共通な概念を示している。本実施例では、この共通論
理構造を表現する言語を設定し、この言語を用いて共通
論理構造関係表への登録を容易にした。例えば、第12図
の共通論理構造は次のように表現できる。FIG. 12 shows an example of the logical structure of a document class. The figure expresses the hierarchical relationships of the structure: "body text 1201 consists of chapter 1202 and references 1206", "chapter 1202 consists of chapter title 1203, section 1204, and captioned figure 1206", and "captioned figure 1206 consists of figure 1207 and caption 1208". The elements of this structure are not concepts specific to one document but concepts common to many documents, such as "chapter" and "section". In this embodiment, a language for expressing this common logical structure is defined, and the language makes registration in the common logical structure relation table easy. For example, the common logical structure of FIG. 12 can be expressed as follows.

（deflogic 本文 （consist-of （章 参考文献）））
（deflogic 章 （consist-of （章題 節 説明文付図）））
（deflogic 説明文付図 （consist-of （図 説明文）））
（defform 章ブロック （logical 章） ・・・・・・）
（defform 章の続きブロック （logical 章 continued） ・・・・・・）
文書が複数ページにわたる場合には、分離した論理構造の要素の関係を取り出すことができなかった。
(deflogic body-text (consist-of (chapter references)))
(deflogic chapter (consist-of (chapter-title section captioned-figure)))
(deflogic captioned-figure (consist-of (figure caption)))
(defform chapter-block (logical chapter) ...)
(defform chapter-continuation-block (logical chapter continued) ...)
When a document spanned multiple pages, the relationships between the separated elements of the logical structure could not be extracted.

このようにすることで、領域分割の手続きだけを用いて
論理構造抽出のための手続きを記述する場合よりも簡単
に記述することが可能である。In this way, the logical structure extraction can be described more simply than when it is written using area division procedures alone.

第13図は、この関係を共通論理構造表に登録したところ
示したものである。この共通論理構造表は、共通論理構
造の親子関係を示した表であり、第12図のリンク1212か
ら1217までと第13図の表の部分1212から1217が、それぞ
れ対応している。例えば、第12図の本文と章の関係を示
すリンク1211は、第13図では、「本文が親であり、章が
子となる」ことを示している。FIG. 13 shows these relationships as registered in the common logical structure table. This table lists the parent-child relationships of the common logical structure; links 1212 through 1217 in FIG. 12 correspond to entries 1212 through 1217 of the table in FIG. 13. For example, link 1211, which represents the relationship between the body text and a chapter in FIG. 12, appears in FIG. 13 as an entry stating that "the body text is the parent and the chapter is the child".
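
To make the step from the deflogic expressions to the parent-child table concrete, the sketch below reads expressions of the form (deflogic P (consist-of (C1 C2 ...))) and returns one (parent, child) row per link. The parser `parse_deflogic` is hypothetical and handles only this one form. Element names are written with hyphens so that each is a single token, which is a spelling chosen for this example and not something the specification defines.

```python
import re


def parse_deflogic(text):
    """Return (parent, child) pairs from deflogic/consist-of expressions."""
    pattern = r"\(deflogic\s+(\S+)\s+\(consist-of\s+\(([^()]*)\)\)\)"
    pairs = []
    for parent, children in re.findall(pattern, text):
        pairs.extend((parent, child) for child in children.split())
    return pairs


source = """
(deflogic body-text (consist-of (chapter references)))
(deflogic chapter (consist-of (chapter-title section captioned-figure)))
(deflogic captioned-figure (consist-of (figure caption)))
"""
table = parse_deflogic(source)
```

Each pair plays the role of one row of the common logical structure table, with the first item as the parent and the second as the child.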

論理構造領域抽出部で得られた結果は、第14図に示す特
定論理構造関係表1401、特定論理構造数カウント表140
2、共通・特定論理構造関係表1403、オブジェクト管理
表1404、の各表に登録される。特定論理構造関係表1401
には、入力した文書に特定の文書構造が親と子の関係で
格納される。特定論理構造数カウント表1402には、共通
論理構造とその共通論理構造に対応する特定論理構造の
数を格納する。この特定論理構造数のカウント表1402
の、カウント数は特定論理構造の名前を作成するときに
用いる。また、共通・特定論理構造関係表1403は、スタ
ックになっており、抽出した特定論理構造の要素名称を
対応する共通論理構造の所に格納する。オブジェクト管
理表1404には特定論理構造の要素名称と抽出した画像を
識別するための名前と矩形領域を表現するために必要な
２点の座標を示している。The results obtained by the logical structure area extraction unit are registered in the tables shown in FIG. 14: the specific logical structure relation table 1401, the specific logical structure count table 1402, the common/specific logical structure relation table 1403, and the object management table 1404. The specific logical structure relation table 1401 stores the document structure specific to the input document as parent-child relationships. The specific logical structure count table 1402 stores each common logical structure together with the number of specific logical structures that correspond to it; these counts are used when generating the names of specific logical structures. The common/specific logical structure relation table 1403 is organized as a stack, and the name of each extracted specific logical structure element is stored under its corresponding common logical structure. The object management table 1404 holds the name of each specific logical structure element, a name identifying the extracted image, and the coordinates of the two points needed to define its rectangular area.
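
Two of these tables lend themselves to a short sketch: a row of the object management table, and the use of the count table to generate names. The class `ObjectEntry`, its field names, and the function `next_name` are hypothetical, and the "name#number" spelling follows the examples in this description.

```python
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    """One row of the object management table (field names assumed)."""
    element: str          # specific logical structure element, e.g. "chapter#1"
    image_name: str       # name identifying the extracted image
    top_left: tuple       # (x, y) of the first corner of the rectangle
    bottom_right: tuple   # (x, y) of the opposite corner


def next_name(counts, common_name):
    """Increment the count for common_name and return a new instance name."""
    counts[common_name] = counts.get(common_name, 0) + 1
    return f"{common_name}#{counts[common_name]}"
```

Here `counts` stands in for the count table: a plain dictionary from common element name to the number of instances generated so far.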

論理構造生成部では上述の表1401,1402,1403,1404を次
のステップに従って埋めていく。The logical structure generation unit fills the above tables 1401, 1402, 1403, 1404 according to the following steps.

（１）まず、共通論理構造の最上位の要素名のインスタ
ンス生成処理を行う。インスタンスの生成処理は、共通
論理構造の要素名に番号を付けして、新しい名前を生成
し、共通・特定論理構造関係表に名前を登録することで
ある。例えば、第13図の例では、共通論理構造要素名の
最上位が「本体」であることがわかるので、特定論理構
造要素名として「本体＃１」を付加する。次に、共通・
特定論理構造関係表の親の欄に「本体」を子の欄に生成
した名前である「本体＃１」を登録する。(1) First, an instance of the top-level element of the common logical structure is generated. Generating an instance means appending a number to the name of the common logical structure element to form a new name and registering that name in the common/specific logical structure relation table. In the example of FIG. 13, the top-level common logical structure element is "body", so "body#1" is created as the specific logical structure element name. Then "body" is registered in the parent column of the common/specific logical structure relation table and the generated name "body#1" in the child column.

FDLの各フォームを先頭から呼び出し、論理構造の指定
があったフォームに対して、次の処理を行なう。The FDL forms are invoked in order from the first, and the following processing is performed for each form that carries a logical structure specification.

論理構造の指定は、例えば、次のようにして行なう。The logical structure is specified, for example, as follows.

（defform 章ブロック （logical 章） ・・・・・・）
（defform 章の続きブロック （logical 章 continued） ・・・・・・）
（２）FDLの各フォームで、論理構造の指定があった場
合には、指定した共通論理構造要素名に対応するインス
タンスを新しく生成する。次に、指定した共通論理構造
要素名を共通論理構造表から、この共通論理構造要素名
の親の名前を表引きする。この親の最新の子供を共通・
特定論理構造関係表から探し、対応する特定論理構造要
素と新しく生成したインスタンスを特定論理構造要素関
係表に、それぞれ、親と子の関係で登録する。例えば、
指定された論理構造要素名が「節」であった場合には、
親は「章」であることが共通論理構造関係表からわか
る。第13図の共通・特定論理構造関係表1301から、この
章の最新インスタンスが「章＃１」であることがわか
り、特定論理構造関係表1401には、「章＃１」と「節＃
２」がそれぞれ、親子関係として登録される。
(defform chapter-block (logical chapter) ...)
(defform chapter-continuation-block (logical chapter continued) ...)
(2) When an FDL form carries a logical structure specification, a new instance corresponding to the specified common logical structure element name is generated. Next, the parent of that common logical structure element is looked up in the common logical structure table. The most recent child of this parent, that is, the latest instance of the parent element, is found in the common/specific logical structure relation table, and that specific logical structure element and the newly generated instance are registered in the specific logical structure relation table as parent and child. For example, if the specified logical structure element name is "section", the common logical structure relation table shows that its parent is "chapter". The common/specific logical structure relation table 1301 of FIG. 13 shows that the latest instance of this chapter is "chapter#1", so "chapter#1" and "section#2" are registered in the specific logical structure relation table 1401 as parent and child.

（３）また、論理構造の指定で論理構造の続きであると
わかった場合には、新しくインスタンスを生成せずに、
抽出した領域を指定した論理構造要素を最新のインスタ
ンスの領域として新しくオブジェクト管理表に登録す
る。(3) If the logical structure specification indicates a continuation of a logical structure, no new instance is generated. Instead, the extracted area is registered in the object management table as an additional area of the latest instance of the specified logical structure element.
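
Steps (1) to (3) can be put together in a small sketch. The class `LogicalStructureBuilder` and its attribute names are hypothetical, the tables are held as plain Python dictionaries and lists rather than in any particular file format, and the sketch assumes that forms arrive in reading order so that a parent instance always exists before its children.

```python
class LogicalStructureBuilder:
    def __init__(self, common_parent, root):
        self.common_parent = common_parent  # common table: child name -> parent name
        self.counts = {}      # count table: common name -> instances so far
        self.instances = {}   # common/specific table: common name -> stack of instances
        self.relations = []   # specific relation table: (parent instance, child instance)
        self.objects = []     # object management table: (instance, rectangle)
        self._new_instance(root)  # step (1): instance of the top-level element

    def _new_instance(self, common):
        self.counts[common] = self.counts.get(common, 0) + 1
        name = f"{common}#{self.counts[common]}"
        self.instances.setdefault(common, []).append(name)
        return name

    def add_form(self, common, rect, continued=False):
        """Register one FDL form that carries a logical structure specification."""
        if continued:
            # Step (3): reuse the latest instance, add only the new area.
            name = self.instances[common][-1]
        else:
            # Step (2): new instance, linked to the latest instance of its parent.
            parent = self.instances[self.common_parent[common]][-1]
            name = self._new_instance(common)
            self.relations.append((parent, name))
        self.objects.append((name, rect))
        return name
```

With a chapter that continues on a second page, a call such as `add_form("chapter", rect, continued=True)` adds a second rectangle to the same instance instead of creating a new chapter, which is how areas on different pages or columns end up under one logical element.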

以上の処理の結果、共通・特定論理構造関係表1401に
は、共通論理構造に対応する特定論理構造の要素が登録
され、特定論理構造関係表には、各文書の包含関係が登
録される。また、オブジェクト管理表1403には、分割し
た領域とそれに対応する特定論理構造要素名1403がそれ
ぞれ登録される。As a result of the above processing, the elements of the specific logical structure corresponding to each common logical structure are registered in the common/specific logical structure relation table 1403, and the containment relationships of each document are registered in the specific logical structure relation table 1401. The divided areas and their corresponding specific logical structure element names are registered in the object management table 1404.

[Brief description of drawings]

第１図は本発明の方式の一実施例を示す機能ブロック
図、第２図は本発明の方式で対象とする入力文書の論理
構造をODA/ODIFで表現した図、第３図は文書の章題・節
題に関連する書式を示す図、第４図は書式定義言語FDL
で章題・節題に対する書式を表現した図、第５図は本発
明の方式を用いたシステムを示す機能ブロック図、第６
図は本発明の方式を説明するためのPAD図、第７図は論
理構造の抽出を説明するためのPAD図、第８図は章・節
の構造化処理を説明するためのPAD図、第９図はパラグ
ラフの分離処理を説明するためのPAD図、第10図は幾何
学的な情報を用いただけでは論理構造を抽出できない文
書を示す図、第11図は論理構造生成部で論理構造を生成
する文書の一例、第12図は共通論理構造の一例を示す
図、第13図は共通論理構造の親子関係を表形式で示した
図、第14図は特定論理構造と共通・特定論理構造関係を
生成するために必要な表形式を示す図。 110…カラー領域抽出部、111…カラー補正部、112…カ
ラー画像圧縮部、120…２値化処理部、130…書誌事項抽
出部、140…図表領域抽出部、141…インデックス情報抽
出部、142…線画認識部、150…本文領域抽出部、151…
文字認識部、152…単語照合部、160…論理構造抽出部、
170…論理構造生成部。FIG. 1 is a functional block diagram showing an embodiment of the method of the present invention; FIG. 2 shows the logical structure of an input document handled by the method, expressed in ODA/ODIF; FIG. 3 shows the formats associated with chapter and section titles; FIG. 4 shows the formats for chapter and section titles expressed in the format definition language FDL; FIG. 5 is a functional block diagram of a system using the method; FIG. 6 is a PAD diagram explaining the method; FIG. 7 is a PAD diagram explaining the extraction of the logical structure; FIG. 8 is a PAD diagram explaining the chapter and section structuring process; FIG. 9 is a PAD diagram explaining the paragraph separation process; FIG. 10 shows a document whose logical structure cannot be extracted from geometric information alone; FIG. 11 shows an example of a document for which the logical structure generation unit generates a logical structure; FIG. 12 shows an example of a common logical structure; FIG. 13 shows the parent-child relationships of the common logical structure in table form; and FIG. 14 shows the tables needed to generate the specific logical structure and the common/specific logical structure relationships.
110: color area extraction unit; 111: color correction unit; 112: color image compression unit; 120: binarization unit; 130: bibliographic item extraction unit; 140: figure/table area extraction unit; 141: index information extraction unit; 142: line drawing recognition unit; 150: body text area extraction unit; 151: character recognition unit; 152: word matching unit; 160: logical structure extraction unit; 170: logical structure generation unit.

フロントページの続き (72)発明者 藤澤浩道 東京都国分寺市東恋ケ窪１丁目280番地 株式会社日立製作所中央研究所内 (56)参考文献 特開昭63−201867（ＪＰ，Ａ)
Continuation of front page: (72) Inventor: Hiromichi Fujisawa, 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo, c/o Central Research Laboratory, Hitachi, Ltd. (56) References: JP-A-63-201867 (JP, A)

Claims

1. A multimedia document structuring method for extracting, from a multimedia document, the layout structure and the logical structure specific to that document, comprising: an input device that inputs the multimedia document as a digital image; first grammar expression storage means for storing expressions written in a first grammar that hierarchically describes the logical structure common to input documents; means for extracting logical structures such as chapters and sections as parent-child relationships and storing the extracted information in a file in table form; second grammar expression storage means for storing expressions written in a second grammar that describes the input document as a set of rectangular areas, the second grammar including variables that represent the absolute or relative sizes of rectangular areas and the absolute or relative relationships between rectangular areas, together with a description of how rectangular areas are to be searched for; means for searching for the rectangular areas specified by an expression written in the second grammar and assigning values determined from the search results to the variables in that expression; and means for dividing areas based on the result of analyzing those variables.

2. The multimedia document structuring method according to claim 1, further comprising means for associating an element of the logical structure expressed by the first grammar expression means with the areas divided by the second grammar expression means, and combining those areas into a single logical structure.

3. The multimedia document structuring method according to claim 1, wherein the means for storing in the file further includes feature extraction means suited to each kind of homogeneous data in the document, such as text, images, and charts, and the result of the extraction means is extracted as rectangular areas and attribute information.

4. The multimedia document structuring method according to claim 1, comprising rectangular area extraction means for extracting rectangular areas from the input digital image, and means for estimating logical structures such as chapters and sections from the input document using layout information of the input document, such as line spacing, character spacing, and columns, the layout information being expressed as relative coordinate values of the rectangular areas obtained by the rectangular area extraction means.

5. The multimedia document structuring method according to claim 1, comprising rectangular area extraction means for extracting, as rectangular areas from the input digital image, the areas that express the logical structure obtained by the logical structure estimating means.

6. The multimedia document structuring method according to claim 5, wherein the rectangular area extraction means further comprises means for separating, when the input digital image is expressed in color, areas of a single color from areas of mixed colors in the input digital image.

7. The multimedia document structuring method according to claim 5, wherein the rectangular area extraction means further comprises means for extracting as rectangular areas, when the input digital image is expressed as a single-color grayscale image, the areas of the input digital image in which the shading is not distinct.