JP2002202973A - Structured document management device

Info

Publication number: JP2002202973A
Application number: JP2001291628A
Authority: JP
Inventors: Takashi Shimojima (下島 崇); Masao Ito (伊藤 正雄); Takeshi Tsurubayashi (鶴林 健); Osamu Katayama (片山 修); Shinichi Nakai (中井 信一)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2000-10-25
Filing date: 2001-09-25
Publication date: 2002-07-19
Anticipated expiration: 2021-09-25
Also published as: JP3632643B2

Abstract

(57) [Abstract] [Problem] To provide a structured document device capable of searches that specify a variety of logical structures. [Solution] In a document management system that handles structured documents, the information identifying a position in the logical structure is managed as a path name, which lists the tag names in order from the top level of the hierarchy, and a path hierarchy, which lists the order of appearance at each level of the path name. This makes a variety of structured document searches possible.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a structured document search method that performs searches specifying a logical structure, for use in a document management system that uses a computer to manage structured documents having logical structural elements, such as SGML and XML documents.

[0002]

[Prior Art] As the volume of electronic documents grows, interest is increasing in structured documents, which handle documents with a logical structure such as manuals, meeting minutes, and specifications. Consequently, in addition to searches based on document content alone, a function that exploits the features of structured documents by searching with a specified logical structure becomes important. The logical structure of a structured document is defined by a DTD (Document Type Definition).

[0003] As a conventional document search device for a structured document management system, the invention described in Japanese Patent Application Laid-Open No. H10-240752 (hereinafter, the known example) is known.

[0004] An outline of the known example follows. The configuration of its document registration system is shown in FIG. 33. At registration, the known example first uses the document structure analysis program 3301 to analyze the logical structure of the document to be registered, creates analyzed document data, and registers it in the analyzed document data storage area 3305.

[0005] Next, the structure index creation program 3302 overlays the logical structures of the documents to be registered one after another, in registration order. Groups of elements that share the same position of appearance and the same type within a document are represented by a single meta element, and groups of character string data that share the same position of appearance are represented by a single piece of meta character string data. This yields a structure index consisting of a tree of meta elements and meta character string data (collectively called meta nodes in the known example). Every meta node in the structure index is given an identifier that uniquely identifies it within the structure index (called a context identifier in the known example), and the structure index is registered in the structure index storage area 3306.

[0006] FIG. 34 shows the process of creating the structure index. In FIG. 34, document 1, document 2, and document 3 each represent the analyzed document data of a document to be registered. The structure index is built up by overlaying the structures of these analyzed document data one after another on the existing structure index. When document 1 is input first, the structure index is still in its initial (empty) state, so a tree structure equivalent to the analyzed data is generated and registered as-is, and the structure index takes the state shown at 3401. The newly generated meta elements are assigned context identifiers E1 through E5, and the newly generated meta character string data are assigned context identifiers C1 through C3. When document 2 is input next, nothing is done for the parts whose structure overlaps the existing structure index (3401); only the partial structure with no counterpart in 3401 (the shaded part in the figure) is newly registered. The newly generated meta elements are assigned context identifiers E6 and E7, and the newly generated meta character string data is assigned context identifier C4. When document 3 is input next, nothing is done for the parts whose structure overlaps the existing structure index (3402); only the partial structure with no counterpart in 3402 (the shaded part in the figure) is newly registered. The newly generated meta elements are assigned context identifiers E8, E9, and E10, and the newly generated meta character string data are assigned context identifiers C5 and C6. Thus, once the three documents have been registered, the structure index is in the state shown at 3403.
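The overlay procedure of the known example can be sketched roughly as follows. This is a minimal illustration, not the known example's implementation: the class name `StructureIndex`, the nested-tuple document format, and the choice of "position among siblings plus tag name" as the matching key are all assumptions made for the sketch.

```python
from itertools import count


class StructureIndex:
    """Toy structure index: a tree of meta nodes built by overlaying documents."""

    def __init__(self):
        self.root = {}                 # (position, label) -> meta node
        self._element_ids = count(1)   # source of E1, E2, ...
        self._text_ids = count(1)      # source of C1, C2, ...

    def register(self, document):
        """Overlay one analyzed document; return the context IDs newly created."""
        created = []
        self._overlay(self.root, document, 0, created)
        return created

    def _overlay(self, siblings, node, position, created):
        is_text = isinstance(node, str)
        label = "#text" if is_text else node[0]
        meta = siblings.get((position, label))
        if meta is None:               # no counterpart yet: create a meta node
            prefix, ids = ("C", self._text_ids) if is_text else ("E", self._element_ids)
            meta = {"id": f"{prefix}{next(ids)}", "children": {}}
            siblings[(position, label)] = meta
            created.append(meta["id"])
        if not is_text:                # overlapping parts are reused, then descended
            for child_position, child in enumerate(node[1]):
                self._overlay(meta["children"], child, child_position, created)


# Hypothetical documents: (tag, [children]); a plain string is character data.
two_chapters = ("doc", [("chapter", ["first"]), ("chapter", ["second"])])
three_chapters = ("doc", [("chapter", ["first"]), ("chapter", ["new"]), ("chapter", ["second"])])

index = StructureIndex()
first = index.register(two_chapters)     # every node is new
second = index.register(two_chapters)    # identical structure: nothing new
third = index.register(three_chapters)   # only the third chapter is new
```

Only the parts of a document that have no counterpart in the existing index produce new context identifiers, which is the behavior the shaded regions of FIG. 34 are described as showing.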

[0007] Next, for each document to be registered, the structured full-text data generation program 3303 generates data consisting of all character strings contained in the analyzed document data for that document, together with definitions of the correspondence between each character string and the context identifier indicated in the structure index (called structured full-text data in the known example), and registers it in the structured full-text data storage area 3307.

[0008] Next, the character string index creation program 3304 creates, from the structured full-text data of each document to be registered, a character string index for full-text search that includes the context identifiers, and registers it in the character string index storage area 3308.

[0009] FIG. 35 shows an example of the character string index for the case where each partial character string (3404) is two characters long. For each partial character string, an entry consists of a document identifier (3405) identifying a document that contains the partial character string, a context identifier (3406) identifying the position, within the document structure, of the character string data that contains the partial character string, and the character position (3407) of the partial character string within the document. The "X" in the figure denotes the position of the character immediately preceding the character string; character positions are shown relative to "X".
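A character string index with these three items per entry might look like the sketch below. The function name `build_prior_art_index` and the use of a 0-based offset within each character string as the character position are assumptions for illustration; the known example expresses positions relative to "X".

```python
from collections import defaultdict


def build_prior_art_index(entries, n=2):
    """Map each n-character chain to (document id, context id, position) postings.

    entries: iterable of (document_id, context_id, text) triples.
    """
    index = defaultdict(list)
    for document_id, context_id, text in entries:
        for position in range(len(text) - n + 1):
            chain = text[position:position + n]
            index[chain].append((document_id, context_id, position))
    return dict(index)


# Hypothetical structured full-text data for one document.
index = build_prior_art_index([(1, "C1", "abcab"), (1, "C2", "ab")])
```

Every posting carries the context identifier, which is why a change to the structure index propagates into the character string index in this design.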

[0010] A search in the known example first refers to the structure index and determines the set of context identifiers that satisfy the specified structural condition.

[0011] Next, the character strings are searched using those context identifiers as keys to obtain the group of documents that satisfy the specified conditions.

[0012] In addition, when a document registered in the known example contains non-structural elements such as emphasis markup (called Mixed Content; details are given in Embodiment 3), the character string index is created with that structure ignored.

[0013]

[Problems to Be Solved by the Invention] In the prior-art method above, as shown in FIG. 35, the character string index for full-text search contains three items of data per entry: a document identifier identifying the registered document, a context identifier carrying information about the logical structure, and a character position indicating the position of the character chain. The character string index therefore becomes large, which increases the amount of memory required and raises the cost of the device.

[0014] Furthermore, in the prior-art method, as shown in FIG. 35, every character chain in the character string index carries a context identifier, which is information about the logical structure. If adding or changing an element entity in one of the registered documents changes the structure index (FIG. 34), which is formed by overlaying the logical structures of the registered documents one after another, the context identifiers in the character string index must be updated. When the element entity contains a very large number of character chains, the amount of processing is correspondingly large.

[0015] This problem is described in detail below using a concrete example.

[0016] FIG. 36 shows an example in which two documents are registered and one of them is changed. In this example, documents 1 and 2 have the same logical structure, so the logical structure of the resulting structure index is also the same as that of document 1 or 2. A new chapter is added between Chapter 1 and Chapter 2 of document 2, turning it into a three-chapter document. That is, a block that becomes the new Chapter 2 (4000 in FIG. 36) is added to document 2. The block that was Chapter 2 before the change (4001 in FIG. 36) now becomes Chapter 3, but because documents 1 and 2 both had only two chapters, the structure index before the change has no context identifier corresponding to Chapter 3 of document 2 (the pre-change structure index in FIG. 36). The structure index must therefore be updated as shown in FIG. 36 (the post-change structure index).

[0017] As the updated structure index in FIG. 36 shows, the context identifier for the element entity that has newly become Chapter 3 of document 2 is 'C4'. Before the change, however, the context identifier for that element entity was 'C3', so the context identifier of every character chain held in the character string index for that element entity must be changed from 'C3' to 'C4'. For example, if the element entity corresponding to Chapter 3 of document 2 consists of 100 characters and the index is built from two-character chains, the context identifier must be changed for 99 character chains. The amount of processing thus grows with the number of character chains in the element entity.
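The figure of 99 follows from the fact that a text of L characters contains L - n + 1 chains of n characters, so 100 - 2 + 1 = 99. The sketch below reproduces that count on a toy index; `build_index`, `relabel_context`, and the sample text are hypothetical names and data chosen for illustration only.

```python
def build_index(entries, n=2):
    """Prior-art style index: chain -> list of (document id, context id, position)."""
    index = {}
    for document_id, context_id, text in entries:
        for position in range(len(text) - n + 1):
            chain = text[position:position + n]
            index.setdefault(chain, []).append((document_id, context_id, position))
    return index


def relabel_context(index, document_id, old_id, new_id):
    """Rewrite the context id in every affected posting; return how many were touched."""
    touched = 0
    for postings in index.values():
        for i, (doc, context, position) in enumerate(postings):
            if doc == document_id and context == old_id:
                postings[i] = (doc, new_id, position)
                touched += 1
    return touched


chapter_text = "abcdefghij" * 10                 # a 100-character element entity
index = build_index([(2, "C3", chapter_text)])
touched = relabel_context(index, 2, "C3", "C4")  # the chapter moved from C3 to C4
```

The number of rewritten postings depends only on the length of the moved text, not on how small the structural change was.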

[0018] An alternative update method is conceivable: assign the new context identifier 'C4' to the element entity that becomes Chapter 2 after the change, and leave the context identifier 'C3' unchanged for the element entity that was Chapter 2 before the change and is Chapter 3 afterwards. In that case, however, the context identifier must be changed from 'C3' to 'C4' for the character chains in the character string index of the element entity corresponding to Chapter 2 of document 1. With only two registered documents, as in this example, the update cost is the same as for the method above. As the number of registered documents grows, however, the context identifier must be changed from 'C3' to 'C4' in the character string index for the element entities of every registered document that has a Chapter 2, so the amount of processing actually increases.

[0019] As a further problem, the prior-art structure index is formed by overlaying the logical structures of the registered documents one after another, as shown in FIG. 34. When the registered documents have nearly identical logical structures, new context identifiers rarely need to be assigned. When the logical structures differ greatly, however, there is little overlap, and if a very large number of such structurally different documents are registered, the number of context identifiers becomes enormous.

[0020] Also, because the prior-art structure index is formed by overlaying the logical structures of the registered documents one after another as shown in FIG. 34, the resulting structure index may contain structures in which a single parent node has multiple child nodes with the same tag name. When a tag name is specified as the search range, each node must be checked to see whether its tag name matches. Even when multiple child nodes with the same tag name hang from one parent node as described above, an OR search is required that checks each child node individually for the matching tag name, which slows down the search.

[0021] In addition, with the conventional method, when a "paragraph" element, which is an element entity, contains a "keyword" element as Mixed Content, the character string index is created with the structure of the "keyword" tag ignored. A search condition such as "documents that contain '○○' inside a 'keyword' tag" therefore cannot be handled.

[0022] The present invention solves the problems of the prior art described above. Its objects, in full-text search over structured documents, are to support a variety of searches that specify a logical structure; to reduce the size of the search index; to simplify updating of the search index when part of a document is changed or deleted; to provide fast searches that specify an intermediate node and everything below it; and to perform searches that span Mixed Content as well as searches that specify an element that is Mixed Content.

[0023]

[Means for Solving the Problems] To solve the above problems, claim 1 provides structure information creating means that creates a search unit identifier identifying each element entity, an element entity position identifier expressing the position of each element entity in the tree structure, and an element management table that associates at least each search unit identifier with its related element entity position identifier, so that the element entity position identifier can be determined from the search unit identifier. As a result, even when the structure of a registered document changes, it suffices to update the element management table. Unlike the prior art, the context identifiers in the character string index need not be changed whenever the document structure changes, so a change in the logical structure of a registered document does not trigger a huge amount of processing to update the character string index.
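The separation described for claim 1 can be illustrated with a small sketch in which postings refer only to a search unit identifier and the tree position lives in a separate table. The position strings such as `/doc/chapter[2]`, the function `add_postings`, and the sample texts are assumptions for illustration; the patent text in this section does not specify the format of the element entity position identifier.

```python
def add_postings(char_index, unit_id, text, n=2):
    """Index one element entity: chain -> list of (search unit id, position)."""
    for position in range(len(text) - n + 1):
        chain = text[position:position + n]
        char_index.setdefault(chain, []).append((unit_id, position))


char_index = {}
element_table = {}   # search unit id -> element entity position identifier

chapters = [("/doc/chapter[1]", "chapter one"), ("/doc/chapter[2]", "chapter two")]
for unit_id, (position_id, text) in enumerate(chapters, start=1):
    element_table[unit_id] = position_id
    add_postings(char_index, unit_id, text)

before = {chain: list(postings) for chain, postings in char_index.items()}

# Insert a new chapter before the old second chapter.
element_table[2] = "/doc/chapter[3]"        # one table row is rewritten
element_table[3] = "/doc/chapter[2]"        # the new entity gets its own row
add_postings(char_index, 3, "inserted")     # only the new text is indexed

# Existing postings are untouched: each old list is a prefix of the new one.
unchanged = all(char_index[chain][:len(old)] == old for chain, old in before.items())
```

In this arrangement the cost of moving an element entity is one table update, independent of how many character chains the entity contains.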

[0024] Claim 2 provides structure information creating means that creates a search unit identifier identifying each element entity; a path name ID identifying a path name, which is the sequence of tag names leading to each element entity listed in hierarchical order; a path hierarchy ID identifying a path hierarchy, which is the sequence, listed in hierarchical order, of the order of appearance within the same level of tags that share the same parent node and the same name; and an element management table that associates at least each search unit identifier with its related path name ID and path hierarchy ID, so that both can be determined from the search unit identifier. As a result, even when the structure of a registered document changes, it suffices to update the element management table, and a change in the logical structure of a registered document does not trigger a huge amount of processing to update the character string index as in the prior art. Introducing the path name ID and path hierarchy ID also eliminates the OR search that the prior art needs when determining the search range. Moreover, even when many documents with differing logical structures are registered, element entities are identified by path name ID and path hierarchy ID, so fewer identifiers are needed than the context identifiers required when the index is formed by overlaying the logical structures of the registered documents one after another, as in the prior art.

[0025] Claim 3 provides structure information creating means that creates a name ID identifying a tag name, a search unit identifier identifying each element entity, and an element management table that associates at least each search unit identifier with its related name ID, so that the name ID can be determined from the search unit identifier. This makes it possible to specify the tag name of a node of a registered document as the search range.

[0026] Claim 4 provides result creating means that creates a list of character string search results and data for displaying each element entity, and result display means that displays the search results created by the result creating means on a terminal. This makes it possible to present search results to the user.

[0027] Claim 5 provides, as an independent unit on a network, a structured document registration unit comprising: structured document input means for inputting a structured document; structure analysis means that analyzes the structured document taken in by the structured document input means and generates the tree structure of the structured document; and structure information creating means that, for the structured document expressed as a tree structure by the structure analysis means, creates a search unit identifier identifying each element entity, a path name ID identifying a path name that lists the tag names leading to each element entity in hierarchical order, a path hierarchy ID identifying a path hierarchy that lists, in hierarchical order, the order of appearance within the same level of tags sharing the same parent node and the same name, and an element management table that associates at least each search unit identifier with its related path name ID and path hierarchy ID, so that both can be determined from the search unit identifier. This makes it possible to register structured documents remotely over the network. Even when the structure of a registered document changes, it suffices to update the element management table, and a change in the logical structure of a registered document does not trigger a huge amount of processing to update the character string index as in the prior art. Introducing the path name ID and path hierarchy ID also eliminates the OR search that the prior art needs when determining the search range. Moreover, even when many documents with differing logical structures are registered, element entities are identified by path name ID and path hierarchy ID, so fewer identifiers are needed than the context identifiers required when the index is formed by overlaying the logical structures of the registered documents one after another, as in the prior art.

[0028] In claims 6 and 7, when the tree structure of a structured document changes, those path name IDs and path hierarchy IDs recorded in the element management table that need to change are updated. A change in the structure of a registered document can thus be handled by updating the element management table, and a change in the logical structure of a registered document does not trigger a huge amount of processing to update the character string index as in the prior art.

[0029] Claim 8 provides, as an independent unit on a network, a structured document registration unit comprising: structured document input means for inputting a structured document; structure analysis means that analyzes the structured document taken in by the structured document input means and generates the tree structure of the structured document; and structure information creating means that creates, from the tree structure generated by the structure analysis means, a name ID identifying a tag name, a search unit identifier identifying each element entity, and an element management table that associates at least each search unit identifier with its related name ID, so that the name ID can be determined from the search unit identifier. This makes it possible to register structured documents remotely over the network.

[0030] In claim 9, a character string index creation unit generates a character string index for search. When a character string of a predetermined number of characters taken from an element entity spans a tag, the unit obtains a separate search unit identifier that identifies the child element. The index consists of the character string, the search unit identifier identifying the element entity to which each character of the string belongs, and a character position identifier indicating the position of the string within the element entity with the tags removed. This makes search possible even for structured documents that contain Mixed Content. In addition, because each entry of the resulting character index consists of two items, the search unit identifier and the character position identifier, it requires less memory than the three-item character string index of the prior art, which reduces the cost of the device.
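One possible reading of this indexing scheme is sketched below: tags are stripped to form a flat text, each character remembers which element entity owns it, and each two-character chain records the owners of its characters together with its position in the tag-stripped text. This is an interpretation for illustration only; the function names `owners` and `mixed_content_index`, the numbering of search units in document order, and the posting layout are assumptions, not the claimed format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count


def owners(element, next_id, out):
    """Append (character, owning search unit id) pairs in tag-stripped text order."""
    unit_id = next(next_id)                        # separate ID for each element
    out.extend((ch, unit_id) for ch in (element.text or ""))
    for child in element:
        owners(child, next_id, out)                # the child's characters belong to the child
        out.extend((ch, unit_id) for ch in (child.tail or ""))
    return out


def mixed_content_index(xml_text, n=2):
    """Map each chain to postings of (owner ids of its characters, position)."""
    pairs = owners(ET.fromstring(xml_text), count(1), [])
    index = defaultdict(list)
    for position in range(len(pairs) - n + 1):
        window = pairs[position:position + n]
        chain = "".join(ch for ch, _ in window)
        index[chain].append((tuple(unit for _, unit in window), position))
    return dict(index)


index = mixed_content_index("<para>abc<kw>de</kw>fg</para>")

# Chains wholly inside the hypothetical <kw> element (search unit 2):
inside_kw = [chain for chain, postings in index.items()
             if any(set(units) == {2} for units, _ in postings)]
```

Because chains such as "cd" are indexed across the tag boundary, a phrase that straddles the `kw` element can still be found, while the owner IDs allow a search to be restricted to text inside `kw`.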

[0031] In claim 10, numeric index creation means obtains a separate search unit identifier identifying a character string enclosed by a tag defined in advance as numeric, converts the enclosed character string into numeric data, and generates a numeric index that associates the search unit identifier with the numeric data. This makes it possible to search by specifying a particular numeric range.
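A numeric index of this kind can be sketched as a sorted list of (value, search unit identifier) pairs queried with Python's standard `bisect` module. The function names, the use of `float` for conversion, and the inclusive range bounds are assumptions for illustration.

```python
import bisect


def build_numeric_index(numeric_texts):
    """numeric_texts: search unit id -> text enclosed by a tag declared numeric."""
    return sorted((float(text), unit_id) for unit_id, text in numeric_texts.items())


def range_search(index, low, high):
    """Return search unit ids whose value lies in the inclusive range [low, high]."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    stop = bisect.bisect_right(index, (high, float("inf")))
    return [unit_id for _, unit_id in index[start:stop]]


# Hypothetical contents of <price> elements, keyed by search unit id.
numeric_index = build_numeric_index({1: "100", 2: "250", 3: "90.5", 4: "250"})
hits = range_search(numeric_index, 100, 250)
```

Storing converted numbers rather than digit strings is what makes range conditions meaningful: as strings, "90.5" would sort after "250".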

[0032] Claim 11 provides a data storage unit and a character string search unit, each as an independent unit on a network. The data storage unit stores a name ID identifying a tag name; a path name ID identifying a path name that lists the tag names leading to each element entity in hierarchical order; a path hierarchy ID identifying a path hierarchy that lists, in hierarchical order, the order of appearance within the same level of tags sharing the same parent node and the same name; a search unit identifier identifying each element entity; and at least one of two element management tables: one that associates at least each search unit identifier with its related name ID, so that the name ID can be determined from the search unit identifier, and one that associates at least each search unit identifier with its related path name ID and path hierarchy ID, so that both can be determined from the search unit identifier. The character string search unit comprises: search condition input means for inputting a search condition; search condition analysis means that determines, from the search condition entered through the search condition input means, at least one of the name ID, path name ID, and path hierarchy ID matching the search condition (ID1); character string index search means that finds the search unit identifiers having a character string that matches the search condition; and structure matching means that, starting from the search unit identifiers found by the character string index search means, refers to the element management table to obtain at least one of the corresponding name ID, path name ID, and path hierarchy ID (ID2), and extracts only those search unit identifiers for which ID2 matches the ID1 obtained by the search condition analysis means. This makes character string search possible remotely over the network.
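The search flow of claim 11 (determine ID1 from the condition, find candidate search units in the character index, look up ID2 in the element management table, keep the matches) can be sketched as follows. All function names, the in-memory dictionaries, and the restriction to a path name condition are assumptions for illustration; network distribution of the units is not modeled.

```python
def build_char_index(texts, n=2):
    """chain -> list of (search unit id, position) postings."""
    index = {}
    for unit_id, text in texts.items():
        for position in range(len(text) - n + 1):
            index.setdefault(text[position:position + n], []).append((unit_id, position))
    return index


def find_units(char_index, query, n=2):
    """Search unit ids whose text contains query (query must have at least n characters)."""
    chains = [query[i:i + n] for i in range(len(query) - n + 1)]
    if not chains:
        return set()
    candidates = set(char_index.get(chains[0], []))
    for offset, chain in enumerate(chains[1:], start=1):
        postings = set(char_index.get(chain, []))
        # Keep a start position only if the next chain occurs at the right offset.
        candidates = {(u, p) for (u, p) in candidates if (u, p + offset) in postings}
    return {unit_id for unit_id, _ in candidates}


def structure_search(query, path_name, char_index, element_table, path_name_ids):
    id1 = path_name_ids.get(path_name)              # search condition analysis
    hits = find_units(char_index, query)            # character string index search
    # Structure matching: ID2 comes from the element management table.
    return sorted(u for u in hits if element_table[u][0] == id1)


# Hypothetical registered data: search unit id -> text, and -> (path name ID, path hierarchy ID).
texts = {1: "search engine", 2: "full text search", 3: "search"}
element_table = {1: (1, 1), 2: (2, 2), 3: (2, 3)}
path_name_ids = {"doc/title": 1, "doc/chapter/para": 2}

char_index = build_char_index(texts)
result = structure_search("search", "doc/chapter/para", char_index, element_table, path_name_ids)
```

The character index never stores structural information; the structural condition is applied afterwards through a table lookup per candidate.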

[0033] In claim 12, the character string search unit of claim 11 further includes numeric index search means for numeric range searches over structured documents that contain character strings enclosed by tags defined in advance as numeric. This means refers to a numeric index, which associates a unique search unit identifier identifying each such tag-enclosed character string with the numeric data obtained by converting that string to a number, and extracts the search unit identifiers that satisfy the search condition. This makes it possible to obtain remotely, over a network, the search unit identifiers of element entities whose values fall within a specified range.

[0034] Claim 13 concerns a portable medium storing a program that has: a step of reading a structured document; a step of generating a path name ID, which identifies a path name in which the tag names leading to each element entity are concatenated in hierarchical order, and a path hierarchy ID, which identifies a path hierarchy in which the order of appearance, within the same level, of tags having the same parent node and the same name is concatenated in hierarchical order; a step of determining whether an element has an element entity; a step of generating a search unit identifier identifying each element entity; and a step of creating an element management table that associates at least the search unit identifier with its related path name ID and path hierarchy ID, so that these can be determined from the search unit identifier. Installing the program from this medium on a general-purpose computer gives that computer the function of a structured document registration unit that registers structured documents.

[0035] Claim 14 concerns a portable medium storing a program that has: a step of reading a structured document; a step of generating a name ID identifying a tag name; a step of determining whether an element has an element entity; a step of generating a search unit identifier identifying each element entity; and a step of creating an element management table that associates at least the search unit identifier with its related name ID, so that the name ID can be determined from the search unit identifier. Installing the program from this medium on a general-purpose computer gives that computer the function of a structured document registration unit that registers structured documents.

[0036] Claim 15 concerns a method of generating a character index for a structured document whose element entities contain further tag-enclosed element entities (child elements). A portable medium stores a program that has: a step of reading structure-analyzed data; a step of checking whether an element has an element entity; a step of obtaining the search unit identifier that identifies the element entity; a step of checking whether the element entity contains a child element; a step of obtaining the search unit identifier that identifies that child element; a step of extracting from the element entity character strings whose unit length is a predetermined number of characters (one or more); a step of determining the search unit identifier to which each character of the string belongs; and a step of generating a search character string index that holds the string, the search unit identifier to which each of its characters belongs, and a character position identifier indicating the position of the string within the element entity with its tags removed. Installing the program from this medium on a general-purpose computer gives that computer the function of a character string index creation unit that creates a character string index searchable even for structured documents containing mixed content.
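The index entries described for mixed content can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `mixed_content_index`, the `segments` input format (the tag-stripped text runs of one element in document order, each paired with the search unit identifier that owns it), and the child tag `<強調>` in the comment are all assumptions made for the example.

```python
def mixed_content_index(segments, n=2):
    """Build n-gram entries for one element entity with mixed content.

    segments: list of (search_unit_id, text) runs in document order,
    with tags already removed (assumed input format).
    Returns a list of (chain, position, owner_ids), where position is the
    1-based position of the chain in the tag-stripped text and owner_ids
    gives the search unit identifier of each character in the chain.
    """
    text = "".join(run for _, run in segments)
    owners = [unit_id for unit_id, run in segments for _ in run]
    return [
        (text[i:i + n], i + 1, tuple(owners[i:i + n]))
        for i in range(len(text) - n + 1)
    ]


# Hypothetical element: <タイトル>構造化<強調>文書</強調>管理</タイトル>
# where the parent has search unit identifier 1 and the child has 2.
entries = mixed_content_index([(1, "構造化"), (2, "文書"), (1, "管理")])
for entry in entries:
    print(entry)
```

Because each entry records the owner of every character, a chain such as "化文" that straddles the tag boundary is still indexed, which is what allows a search string to match across a child element.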

[0037] Claim 16 concerns a method of generating an index for numeric search of structured documents. A portable medium stores a program that has: a step of reading a structured document; a step of determining whether a character string is enclosed by a tag defined in advance as numeric; a step of obtaining a search unit identifier that identifies the character string enclosed by such a tag; a step of converting the character string to a number; and a step of generating a numeric index consisting of the search unit identifier and the number. Installing the program from this medium on a general-purpose computer gives that computer the function of a character string index creation unit that generates an index which also supports searches specifying a numeric range.
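A numeric index of this kind, and a range search over it, might look like the sketch below. The text only says that the index pairs a search unit identifier with the converted number; keeping the pairs sorted and using binary search is an assumption of this example, as are the function names and the sample values.

```python
import bisect


def build_numeric_index(entries):
    """entries: iterable of (search_unit_id, text) taken from numeric tags.

    Returns a list of (value, search_unit_id) sorted by value.
    Strings that cannot be converted to a number are skipped.
    """
    index = []
    for unit_id, text in entries:
        try:
            value = float(text.strip())
        except ValueError:
            continue
        index.append((value, unit_id))
    index.sort()
    return index


def range_search(index, low, high):
    """Return search unit identifiers whose value lies in [low, high]."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    stop = bisect.bisect_right(index, (high, float("inf")))
    return [unit_id for _, unit_id in index[start:stop]]


# Hypothetical contents of elements whose tag is defined as numeric.
numeric_index = build_numeric_index([(4, "1200"), (7, "350"), (9, " 980 "), (12, "n/a")])
print(range_search(numeric_index, 500, 1200))  # [9, 4]
```

The returned identifiers can then be passed through the same element management table lookup as the results of a character string search.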

[0038] Claim 17 concerns a method of searching structured documents. A portable medium stores a program that has: a step of reading a search condition; a step of converting the search condition into one of the following IDs (hereinafter, ID1): a name ID identifying the tag name that matches the condition, a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, or a path hierarchy ID identifying a path hierarchy in which the order of appearance, within the same level, of tags having the same parent node and the same name is concatenated in hierarchical order; a step of determining the search unit identifiers (hereinafter, ID2) that identify the element entities containing a character string matching the search condition; a step of referring to an element management table, which associates at least ID2 with its related name ID, path name ID, and path hierarchy ID so that these can be determined from ID2, and obtaining at least one of the name ID, path name ID, and path hierarchy ID corresponding to ID2 (hereinafter, ID3); and a step of extracting only the search unit identifiers for which ID1 and ID3 match. Installing the program from this medium on a general-purpose computer gives that computer the function of a character string search unit.
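The ID1 / ID2 / ID3 flow can be sketched as below for the case where the structure condition is a path name. All table contents and IDs are hypothetical sample data, and the substring test over `unit_text` is only a stand-in for the character string index search; it is not how the index search itself is described.

```python
# Hypothetical sample tables.
path_name_index = {"/paper/bib/title": "N3", "/paper/body/chapter/title": "N6"}
element_table = {
    1: {"doc": 1, "path_name_id": "N3"},
    2: {"doc": 1, "path_name_id": "N6"},
    3: {"doc": 2, "path_name_id": "N3"},
}
unit_text = {
    1: "structured document management",
    2: "structured data",
    3: "image retrieval",
}


def search(path_name, keyword):
    """Return search unit identifiers that satisfy both conditions."""
    # Convert the structure condition into ID1.
    id1 = path_name_index.get(path_name)
    if id1 is None:
        return []
    # Find the units containing the keyword (ID2); stand-in for the index search.
    id2_list = [u for u, text in sorted(unit_text.items()) if keyword in text]
    # Look up ID3 for each ID2 and keep only the units where ID3 equals ID1.
    return [u for u in id2_list if element_table[u]["path_name_id"] == id1]


print(search("/paper/bib/title", "structured"))  # [1]
```

The structure condition is reduced to a single ID comparison per hit, so the cost of the structural check does not depend on how deep the path is.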

[0039] Claim 18 concerns a method of determining which nodes fall within the search range when the range is specified as a given intermediate node and everything below it. Starting from a lowest-level node, the method climbs one level at a time along the path name, in which the tag names leading to each element entity are concatenated in hierarchical order, or along the path hierarchy, in which the order of appearance, within the same level, of tags having the same parent node and the same name is concatenated in hierarchical order. At each level it judges whether the current node matches the specified intermediate node or has already been judged to be within the search range; if either condition holds, all nodes traversed so far are judged to be within the search range. It also judges whether the current node does not match the specified intermediate node or has already been judged to be outside the search range; if either condition holds, all nodes traversed so far are judged to be outside the search range. This processing is executed for each level climbed and repeated until the top-level node is reached, which identifies the search range. The method thus makes it possible to determine the nodes included in the search range when a given intermediate node and everything below it is specified.
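One way to read this procedure is sketched below. It assumes that nodes are named by their path names, that "does not match" is decided only once the climb passes the top-level node, and that the verdicts are cached in a dictionary playing the role of a collation table; the function names and sample paths are hypothetical.

```python
def parent(path):
    """Return the path one level up, or None when already at the top level."""
    head, _, _ = path.rpartition("/")
    return head or None


def search_range(leaf_paths, target):
    """Classify every node on the way up from each leaf.

    Returns a dict mapping path -> True (inside the range rooted at
    target) or False (outside).
    """
    table = {}
    for leaf in leaf_paths:
        visited, node = [], leaf
        while True:
            if node == target:          # reached the specified node
                visited.append(node)
                in_range = True
                break
            if node is None:            # climbed past the top level
                in_range = False
                break
            if node in table:           # verdict already known
                in_range = table[node]
                break
            visited.append(node)
            node = parent(node)
        for path in visited:            # everything traversed shares the verdict
            table[path] = in_range
    return table


leaves = [
    "/paper/bib/title",
    "/paper/body/chapter/section/para",
    "/paper/body/chapter/title",
]
table = search_range(leaves, "/paper/body")
print(table["/paper/body/chapter/title"])  # True
```

In this run the third leaf stops after a single step, because its parent `/paper/body/chapter` was already classified while processing the second leaf.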

[0040] The invention of claim 19 makes it possible to realize the structured document management apparatus of claim 2 with a general-purpose computer and a program.

[0041] The invention of claim 20 makes it possible to realize the character string index creation apparatus of claim 9 with a general-purpose computer and a program.

[0042] The invention of claim 21 makes it possible to realize the character string index creation apparatus of claim 10 with a general-purpose computer and a program.

[0043] The invention of claim 22 identifies the search range efficiently by means of a program that, when a given node and everything below it is specified as the search range for searching tree-structured data, successively builds a collation table indicating whether each node is included in the search range.

[0044] The invention of claim 23 is an apparatus for managing structured documents represented as a tree structure. It comprises structure information creation means that assigns search unit identifiers identifying element entities and, as means of specifying an element entity separately from the search unit identifier, means for storing a path hierarchy, in which the order of appearance of tags having the same parent node and the same name in the tree structure is concatenated level by level, and means for storing a path name, in which the tag names in the tree structure are concatenated level by level. It further comprises means for storing an element management table that associates the path hierarchy and path name with the search unit identifier, character string index search means that extracts the search unit identifiers of element entities containing the character string of the search condition, and structure collation means that refers to the element management table using the search unit identifiers extracted by the character string index search means and retrieves the documents that satisfy the path hierarchy or path name specified as a search condition. This structured document management apparatus makes efficient document search possible.

[0045] The invention of claim 24 is a data management apparatus that manages data whose structure can be represented as a tree, characterized in that entity elements of the data are specified using means for storing a path hierarchy, in which the order of appearance of tags having the same parent node and the same name in the tree structure is concatenated level by level. This allows data representable as a tree to be managed with a small number of IDs.

[0046] The invention of claim 25 is the data management apparatus of claim 24, further comprising means for storing a path name in which the tag names of the tree-structured data are concatenated level by level, and characterized in that the means for storing the path hierarchy and the means for storing the path name are used to uniquely specify an entity element of the data in the tree structure. Specifying tree-representable data by path hierarchy and path name allows it to be managed with a small number of IDs.

[0047] The invention of claim 26 is the data management apparatus of claim 25, characterized in that when multiple entity elements have the same parent node and the same tag name, their path names are expressed identically. This removes the need for a so-called OR search when searching the data and makes fast search possible.

[0048]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described below. The present invention is in no way limited to these embodiments and can be carried out in various forms without departing from its gist.

[0049] (Embodiment 1) FIG. 1 is a configuration diagram of the structured document management apparatus according to Embodiment 1 of the present invention. The apparatus shown in FIG. 1 consists of a terminal 101, structured document input means 102, search condition input means 103, result display means 104, a search engine 105, and a data storage unit 106.

[0050] The terminal 101 is used to specify search conditions for document search and to display search results.

[0051] The structured document input means 102 holds the documents to be registered and, when a document is registered, sends its data from there to the search engine 105. The search condition input means 103 sends the search conditions entered at the terminal 101 to the search engine 105.

[0052] The result display means 104 receives search results from the search engine 105 and displays them on the terminal 101.

[0053] The search engine 105 performs the actual registration of structured documents, the search, and the creation of search results. For registration, 107 is structure analysis means that analyzes the logical structure of a document to be registered; 108 is structure information creation means that creates information on the logical structure of each element into which the structure analysis means has divided the document; and 109 is character string index creation means that creates a character string index for fast search of character strings. Details of 107, 108, and 109 are given in the description of the document registration flow. For search, 110 is search condition analysis means that converts the logical-structure conditions in a search condition received from the search condition input means 103 into the representation of structure conditions used inside this search engine; 111 is character string index search means that uses the character string index to search for the search string in the search condition; and 112 is structure collation means that extracts, from the set of character string search results obtained by the character string index search means, only those that match the structure conditions as converted by the search condition analysis means. Details of 110, 111, and 112 are given in the description of the document search flow. For result creation, 113 is result creation means that creates the list of search results and the data for displaying entities and passes them to the result display means 104.

[0054] The data storage unit 106 stores the data used for searching structured documents and displaying results. It consists of: structure-analyzed data storage means 114, which stores the structure-analyzed data created by the structure analysis means 107; element management table storage means 115, which stores logical structure information for each searchable element in a document; path name index storage means 116, which stores a path name index that manages strings in which tag names are concatenated in order from the top level (hereinafter, path names) and assigns an ID to each path name; path hierarchy index storage means 117, which stores a path hierarchy index that manages strings in which the appearance order at each level of a path name (a number indicating where the element appears among the elements with the same tag name under the same parent element) is concatenated (hereinafter, path hierarchies) and assigns an ID to each path hierarchy; name ID table storage means 118, which stores a name ID table that assigns an ID to the tag name of each element; character string index storage means 119, which stores the character string index created by the character string index creation means 109; entity data storage means 120, which stores the entity data of registered documents; and list data storage means 121, which stores the data for search result lists.

[0055] Next, the document registration process in this embodiment is described using a concrete example of a structured document.

[0056] First, the document to be registered is read from the structured document input means 102. Next, the structure analysis means 107 converts it into a form in which its structure can be interpreted. That is, the structure analysis means 107 converts the structured document, which is a sequence of characters, into a data structure that the structure information creation means 108 can interpret (hereinafter, structure-analyzed data), and this data is stored in the structure-analyzed data storage means 114.

[0057] Next, the structure information creation means 108 creates information on the logical structure of each element into which the structure analysis means has divided the document.

[0058] FIG. 2 is an example of a structured document. Analyzing the structured document of FIG. 2 with the structure analysis means 107 yields the tree structure shown in FIG. 3. The description below centers on a structured document with this logical structure. In FIG. 3, elements that have an entity (text) (hereinafter, element entities) are shown shaded. Each element entity is assigned a code that uniquely represents a search unit within the search engine (hereinafter, search unit identifier). The search unit identifier is a code unrelated to the logical position within the document; it may, for example, be a number.

[0059] In FIG. 3, the number written in the lower part of each element entity is an example of a search unit identifier. Because an element entity can be specified by any one of the path name index, path hierarchy index, and name ID described above, or by a combination of them, these three are collectively called "element entity position identifiers".

[0060] FIG. 4 shows the processing flow of the structure information creation means 108. First, the structure-analyzed data of the document to be registered is read from the structure-analyzed data storage means 114, and a number unique to each registered document (hereinafter, document number) is assigned (step 401).

[0061] The following processing is then repeated for each element of the document to be registered.

First, the name ID of the element currently being referenced is obtained (step 402). FIG. 5 is an example of the name ID table that finally results when a structured document with the tree structure of FIG. 3 is registered. The tag name of element 301 in FIG. 3 is "段落" (paragraph), so from FIG. 5 its name ID is "T9". In step 402, if the name ID table already contains a record of the tag name and name ID for the current element, that name ID is obtained; if not, a new record of the tag name and name ID is created and stored in the name ID table storage means 118, and that name ID is obtained.

Next, the path name ID of the current element is obtained (step 403). FIG. 6 is an example of the path name index that finally results when a structured document with the tree structure of FIG. 3 is registered. The path name index assigns a unique ID (path name ID) to each path name in the registered documents. Each path name ID also holds the name ID of its lowest-level tag name. The path name of element 301 in FIG. 3 is "/論文/本文/章/節/段落" (/paper/body/chapter/section/paragraph), and the path name ID assigned to it is, in the example of FIG. 6, the value shown at 601 (N11). In step 403, if the path name index already contains a node for the path name of the current element, that path name ID is obtained; if not, a node for the path name and its path name ID are newly created and stored in the path name index storage means 116, and that path name ID is obtained. The slash "/" is used here as the separator between the levels of a path name, but any character may be used as long as it cannot appear in a tag name.

Next, the path hierarchy ID of the current element is obtained (step 404). FIG. 7 is an example of the path hierarchy index that finally results when a structured document with the tree structure of FIG. 3 is registered. The path hierarchy index assigns a unique ID (path hierarchy ID) to each path hierarchy in the registered documents. The path hierarchy of element 301 in FIG. 3 is "/1/1/1/1/2", and the path hierarchy ID assigned to it is, in the example of FIG. 7, the value shown at 701 (L5). In step 404, if the path hierarchy index already contains a node for the path hierarchy of the current element, that path hierarchy ID is obtained; if not, a node for the path hierarchy and its path hierarchy ID are newly created and stored in the path hierarchy index storage means 117, and that path hierarchy ID is obtained. As with path names, the slash "/" is used here as the separator between levels, but any character may be used as long as it is not used in the numerals that represent the appearance order.

Next, it is checked whether the current element has an entity (step 405). If it does not, processing proceeds to step 408. If it does, processing proceeds to step 406, where a search unit identifier is assigned to the element. Then, in step 407, a record for the current element is added to the element management table. FIG. 8 is an example of the element management table; 801 is the record for element 301 of FIG. 3. The element management table in Embodiment 1 manages the document number, path name ID, path hierarchy ID, and name ID, keyed by search unit identifier.

Finally, step 408 checks whether steps 402 to 407 have been completed for all elements of the document to be registered; if unprocessed elements remain, the processing from step 402 onward is repeated.
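Steps 402 to 408 can be sketched as below. This is an illustration under several assumptions, not the apparatus itself: Python's `xml.etree.ElementTree` stands in for the structure analysis means, IDs are simply numbered in order of first appearance (so they will not match the values in FIGS. 5 to 8), text that follows a child element is ignored, and the sample document and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def get_or_create(index, key, prefix):
    """Return the ID already assigned to key, or assign the next one."""
    if key not in index:
        index[key] = f"{prefix}{len(index) + 1}"
    return index[key]


def register(xml_text, doc_no, name_ids, path_names, path_levels, element_table):
    """Walk one document and fill the four tables (steps 402 to 408)."""
    def visit(elem, name_path, level_path):
        name_id = get_or_create(name_ids, elem.tag, "T")             # step 402
        path_name_id = get_or_create(path_names, name_path, "N")     # step 403
        path_level_id = get_or_create(path_levels, level_path, "L")  # step 404
        if elem.text and elem.text.strip():                          # step 405
            unit_id = len(element_table) + 1                         # step 406
            element_table[unit_id] = (
                doc_no, path_name_id, path_level_id, name_id)        # step 407
        seen = Counter()  # appearance order per tag name under this parent
        for child in elem:
            seen[child.tag] += 1
            visit(child,
                  f"{name_path}/{child.tag}",
                  f"{level_path}/{seen[child.tag]}")

    root = ET.fromstring(xml_text)
    visit(root, f"/{root.tag}", "/1")


name_ids, path_names, path_levels, element_table = {}, {}, {}, {}
doc = ("<paper><bib><title>Structured document management</title></bib>"
       "<body><chapter><section><para>first</para><para>second</para>"
       "</section></chapter></body></paper>")
register(doc, 1, name_ids, path_names, path_levels, element_table)
print(element_table[3])  # (1, 'N7', 'L6', 'T7')
```

In this sample the two `para` elements share one path name ID and differ only in path hierarchy ID, while `bib` and `body` share the path hierarchy "/1/1" and differ only in path name ID. The pair of IDs is what pins down a single position in the tree.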

[0062] Next, the character string index creation means 109 creates, for each element, a character string index for searching the element content. Its processing flow is described with reference to FIG. 9.

[0063] First, the structure-analyzed data of the document to be registered is read from the structure-analyzed data storage means 114 (step 901). Next, it is checked whether the element currently being referenced has an entity (step 902); if it does not, processing proceeds to step 907. If it does, processing proceeds to step 903, where the search unit identifier assigned to this element in step 406 of the structure information creation means 108 is obtained. Then character chains of a predetermined number of characters are extracted from the character string of the element content (step 904).

[0064] For each character chain, the corresponding search unit identifier and a number indicating the position of the chain's first character within the element content (hereinafter, character position number) are added to the character string index (step 905). Steps 904 and 905 are repeated over the entire character string of the element (step 906). Step 907 then checks whether steps 902 to 906 have been completed for all elements of the document to be registered; if unprocessed elements remain, the processing from step 902 onward is repeated.

[0065] Once steps 902 to 906 have been completed for all elements, the character string index created here is finally added to the character string index storage means 119 (step 908).

[0066] FIG. 10 shows part of an example character string index created by the character string index creation means 109 for the element "<タイトル>構造化文書管理</タイトル>" (<title>Structured document management</title>) on the third line of the structured document of FIG. 2. Entry 1001 in FIG. 10 indicates that, in the character string of the element whose search unit identifier is "1", the character chain "構造" occurs starting at character position "1". FIG. 10 shows only part of the index; in practice, a character string index is created for the entire character string of every element of the document to be registered.

【００６７】なお、この例では２文字ずつ文字連鎖を取
り出してそれぞれに文字列索引を作成しているが、この
文字連鎖は２文字ずつでなくても構わない。また、以上
の登録処理を登録対象文書が入力されるごとに繰り返す
ことにより、構造情報と文字列索引が追加されてゆく。In this example, a character chain is extracted two characters at a time and a character string index is created for each character chain. However, the character chain does not have to be two characters. The above-described registration process is repeated each time a document to be registered is input, whereby structure information and a character string index are added.
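
The index construction in steps 904 to 906 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_char_index`, the dictionary layout, and the sample text (the title of FIG. 2 with whitespace removed) are assumptions made for the example. The chain length `n` is a parameter because the text notes that chains need not be two characters long.

```python
from collections import defaultdict


def build_char_index(elements, n=2):
    """Return {chain: [(search_unit_id, char_position), ...]}.

    `elements` maps a search unit identifier to that element's text.
    Positions are 1-based, as in the description of FIG. 10.
    """
    index = defaultdict(list)
    for unit_id, text in elements.items():
        # An element without content yields no chains (the check in step 902).
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((unit_id, pos + 1))
    return dict(index)


# Hypothetical input: the title element of FIG. 2, whitespace removed.
index = build_char_index({1: "構造化文書管理"})
print(index["構造"])  # [(1, 1)]
```

Each posting carries only the search unit identifier and the character position, so nothing about the logical structure is stored in the index itself.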

【００６８】なお、図５他において名称ＩＤ、パス名称
ＩＤおよびパス階層ＩＤは“Ｔ９”や“Ｎ１１”や“Ｌ
５”といった文字で表現しているが、これらはそれぞ
れ、名称（タグ名）を一意に特定するＩＤ、パス名称を
一意に特定するＩＤ、パス階層を一意に特定するＩＤで
あればどのようなものでも構わない。次に本実施の形態
における文書検索の処理の流れを具体例を示して説明す
る。In FIG. 5 and elsewhere, the name ID, path name ID, and path hierarchy ID are written as strings such as "T9", "N11", and "L5", but any form of ID will do as long as it uniquely identifies a name (tag name), a path name, and a path hierarchy, respectively. Next, the flow of the document search processing in the present embodiment is described using a specific example.

【００６９】なお、以下に示す本実施の形態における文
書検索処理の説明においては、名称ＩＤテーブル、パス
名称インデックス、パス階層インデックス、要素管理テ
ーブルには、それぞれ図５、図６、図７、図８のような
データが格納されているものとして説明を行なう。In the following description of the document search processing in the present embodiment, the name ID table, the path name index, the path hierarchy index, and the element management table are assumed to hold the data shown in FIGS. 5, 6, 7, and 8, respectively.

【００７０】まず検索条件入力手段１０３を通して、端
末１０１から「パス名称が“／論文／書誌／タイトル”
である要素に、“構造化”という文字列が含まれる文
書」という条件が与えられたとする。First, suppose that the condition "documents in which an element whose path name is '/論文/書誌/タイトル' (/article/bibliography/title) contains the character string '構造化' ('structured')" is given from the terminal 101 through the search condition input means 103.

【００７１】図１１は検索条件解析手段１１０の処理の
流れを示した図である。ここでの例は、検索条件の構造
指定としてパス名称のみ指定されているので、図１１の
Ｃａｓｅ３に該当する。Ｃａｓｅ３ではステップ１１０
２で、パス名称インデックス格納手段１１６に格納され
ているパス名称インデックスを参照し、検索条件のパス
名称をパス名称ＩＤに変換する。パス名称インデックス
が図６の場合、検索条件のパス名称“／論文／書誌／タ
イトル”は、パス名称ＩＤ“Ｎ２”に変換される。FIG. 11 shows the flow of processing in the search condition analysis means 110. In this example only a path name is given as the structure specification of the search condition, so it corresponds to Case 3 in FIG. 11. In Case 3, step 1102 refers to the path name index stored in the path name index storage means 116 and converts the path name of the search condition into a path name ID. With the path name index of FIG. 6, the path name "/論文/書誌/タイトル" (/article/bibliography/title) of the search condition is converted into the path name ID "N2".

【００７２】次に文字列索引検索手段１１１で、検索条
件の文字列について検索処理を行なう。図１２は文字列
索引検索手段１１１での処理を図に示したものである。
ここでの例では検索条件の文字列は“構造化”であり、
これは２文字ずつの文字連鎖として“構造”と“造化”
が取り出せる。ここで取り出す文字連鎖の文字数は、文
字列索引作成手段１０９で作成する文字連鎖の文字数と
同一とする。この２つの文字連鎖について図１２の１２
１０に示すような文字列索引が作成されているとして、
この中から検索単位識別子が同一で、かつ“構造”の文
字連鎖から“造化”の文字連鎖に対して文字位置番号が
連続しているものを文字列索引検索手段１１１の結果と
して抽出する。図１２の例では検索単位識別子が同一な
ものとして１２２１、１２２２、１２２３を取り出すこ
とが出来る。更にその中で文字位置番号が連続している
のは１２２１と１２２３であり、これらの検索単位識別
子を抽出する。Next, the character string index search means 111 performs search processing on the character string of the search condition. FIG. 12 illustrates the processing in the character string index search means 111. In this example the character string of the search condition is "構造化" ("structured"), from which the two-character chains "構造" and "造化" can be extracted. The number of characters per chain extracted here is the same as the number used by the character string index creating means 109. Assuming that the character string index shown at 1210 in FIG. 12 has been created for these two character chains, the entries that share the same search unit identifier and whose character position numbers are consecutive from the chain "構造" to the chain "造化" are extracted as the result of the character string index search means 111. In the example of FIG. 12, 1221, 1222, and 1223 can be picked out as entries with matching search unit identifiers. Of these, the character position numbers are consecutive in 1221 and 1223, so their search unit identifiers are extracted.
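
The consecutive-position check described above can be sketched as follows. The postings in `index` are hypothetical and are not the contents of FIG. 12; the name `search_phrase` is likewise an assumption made for this example. Search unit 5 is included to show an entry that shares the identifier but fails the adjacency test.

```python
def search_phrase(index, phrase, n=2):
    """Return the search unit identifiers whose text contains `phrase`."""
    chains = [phrase[i:i + n] for i in range(len(phrase) - n + 1)]
    if not chains:
        return set()
    # Candidate start positions come from the postings of the first chain.
    candidates = set(index.get(chains[0], []))
    for offset, chain in enumerate(chains[1:], start=1):
        postings = set(index.get(chain, []))
        # Keep a candidate only if the next chain starts `offset` characters
        # later within the same search unit.
        candidates = {(u, p) for (u, p) in candidates if (u, p + offset) in postings}
    return {u for (u, _) in candidates}


# Hypothetical postings: (search unit identifier, character position).
index = {
    "構造": [(1, 1), (5, 3), (9, 4)],
    "造化": [(1, 2), (5, 7), (9, 5)],
}
print(sorted(search_phrase(index, "構造化")))  # [1, 9]
```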

【００７３】次に構造照合手段１１２で、文字列索引検
索手段１１１で得られた検索単位識別子群の中から、検
索条件の構造指定を満たす最終的な検索結果を求める。
図１３は、構造照合手段１１２の処理の流れを示した図
である。図１３におけるＣａｓｅ１からＣａｓｅ４は、
図１１の検索条件の構造指定パターンＣａｓｅ１からＣ
ａｓｅ４と同様である。ここでの例ではＣａｓｅ３（パ
ス名称のみ指定）であるので、ステップ１３０３でパス
名称の照合を行なう。図１４はこの例における構造照合
処理の詳細を示す図である。まず文字列索引検索手段１
１１で得られた検索単位識別子（１４０１）をキーとし
て要素管理テーブルを参照する。そこで該検索単位識別
子のパス名称ＩＤが、検索条件解析手段１１０で求めた
検索条件のパス名称ＩＤ（この例では“Ｎ２”）と一致
するものだけを最終的な検索結果とする。Next, the structure matching means 112 obtains, from the group of search unit identifiers obtained by the character string index search means 111, the final search result that satisfies the structure specification of the search condition. FIG. 13 shows the flow of processing in the structure matching means 112. Case 1 to Case 4 in FIG. 13 are the same as the structure specification patterns Case 1 to Case 4 of the search condition in FIG. 11. This example is Case 3 (only a path name is specified), so the path name is collated in step 1303. FIG. 14 shows the details of the structure matching processing in this example. First, the element management table is looked up using the search unit identifiers (1401) obtained by the character string index search means 111 as keys. Only those search unit identifiers whose path name ID matches the path name ID of the search condition obtained by the search condition analysis means 110 ("N2" in this example) become the final search result.
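
A minimal sketch of the Case 3 collation, assuming dictionary stand-ins for the path name index (FIG. 6) and the element management table (FIG. 8). Only the mapping of "/論文/書誌/タイトル" to "N2" is taken from the text; the second path name, its ID, and the table rows are invented for illustration.

```python
# Hypothetical stand-ins for FIG. 6 and FIG. 8.
PATH_NAME_INDEX = {
    "/論文/書誌/タイトル": "N2",
    "/論文/本文/章/節/タイトル": "N9",
}
ELEMENT_TABLE = {
    1: {"path_name_id": "N2"},
    9: {"path_name_id": "N9"},
}


def match_path_name(unit_ids, path_name):
    """Case 3: keep the search units whose path name ID equals the condition's."""
    target_id = PATH_NAME_INDEX[path_name]  # conversion of step 1102
    # Collation of step 1303: one table lookup per candidate search unit.
    return [u for u in unit_ids if ELEMENT_TABLE[u]["path_name_id"] == target_id]


print(match_path_name([1, 9], "/論文/書誌/タイトル"))  # [1]
```

Because the comparison is on short IDs rather than full path strings, the per-candidate cost is a single dictionary lookup and one equality test.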

【００７４】なお、本実施の形態では検索条件の構造指
定として、タグ名を指定した検索（Ｃａｓｅ１）、タグ
名とその出現順序を指定した検索（Ｃａｓｅ２）、パス
名称とパス階層を指定した検索（Ｃａｓｅ４）にも対応
可能である。以下でそれぞれＣａｓｅでの処理について
簡潔に説明する。In the present embodiment, the structure specification of the search condition can also be a search specifying a tag name (Case 1), a search specifying a tag name and its appearance order (Case 2), or a search specifying a path name and a path hierarchy (Case 4). The processing in each of these cases is described briefly below.

【００７５】タグ名を指定した検索（Ｃａｓｅ１）の場
合、まず図１１より検索条件解析手段１１０にて、検索
条件のタグ名を名称ＩＤに変換する（ステップ１１０
１）。For a search specifying a tag name (Case 1), the search condition analysis means 110 first converts the tag name of the search condition into a name ID, as shown in FIG. 11 (step 1101).

【００７６】次にＣａｓｅ３と同様に、文字列索引検索
手段１１１にて検索条件の文字列について検索処理を行
ない、該当する検索単位識別子群を求める。最後に図１
３より構造照合手段１１２にて、文字列索引検索手段１
１１で求めた検索単位識別子群のうち、名称ＩＤがステ
ップ１１０１で求めた名称ＩＤと一致するものだけを、
要素管理テーブルを元に抽出し（ステップ１３０１）、
最終的な検索結果とする。Next, as in Case 3, the character string index search means 111 performs search processing on the character string of the search condition and obtains the corresponding group of search unit identifiers. Finally, as shown in FIG. 13, the structure matching means 112 uses the element management table to extract, from the group of search unit identifiers obtained by the character string index search means 111, only those whose name ID matches the name ID obtained in step 1101 (step 1301), and these become the final search result.

【００７７】タグ名とその出現順序を指定した検索（Ｃ
ａｓｅ２）の場合、Ｃａｓｅ１と同様な処理を行なった
後、最後に出現順序照合処理（図１３のステップ１３０
２）を行なう。ステップ１３０２では、該検索単位識別
子のパス階層ＩＤをキーとしてパス階層インデックスを
参照し、末端階層の出現順序が検索条件の出現順序と一
致するものだけを抽出し、最終的な検索結果とする。For a search specifying a tag name and its appearance order (Case 2), the same processing as in Case 1 is performed, followed by the appearance order collation processing (step 1302 in FIG. 13). In step 1302, the path hierarchy index is looked up using the path hierarchy ID of each search unit identifier as a key, and only those whose appearance order at the terminal level matches the appearance order of the search condition are extracted as the final search result.

【００７８】パス名称とパス階層を指定した検索（Ｃａ
ｓｅ４）の場合、検索条件解析手段１１０でＣａｓｅ３
と同様にステップ１１０２の処理を行なった後、検索条
件のパス階層をパス階層インデックスを用いてパス階層
ＩＤへの変換を行なう（ステップ１１０３）。次にＣａ
ｓｅ３と同様に、文字列索引検索手段１１１にて検索条
件の文字列について検索処理を行ない、該当する検索単
位識別子群を求める。For a search specifying a path name and a path hierarchy (Case 4), the search condition analysis means 110 performs step 1102 as in Case 3 and then converts the path hierarchy of the search condition into a path hierarchy ID using the path hierarchy index (step 1103). Next, as in Case 3, the character string index search means 111 performs search processing on the character string of the search condition and obtains the corresponding group of search unit identifiers.

【００７９】最後に構造照合手段１１２にて、Ｃａｓｅ
３と同様にパス名称ＩＤ照合処理（ステップ１３０３）
を行なった後、パス階層ＩＤ照合処理（ステップ１３０
４）を行なう。ステップ１３０４では、該検索単位識別
子のパス階層ＩＤがステップ１１０３で変換したパス階
層ＩＤと一致するものだけを抽出し、最終的な検索結果
とする。Finally, the structure matching means 112 performs the path name ID collation processing (step 1303) as in Case 3 and then the path hierarchy ID collation processing (step 1304). In step 1304, only those search unit identifiers whose path hierarchy ID matches the path hierarchy ID converted in step 1103 are extracted as the final search result.
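
The four structure specifications (Case 1 to Case 4) differ only in which collation steps are applied to each candidate, which the following sketch expresses as optional filters. All data here is hypothetical: the record layout, the IDs, and the representation of a path hierarchy as a tuple of appearance orders (top level first) are assumptions for illustration and do not reproduce FIGS. 7 and 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRecord:
    name_id: str
    path_name_id: str
    path_hierarchy_id: str


# Hypothetical element management table rows, keyed by search unit identifier.
ELEMENTS = {
    1: ElementRecord("T3", "N2", "L1"),
    7: ElementRecord("T9", "N10", "L7"),
    8: ElementRecord("T9", "N10", "L8"),
}
# Hypothetical path hierarchy index: ID -> appearance order at each level.
PATH_HIERARCHY_INDEX = {"L1": (1, 1, 1), "L7": (1, 1, 1, 1, 1), "L8": (1, 1, 1, 1, 2)}


def structure_match(unit_ids, name_id=None, order=None,
                    path_name_id=None, path_hierarchy_id=None):
    """Apply only the collation steps that the search condition asks for."""
    hits = []
    for u in unit_ids:
        rec = ELEMENTS[u]
        if name_id is not None and rec.name_id != name_id:  # step 1301
            continue
        # Step 1302 compares the appearance order at the terminal level.
        if order is not None and PATH_HIERARCHY_INDEX[rec.path_hierarchy_id][-1] != order:
            continue
        if path_name_id is not None and rec.path_name_id != path_name_id:  # step 1303
            continue
        if path_hierarchy_id is not None and rec.path_hierarchy_id != path_hierarchy_id:  # step 1304
            continue
        hits.append(u)
    return hits


print(structure_match([1, 7, 8], name_id="T9"))           # Case 1: [7, 8]
print(structure_match([1, 7, 8], name_id="T9", order=2))  # Case 2: [8]
print(structure_match([1, 7, 8], path_name_id="N10", path_hierarchy_id="L7"))  # Case 4: [7]
```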

【００８０】最後に検索結果作成・表示処理について説
明する。結果作成手段１１３は検索結果として得られた
文書の書誌情報（タイトル、著者、日付など）を結果一
覧表示用のデータとして、一覧データ格納手段１２１に
格納する。このデータを結果表示手段１０４を通して端
末１０１に表示する。次に端末１０１から実体表示要求
としてこの検索結果一覧の中からどれか１つの文書が選
択されると、結果作成手段１１１が実体データ格納手段
１１５から指定された文書の実体を取得し、結果表示手
段１０４を通して端末１０１に表示する。なお、構造解
析手段１０７によって要素に分割された単位で、登録対
象文書を実体データ格納手段１２０に登録しておくこと
により、検索結果作成・表示処理において要素毎の結果
一覧の作成、および要素毎の実体取得も可能である。Finally, the search result creation and display processing is described. The result creating means 113 stores the bibliographic information (title, author, date, and so on) of the documents obtained as the search result in the list data storage means 121 as data for displaying the result list. This data is displayed on the terminal 101 through the result display means 104. Next, when one document is selected from this search result list as an entity display request from the terminal 101, the result creating means 111 acquires the entity of the specified document from the entity data storage means 115 and displays it on the terminal 101 through the result display means 104. If the document to be registered is registered in the entity data storage means 120 in the units into which the structure analysis means 107 divides it, the search result creation and display processing can also create a result list per element and retrieve the entity of each element.

【００８１】以上のように本実施の形態では、構造化文
書の論理構造情報を要素管理テーブル格納手段１１５、
パス名称インデックス格納手段１１６、パス階層インデ
ックス格納手段１１７、名称ＩＤテーブル格納手段１１
８の４つに分けて格納し、文字列索引内部にこれら論理
構造に関する情報を含めないことにより、文字列索引の
サイズ縮小を可能とする。更に文書の特定の要素内容の
追加、変更、削除を行なう際に、追加、変更、削除によ
り論理構造の変化の発生した検索単位識別子のレコード
について、要素管理テーブルの変更処理を行なうだけで
済むため、文字列索引内部に論理構造に関する情報を含
める方法と比較して、処理量の大幅な軽減が可能とな
る。（文字列索引内部に論理構造に関する情報を含める
方法の場合、追加、変更、削除により、論理構造の変化
が発生した要素に関する全文字連鎖の文字列索引に対し
て修正処理が発生するため。）具体例を以下に示す。図
１５は図３の構造をした文書の第１章第１節と第１章第
２節の間に１５０１に示すノード群を追加した例であ
る。この場合、１５０２のノードは第１章第２節から第
１章第３節へと変更しなくてはならない。この時本実施
の形態の方法では、既登録のデータに関しては、要素管
理テーブルにおける検索単位識別子１０、および１１の
レコードのパス名称ＩＤとパス階層ＩＤを変更するだけ
で済む。一方、文字列索引内部に論理構造に関する情報
を含める方法の場合、検索単位識別子１０および１１の
要素の全文字連鎖の文字列索引に対して論理構造情報の
変更を行なわなくてはならない。（仮に、検索単位識別
子１０の要素の内容が１００文字であったとすると、２
文字連鎖で索引を作成している場合、９９個の文字連鎖
の文字列索引に対して変更が必要となる）。As described above, in the present embodiment the logical structure information of structured documents is stored separately in four places, the element management table storage means 115, the path name index storage means 116, the path hierarchy index storage means 117, and the name ID table storage means 118, and no information about the logical structure is included in the character string index, which allows the character string index to be made smaller. Furthermore, when the content of a specific element of a document is added, changed, or deleted, only the element management table needs to be updated, and only for the records of the search unit identifiers whose logical structure changed as a result. The amount of processing is therefore much smaller than with a method that includes logical structure information in the character string index. (With such a method, an addition, change, or deletion requires correcting the character string index entries of every character chain of each element whose logical structure changed.) A specific example follows. FIG. 15 shows an example in which the node group 1501 is added between Chapter 1 Section 1 and Chapter 1 Section 2 of a document having the structure of FIG. 3. In this case, the nodes 1502 must change from Chapter 1 Section 2 to Chapter 1 Section 3. With the method of the present embodiment, the only change needed to the already registered data is to the path name ID and path hierarchy ID of the records for search unit identifiers 10 and 11 in the element management table. With a method that includes logical structure information in the character string index, by contrast, the logical structure information must be changed in the character string index entries of every character chain of the elements with search unit identifiers 10 and 11. (If the content of the element with search unit identifier 10 were 100 characters long and the index were built from two-character chains, the character string index entries of 99 character chains would have to be changed.)
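
The arithmetic behind this comparison can be made explicit. An element of length m yields m - n + 1 chains of n characters, so 100 characters give 99 two-character chains. In the sketch below, the 100-character length of search unit 10 is the figure assumed in the text; the 40-character length of search unit 11 and the function names are invented for illustration.

```python
def chain_count(length, n=2):
    """Number of n-character chains in an element of `length` characters."""
    return max(length - n + 1, 0)


def records_to_update(element_lengths, n=2, structure_in_index=False):
    """Count the records touched when the listed elements change position.

    With structure kept in the element management table, one record per
    element is updated. With structure embedded in the character string
    index, every chain posting of every affected element is updated.
    """
    if structure_in_index:
        return sum(chain_count(length, n) for length in element_lengths.values())
    return len(element_lengths)


affected = {10: 100, 11: 40}  # search unit identifier -> content length
print(chain_count(100))                                     # 99
print(records_to_update(affected))                          # 2
print(records_to_update(affected, structure_in_index=True)) # 138
```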

【００８２】また、本実施の形態では要素の論理構造位
置を特定するためのＩＤをパス名称ＩＤとパス階層ＩＤ
の２つに分けているため、論理構造が複雑かつ膨大にな
った場合でも、公知例のように１種類のＩＤ（文脈識別
子）で論理構造位置を特定する方法と比較して、ＩＤの
総数を少なく押さえることが可能となる。In addition, because the present embodiment splits the ID that identifies the logical structure position of an element into two, a path name ID and a path hierarchy ID, the total number of IDs can be kept smaller than with the method of the known example, which identifies the logical structure position with a single kind of ID (a context identifier), even when the logical structure becomes complex and very large.

【００８３】なお、本実施の形態では１文書の構造化文
書の登録、検索について説明したが、複数文書の場合で
も同様の処理で実現が可能である。また本実施の形態で
は、一種類のＤＴＤにおけるパス名称ＩＤの作成方法に
ついて説明したが、本システムに複数の異なるＤＴＤの
文書の登録要求が起こった場合においても、各ノードに
個別なパス名称ＩＤを割当てることにより、論理構造を
指定した検索が実現可能である。また、要素管理テーブ
ル、パス名称インデックス、パス階層インデックス、名
称ＩＤテーブルを一次記憶上に持つことにより、構造照
合手段１１２の高速化が可能である。The present embodiment has described the registration and search of a single structured document, but multiple documents can be handled by the same processing. The present embodiment has also described how path name IDs are created for one kind of DTD; even when the system receives registration requests for documents with several different DTDs, searches specifying a logical structure remain possible by assigning a distinct path name ID to each node. In addition, holding the element management table, the path name index, the path hierarchy index, and the name ID table in primary storage makes it possible to speed up the structure matching means 112.

【００８４】また本実施の形態は、構造化文書の管理を
目的とする装置について説明を行ったが、必ずしも構造
化文書に限らず、木構造で表現可能なデータを管理する
ために上述のパス名称インデックス及びパス階層インデ
ックスを利用して実体要素（データの実体）を管理する
ことも可能である。The present embodiment has described a device intended for managing structured documents, but it is not necessarily limited to structured documents: the path name index and path hierarchy index described above can also be used to manage entity elements (the data itself) in order to manage any data that can be expressed as a tree structure.

【００８５】さらに実施の形態１は、装置として実現す
る例を示したが、その他に汎用計算機に本実施の形態に
開示した構造化文書管理装置として機能するプログラム
をインストールすることによっても実現することが可能
である。Furthermore, although Embodiment 1 has been shown as an example realized as a device, it can also be realized by installing, on a general-purpose computer, a program that functions as the structured document management device disclosed in this embodiment.

【００８６】（実施の形態２）以下、本発明の実施の形
態２について説明する。図１６は実施の形態２における
構造化文書管理装置の構成図である。実施の形態１の構
成図である図１と異なるのは、データ格納部１０６にパ
ス名称ＩＤ照合テーブル格納手段１６０１、パス階層Ｉ
Ｄ照合テーブル格納手段１６０２を新たに備えていると
ころである。またそれに伴い、検索条件解析手段１１
０、および構造照合手段１１２の処理が実施の形態１と
は異なる。(Embodiment 2) Embodiment 2 of the present invention is described below. FIG. 16 is a configuration diagram of the structured document management device in Embodiment 2. It differs from FIG. 1, the configuration diagram of Embodiment 1, in that the data storage unit 106 additionally includes a path name ID collation table storage means 1601 and a path hierarchy ID collation table storage means 1602. Accordingly, the processing of the search condition analysis means 110 and of the structure matching means 112 differs from that of Embodiment 1.

【００８７】パス名称ＩＤ照合テーブル格納手段１６０
１は、各パス名称ＩＤが検索条件の構造指定の範囲内に
あるかどうかの情報が格納される。The path name ID collation table storage means 1601 stores information on whether each path name ID falls within the range of the structure specification of the search condition.

【００８８】パス階層ＩＤ照合テーブル格納手段１６０
２は、各パス階層ＩＤが検索条件の構造指定の範囲内に
あるかどうかの情報が格納される。The path hierarchy ID collation table storage means 1602 stores information on whether each path hierarchy ID falls within the range of the structure specification of the search condition.

【００８９】実施の形態２における目的は、実施の形態
１における検索条件の構造指定パターンＣａｓｅ１から
Ｃａｓｅ４以外の構造指定に対応することである。Ｃａ
ｓｅ１からＣａｓｅ４はタグ名やパス名称などで指定さ
れた末端要素そのものに対して検索を行なうものであ
る。実施の形態２で実現する検索は、実体を持たない中
間ノード以下を指定した検索である。例えば、「“章”
以下に“管理”という文字列を含む文書を検索する」と
いった検索条件に対応することを目的とする。The purpose of Embodiment 2 is to support structure specifications other than the structure specification patterns Case 1 to Case 4 of the search condition in Embodiment 1. Case 1 to Case 4 search the terminal element itself that is specified by a tag name, a path name, or the like. The search realized in Embodiment 2 is one that specifies everything at and below an intermediate node that has no entity of its own. For example, the aim is to support a search condition such as "search for documents that contain the character string '管理' ('management') under '章' ('chapter')".

【００９０】実施の形態２における登録処理は、実施の
形態１と同様であるため説明を省略する。The registration processing in the second embodiment is the same as that in the first embodiment, and a description thereof will be omitted.

【００９１】次に実施の形態２における検索処理の流れ
を具体例を示して説明する。なお、以下に示す本実施の
形態における文書検索処理の説明においては、名称ＩＤ
テーブル、パス名称インデックス、パス階層インデック
ス、要素管理テーブルには、それぞれ図５、図６、図
７、図８のようなデータが格納されているものとして説
明を行なう。Next, the flow of the search processing in Embodiment 2 is described using a specific example. In the following description of the document search processing in the present embodiment, the name ID table, the path name index, the path hierarchy index, and the element management table are assumed to hold the data shown in FIGS. 5, 6, 7, and 8, respectively.

【００９２】まず、検索条件入力手段１０３を通して、
端末１０１から「パス名称が“／論文／本文／章”であ
る中間ノード以下である要素に、“管理”という文字列
が含まれる文書」という条件が与えられたとする。First, suppose that the condition "documents in which an element at or below the intermediate node whose path name is '/論文/本文/章' (/article/body/chapter) contains the character string '管理' ('management')" is given from the terminal 101 through the search condition input means 103.

【００９３】図１７は実施の形態２における検索条件解
析手段１１０の処理の流れを示した図である。ここでの
例では検索条件の構造指定としてパス名称以下が指定さ
れているので、図１７のＣａｓｅ７に該当する。Ｃａｓ
ｅ７ではステップ１１０２で、実施の形態１と同様に検
索条件のパス名称をパス名称ＩＤに変換する。パス名称
インデックスが図６の場合、検索条件のパス名称“／論
文／本文／章”はパス名称ＩＤ“Ｎ６”に変換される。
次にステップ１７０１でパス名称ＩＤ照合テーブルを作
成する。図１８はここでの検索条件の例におけるパス名
称ＩＤ照合テーブルの内容を示す図である。このパス名
称ＩＤ照合テーブルは、検索要求ごとに作成し、パス名
称インデックスの全パス名称ＩＤについて、検索条件で
指定された範囲内のパス名称ＩＤと範囲外のパス名称Ｉ
Ｄを即座に判断するために作成する。この例の場合、図
６のパス名称インデックスよりパス名称ＩＤ“Ｎ６”以
下にあるパス名称ＩＤ“Ｎ７、Ｎ８、Ｎ９、Ｎ１０、Ｎ
１１”が範囲内で、それ以外は範囲外となる。FIG. 17 shows the flow of processing in the search condition analysis means 110 in Embodiment 2. In this example the structure specification of the search condition is a path name and everything below it, so it corresponds to Case 7 in FIG. 17. In Case 7, step 1102 converts the path name of the search condition into a path name ID, as in Embodiment 1. With the path name index of FIG. 6, the path name "/論文/本文/章" (/article/body/chapter) of the search condition is converted into the path name ID "N6". Next, step 1701 creates the path name ID collation table. FIG. 18 shows the contents of the path name ID collation table for this example search condition. The path name ID collation table is created for each search request so that, for every path name ID in the path name index, it can be determined immediately whether that path name ID is inside or outside the range specified by the search condition. In this example, according to the path name index of FIG. 6, the path name IDs "N7, N8, N9, N10, N11" below the path name ID "N6" are within the range, and all others are outside it.
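
Step 1701 can be sketched as a single pass over the path name index that flags each path name ID as inside or outside the specified subtree. The index excerpt below is hypothetical: only the IDs "N2" and "N6" and their path names come from the text, the remaining entries are invented, and representing the index as a mapping from ID to path string is an assumption of this sketch.

```python
IN_RANGE, OUT_OF_RANGE = 1, 2

# Hypothetical excerpt of a path name index (path name ID -> path name).
PATH_NAME_INDEX = {
    "N2": "/論文/書誌/タイトル",
    "N5": "/論文/本文",
    "N6": "/論文/本文/章",
    "N7": "/論文/本文/章/タイトル",
    "N8": "/論文/本文/章/節",
    "N9": "/論文/本文/章/節/タイトル",
}


def build_collation_table(path_name_index, target_id):
    """Flag every path name ID relative to the subtree rooted at `target_id`."""
    target = path_name_index[target_id]
    table = {}
    for pid, path in path_name_index.items():
        # The target node itself is flagged in range as well. It holds no
        # text, so this choice (an assumption here) does not change results.
        inside = path == target or path.startswith(target + "/")
        table[pid] = IN_RANGE if inside else OUT_OF_RANGE
    return table


def filter_by_table(unit_ids, element_table, table):
    """Keep the search units whose path name ID is flagged as in range."""
    return [u for u in unit_ids if table[element_table[u]] == IN_RANGE]


table = build_collation_table(PATH_NAME_INDEX, "N6")
# Hypothetical element management table: search unit -> path name ID.
print(filter_by_table([1, 9], {1: "N2", 9: "N9"}, table))  # [9]
```

Building the table once per request moves the subtree test out of the per-candidate loop, which is the point of the collation table.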

【００９４】次に文字列索引検索手段１１１で、検索条
件の文字列について検索処理を行なう。処理手順は実施
の形態１と同様であるため省略するが、ここでの例であ
る“管理”という文字列で検索した結果として、検索単
位識別子“１”と“９”が得られたものとして、説明を
続ける。Next, the character string index search means 111 performs search processing on the character string of the search condition. The procedure is the same as in Embodiment 1 and is omitted here; the description continues on the assumption that searching for the character string "管理" ("management") of this example yielded the search unit identifiers "1" and "9".

【００９５】次に構造照合手段１１２で、文字列索引検
索手段１１１で得られた検索単位識別子群の中から、検
索条件の構造指定を満たす最終的な検索結果を求める。
図１９は実施の形態２における構造照合手段１１２の処
理の流れを示した図である。Next, the structure matching means 112 obtains a final search result which satisfies the structure specification of the search condition from the search unit identifier group obtained by the character string index search means 111.
FIG. 19 is a diagram showing a flow of processing of the structure matching means 112 according to the second embodiment.

【００９６】図１９におけるＣａｓｅ５からＣａｓｅ８
というのは、図１７の検索条件の構造指定パターンＣａ
ｓｅ５からＣａｓｅ８と同様である。ここでの例では、
Ｃａｓｅ７（パス名称以下を指定）であるので、ステッ
プ１３０３のパス名称ＩＤ照合処理を行なう。ただし、
Ｃａｓｅ７におけるパス名称ＩＤ照合処理は、パス名称
ＩＤ照合テーブルを用いて照合を行なう。図２０はこの
例における構造照合処理の詳細を示す図である。まず文
字列索引検索手段１１１で得られた検索単位識別子群
（２００１）をキーとして要素管理テーブルを参照す
る。そこで該検索単位識別子のパス名称ＩＤからパス名
称ＩＤ照合テーブルを参照し、照合フラグが“１”（範
囲内）であるものだけを最終的な検索結果とする。Case 5 to Case 8 in FIG. 19 are the same as the structure specification patterns Case 5 to Case 8 of the search condition in FIG. 17. This example is Case 7 (a path name and everything below it is specified), so the path name ID collation processing of step 1303 is performed. In Case 7, however, the path name ID collation processing uses the path name ID collation table. FIG. 20 shows the details of the structure matching processing in this example. First, the element management table is looked up using the group of search unit identifiers (2001) obtained by the character string index search means 111 as keys. The path name ID collation table is then consulted with the path name ID of each search unit identifier, and only those whose collation flag is "1" (within the range) become the final search result.

【００９７】なお、本実施の形態では、検索条件の構造
指定として、タグ名で指定された中間ノード以下に対す
る検索（Ｃａｓｅ５）、タグ名とその出現順序で指定さ
れた中間ノード以下に対する検索（Ｃａｓｅ６）、パス
名称とパス階層で指定された中間ノード以下に対する検
索（Ｃａｓｅ８）にも対応可能である。以下でそれぞれ
Ｃａｓｅでの処理について簡潔に説明する。In the present embodiment, the structure specification of the search condition can also be a search at and below an intermediate node specified by a tag name (Case 5), a search at and below an intermediate node specified by a tag name and its appearance order (Case 6), or a search at and below an intermediate node specified by a path name and a path hierarchy (Case 8). The processing in each of these cases is described briefly below.

【００９８】タグ名で指定された中間ノード以下に対す
る検索（Ｃａｓｅ５）の場合、検索条件解析手段１１０
と文字列索引検索手段１１１における処理は、実施の形
態１のＣａｓｅ１と同様であるため省略する。最後に図
１９より構造照合手段１１２にて構造指定のチェックを
行なう。ここでステップ１９０１のパス名称ＩＤ作成・
更新・照合処理について説明する。図２１はパス名称Ｉ
Ｄ作成・更新・照合処理の流れを示したフローチャート
であり、このフローチャートに沿って説明する。For a search at and below an intermediate node specified by a tag name (Case 5), the processing in the search condition analysis means 110 and the character string index search means 111 is the same as in Case 1 of Embodiment 1 and is omitted. Finally, as shown in FIG. 19, the structure matching means 112 checks the structure specification. The path name ID creation, update, and collation processing of step 1901 is described here. FIG. 21 is a flowchart showing the flow of the path name ID creation, update, and collation processing, and the description follows this flowchart.

【００９９】まずパス名称ＩＤ照合テーブルの照合フラ
グを“０”（未定）で初期化しておく（ステップ３１０
１）。次に文字列索引検索手段１１１で求めた検索単位
識別子群それぞれについて以下の処理を繰り返す。まず
検索単位識別子を取得し（ステップ３１０２）、該検索
単位識別子のパス名称ＩＤ（要素管理テーブルより取
得）の照合フラグを参照（ステップ３１０３）し、該照
合フラグが“１”（範囲内）であれば（ステップ３１０
４）、該検索単位識別子を最終的な検索結果に含める
（ステップ３１０５）。照合フラグが“２”（範囲外）
であれば（ステップ３１０６）、該検索単位識別子は最
終的な検索範囲に含めない（ステップ３１０７）。照合
フラグが“０”（未定）であったら、該検索単位識別子
のパス名称ＩＤをキーとしてパス名称インデックスを参
照し（ステップ３１０８）、検索条件解析手段１１０の
ステップ１１０１で求めた名称ＩＤと一致するか、もし
くは、たどったノードのパス名称ＩＤの照合フラグが
“１”（範囲内）の場合（ステップ３１０９）、該検索
単位識別子のパス名称ＩＤと、そこまでたどったパス名
称ＩＤ全てに対して、パス名称ＩＤ照合テーブルの照合
フラグを１に設定し（ステップ３１１０）、該検索単位
識別子を最終的な検索結果に含める。First, the collation flags of the path name ID collation table are initialized to "0" (undecided) (step 3101). Next, the following processing is repeated for each search unit identifier in the group obtained by the character string index search means 111. A search unit identifier is acquired (step 3102), and the collation flag of its path name ID (obtained from the element management table) is consulted (step 3103). If the collation flag is "1" (within the range) (step 3104), the search unit identifier is included in the final search result (step 3105). If the collation flag is "2" (outside the range) (step 3106), the search unit identifier is not included in the final search range (step 3107). If the collation flag is "0" (undecided), the path name index is consulted using the path name ID of the search unit identifier as a key (step 3108). If the node matches the name ID obtained in step 1101 of the search condition analysis means 110, or if the collation flag of the path name ID of the node reached is "1" (within the range) (step 3109), the collation flag in the path name ID collation table is set to "1" for the path name ID of the search unit identifier and for every path name ID traversed up to that point (step 3110), and the search unit identifier is included in the final search result.

【０１００】逆に、たどったノードのパス名称ＩＤの照
合フラグが“２”（範囲外）の場合（ステップ３１１
１）、該検索単位識別子のパス名称ＩＤと、そこまでた
どったパス名称ＩＤ全てに対して、パス名称ＩＤ照合テ
ーブルの照合フラグを“２”（範囲外）に設定し（ステ
ップ３１１２）、該検索単位識別子を最終的な検索結果
に含めない。Conversely, if the collation flag of the path name ID of the node reached is "2" (outside the range) (step 3111), the collation flag in the path name ID collation table is set to "2" (outside the range) for the path name ID of the search unit identifier and for every path name ID traversed up to that point (step 3112), and the search unit identifier is not included in the final search result.

【０１０１】さらに、たどったノードのパス名称ＩＤの
照合フラグが“０”（未定）の場合は、１階層登り（ス
テップ３１１３）、ルートノードであるか否かを判定し
（ステップ３１１４）し、ルートノードでなければ、再
びステップ３１０８に戻る。ルートノードである場合
は、該検索単位識別子のパス名称ＩＤと、それまでたど
ったパス名称ＩＤ全ての照合フラグを２“範囲外”に設
定する（ステップ３１１２）。Furthermore, if the collation flag of the path name ID of the node reached is "0" (undecided), the processing climbs one level (step 3113) and determines whether the node is the root node (step 3114). If it is not the root node, the processing returns to step 3108. If it is the root node, the collation flags of the path name ID of the search unit identifier and of every path name ID traversed so far are set to "2" (outside the range) (step 3112).

【０１０２】次の該当検索単位識別子が存在するか否か
をチェックし（ステップ３１１５）、存在する場合は、
ステップ３１０２へ戻る。存在しない場合は、本処理を
終了する。It is then checked whether a next applicable search unit identifier exists (step 3115). If one exists, the processing returns to step 3102; if not, the processing ends.

【０１０３】このように徐々に各パス名称ＩＤが検索条
件の範囲内にあるかどうかのパス名称ＩＤ照合テーブル
が学習されていくため、別の検索単位識別子に対してパ
ス名称ＩＤの照合を行なう際に、すでに範囲内であると
判明している（照合フラグが“１”である）パス名称Ｉ
Ｄであった場合、該検索単位識別子を即座に最終的な検
索結果に含ませることが可能となる。In this way the path name ID collation table, which records whether each path name ID is within the range of the search condition, is learned gradually. As a result, when the path name ID of another search unit identifier is collated and that path name ID is already known to be within the range (its collation flag is "1"), the search unit identifier can be included in the final search result immediately.
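
The lazily learned collation table of steps 3101 to 3115 can be sketched as follows. The data is hypothetical, and representing the path name index as a mapping from path name ID to a pair of name ID and parent path name ID is an assumption of this sketch. One simplification is deliberate and differs from the flowchart as described: the root node's own tag is compared with the target before the walk gives up, whereas the text marks the path out of range as soon as the root is reached.

```python
UNDECIDED, IN_RANGE, OUT_OF_RANGE = 0, 1, 2

# Hypothetical path name index: path name ID -> (name ID, parent path name ID).
PATH_INDEX = {
    "N1": ("T1", None),   # /論文 (root)
    "N3": ("T2", "N1"),   # /論文/書誌
    "N2": ("T3", "N3"),   # /論文/書誌/タイトル
    "N5": ("T5", "N1"),   # /論文/本文
    "N6": ("T6", "N5"),   # /論文/本文/章
    "N8": ("T8", "N6"),   # /論文/本文/章/節
    "N9": ("T3", "N8"),   # /論文/本文/章/節/タイトル
    "N10": ("T9", "N8"),  # /論文/本文/章/節/段落
}
# Hypothetical element management table: search unit -> path name ID.
ELEMENT_TABLE = {1: "N2", 9: "N9", 10: "N10", 11: "N10"}


def collate(unit_ids, element_table, path_index, target_name_id):
    flags = {pid: UNDECIDED for pid in path_index}  # step 3101
    results = []
    for unit in unit_ids:                           # steps 3102 and 3115
        node, traced, verdict = element_table[unit], [], None
        while verdict is None:
            if flags[node] != UNDECIDED:            # a learned answer is reused
                verdict = flags[node]
            elif path_index[node][0] == target_name_id:
                traced.append(node)
                verdict = IN_RANGE
            else:
                traced.append(node)
                parent = path_index[node][1]
                if parent is None:                  # walked past the root
                    verdict = OUT_OF_RANGE
                else:
                    node = parent                   # climb one level (step 3113)
        for pid in traced:                          # steps 3110 and 3112
            flags[pid] = verdict
        if verdict == IN_RANGE:
            results.append(unit)
    return results, flags


results, flags = collate([1, 9, 10, 11], ELEMENT_TABLE, PATH_INDEX, "T6")
print(results)  # [9, 10, 11]
```

In this run, search unit 10 stops climbing at "N8" because that ID was already flagged while handling search unit 9, and search unit 11 is resolved with a single lookup.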

【０１０４】なお上記ステップ３１０１からステップ３
１１５までの処理については、汎用計算機に上記ステッ
プの処理を実現するプログラムをインストールすること
により実現することが可能である。The processing from step 3101 to step 3115 can be realized by installing, on a general-purpose computer, a program that implements these steps.

【０１０５】また上記実施の形態では、構造化文書にお
いて中間ノード以下を検索範囲に指定した場合に、検索
範囲に含まれるノードを決定する例を示したが、構造化
文書に限らず、その他木構造で表現できるデータについ
ても同様に適用することが可能である。The above embodiment has shown an example of determining the nodes included in the search range when everything at and below an intermediate node of a structured document is specified as the search range, but the same approach is applicable not only to structured documents but also to other data that can be expressed as a tree structure.

【０１０６】タグ名とその出現順序で指定された中間ノ
ード以下に対する検索（Ｃａｓｅ６）の場合、検索条件
解析手段１１０、文字列索引検索手段１１１、および構
造照合手段１１２のステップ１９０１まではＣａｓｅ５
と同様の処理を行なう。次にステップ１９０１でパス名
称ＩＤが範囲内にあった場合に限り、ステップ１９０２
のパス階層ＩＤ作成・更新・照合処理を行なう。図２２
はパス階層ＩＤ照合テーブルの例である。ステップ１９
０２ではステップ１９０１のパス名称ＩＤに関する処理
と同様に、パス階層ＩＤについて構造指定の範囲にある
かどうか学習していき、照合フラグが“１”のパス階層
ＩＤを持つ検索単位識別子を最終的な検索結果とする。For a search at and below an intermediate node specified by a tag name and its appearance order (Case 6), the same processing as in Case 5 is performed in the search condition analysis means 110, the character string index search means 111, and the structure matching means 112 up to step 1901. Next, only when the path name ID was found to be within the range in step 1901, the path hierarchy ID creation, update, and collation processing of step 1902 is performed. FIG. 22 is an example of the path hierarchy ID collation table. In step 1902, as in the processing of step 1901 for path name IDs, whether each path hierarchy ID is within the range of the structure specification is learned progressively, and the search unit identifiers whose path hierarchy ID has a collation flag of "1" become the final search result.

【０１０７】パス名称とパス階層で指定された中間ノー
ド以下に対する検索（Ｃａｓｅ８）の場合、検索条件解
析手段１１０では、Ｃａｓｅ７と同様な処理を行なった
あとに、ステップ１７０２にてパス階層ＩＤ照合テーブ
ルを作成する。このパス階層ＩＤ照合テーブルは、パス
階層インデックスにおいて、ステップ１１０３で求めた
パス階層ＩＤにあたるノードとそれ以下全てのノードの
パス階層ＩＤに対する照合フラグを“１”（範囲内）
に、それ以外を“２”（範囲外）に設定する。文字列索
引検索手段１１１での処理はＣａｓｅ７と同様であるた
め説明を省略する。For a search at and below an intermediate node specified by a path name and a path hierarchy (Case 8), the search condition analysis means 110 performs the same processing as in Case 7 and then creates the path hierarchy ID collation table in step 1702. In this path hierarchy ID collation table, the collation flag is set to "1" (within the range) for the path hierarchy IDs of the node corresponding to the path hierarchy ID obtained in step 1103 and of all nodes below it in the path hierarchy index, and to "2" (outside the range) for all others. The processing in the character string index search means 111 is the same as in Case 7, so its description is omitted.

【０１０８】次に構造照合手段１１２において、Ｃａｓ
ｅ７と同様な処理を行なった後、ステップ１７０２にて
作成したパス階層ＩＤ照合テーブルを用いて、該検索単
位識別子のパス階層ＩＤの照合処理を行なう。ここでパ
ス階層ＩＤ照合テーブルの照合フラグが“１”であるパ
ス階層ＩＤを持つ検索単位識別子のみ、最終的な検索結
果とする。Next, the structure matching means 112 performs the same processing as in Case 7 and then collates the path hierarchy ID of each search unit identifier using the path hierarchy ID collation table created in step 1702. Only the search unit identifiers whose path hierarchy ID has a collation flag of "1" in the path hierarchy ID collation table become the final search result.

[0109] The search result creation and display processing in Embodiment 2 is the same as in Embodiment 1, so its description is omitted.

[0110] As described above, in this embodiment, when a search specifies an intermediate node and everything below it, two tables are created and used for structure collation: a path name ID collation table, which stores whether each path name ID falls within the range of the structure specification in the search condition, and a path hierarchy ID collation table, which stores the same information for each path hierarchy ID. This achieves fast searches that specify an intermediate node and everything below it.

[0111] Note that the configuration of Embodiment 2 shown in FIG. 16 can also handle the structure specifications Case 1 to Case 4 of Embodiment 1 by not using the path name ID collation table storage means 1601 and the path hierarchy ID collation table storage means 1602. Also, in the description of this embodiment, the collation flags in the path name ID collation table and the path hierarchy ID collation table take the value "1" for within range, "2" for out of range, and "0" for undetermined; however, any values may be assigned as long as the three states (within range, out of range, undetermined) can be distinguished.

[0112] Furthermore, although Embodiment 2 has been described as implemented in a device, it can also be realized by installing, on a general-purpose computer, a program that functions as the structured document management device disclosed in this embodiment.

[0113] (Embodiment 3) Embodiment 3 of the present invention is described below. The configuration of the structured document management device in Embodiment 3 is the same as FIG. 1 of Embodiment 1 or FIG. 16 of Embodiment 2. However, the way the character string index creation means 109 creates the character string index differs slightly from Embodiments 1 and 2, and accordingly the processing in the character string index search means 111 and the structure collation means 112 also differs from Embodiments 1 and 2.

[0114] The flow of registration processing in Embodiment 3 is described here. The processing of the structured document input means 102, the structure analysis means 107, and the structure information creation means 108 is the same as in Embodiments 1 and 2, so its description is omitted.

[0115] FIG. 23 shows the processing flow of the character string index creation means 109 in Embodiment 3. Steps 901 to 903 are the same as in Embodiments 1 and 2, so their description is omitted. Next, the element is checked for Mixed Content (step 2201); if it contains Mixed Content, the search unit identifier assigned to that Mixed Content is obtained (step 2202). Here, "Mixed Content" means an element entity that exists inside another element entity as a child element of that element. For example, as shown at 2310 in FIG. 24, an element enclosed in "keyword" tags inside an element representing a "paragraph" is Mixed Content. Other examples include "emphasis" and "italic". When searching, it is desirable that a character string spanning the "paragraph" and "keyword" elements can also be found. Therefore, when character chains are extracted in step 2203, character chains that span Mixed Content are extracted as well. For such a chain, step 2204 stores in the character string index the search unit identifier of the first character of the chain, the search unit identifier of the second character of the chain, and the character position number (hereinafter, a character string index entry for a character chain that spans Mixed Content is called an extended character string index). In this case, the character position number indicates the position of the chain's first character within the element outside the Mixed Content. Steps 906 to 908 are the same as in Embodiments 1 and 2, so their description is omitted.

[0116] Next, an example of creating the character string index for an element that contains Mixed Content is described with reference to FIG. 24. As shown at 2310 in FIG. 24, a "paragraph" contains Mixed Content enclosed in "keyword" tags; assume that the search unit identifier "101" is assigned to the "keyword" element and "102" to the "paragraph" element. The character string index created in this case is illustrated at 2320. In this example, the character chains "を検" (2321) and "索す" (2323) span the Mixed Content, so two search unit identifiers, one for the first character of the chain and one for the second, are stored in the character string index. Note that 2320 in FIG. 24 shows only part of the character string index; in practice, a character string index is created for every character string in every element of the document being registered.
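As a rough illustration of the extended character string index described above, the sketch below builds two-character chains over a paragraph that contains a keyword element. The `Entry` type, the `build_index` function, and the sample text "文書を検索する" are assumptions for the example (FIG. 24 itself is not reproduced here); the identifiers 101 and 102 and the chains "を検" and "索す" follow the description. Positions are counted over the flattened text of the outer element, which is one reading of the character position rule.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    chain: str      # two-character chain
    first_id: int   # search unit identifier of the chain's 1st character
    second_id: int  # search unit identifier of the chain's 2nd character
    pos: int        # 1-based position of the 1st character in the outer element's text

def build_index(segments):
    """segments: (search_unit_id, text) pieces of one outer element, in document order."""
    chars = [(ch, uid) for uid, text in segments for ch in text]
    return [
        Entry(chars[i][0] + chars[i + 1][0], chars[i][1], chars[i + 1][1], i + 1)
        for i in range(len(chars) - 1)
    ]

# Hypothetical paragraph (id 102) containing a keyword element (id 101).
segments = [(102, "文書を"), (101, "検索"), (102, "する")]
index = build_index(segments)

# An entry is "extended" when its two characters belong to different elements.
extended = [e for e in index if e.first_id != e.second_id]
for e in extended:
    print(e.chain, e.first_id, e.second_id, e.pos)
```

Storing both identifiers on every entry keeps the record shape uniform; an ordinary entry is simply one whose two identifiers are equal.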

[0117] Next, the flow of document search processing in Embodiment 3 is described. The processing in the search condition input means 103 and the search condition analysis means 110 is the same as in Embodiments 1 and 2, so its description is omitted. The processing in the character string index search means 111 is basically the same as in Embodiments 1 and 2. In Embodiment 3, however, the character string index creation means 109 creates, for character chains that span Mixed Content, extended character string index entries containing two search unit identifiers (for the first and second characters of the chain), so additional search processing is needed when such entries are involved. A concrete example is described with reference to FIG. 24. When the search string is "検索する" ("to search"), the entries 2322, 2323, and 2324 are obtained as the character string index entries for the character chains in the element at 2310. Here, the search unit identifier of 2322 and the first-character search unit identifier of the extended entry 2323 match at "101", and their character position numbers, "4" and "5", are consecutive. Likewise, the second-character search unit identifier of the extended entry 2323 and the search unit identifier of 2324 match at "102", and their character position numbers, "5" and "6", are consecutive. In such a case, the character string search is treated as a hit spanning these three character chains. The character string search result then carries, as its search unit identifier, the pair of search unit identifiers corresponding to the first and last characters of the search string. In this example, it returns the pair consisting of the first-character search unit identifier "101" and the last-character search unit identifier "102". The processing of the structure collation means is also basically the same as in Embodiments 1 and 2. In Embodiment 3, however, the group of character string search results obtained from the character string index search means 111 may include pairs of a first-character search unit identifier and a last-character search unit identifier, and additional structure collation processing is needed for that case.

[0118] In the example used above to describe the character string index search means 111 in Embodiment 3, the character string search returned the pair of first-character search unit identifier "101" and last-character search unit identifier "102". In this case, the same structure collation processing as in Embodiments 1 and 2 is performed for both search unit identifiers "101" and "102", and the hit becomes a final search result only if both search unit identifiers satisfy the structure specification of the search condition.
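The chaining and two-sided structure check described above might look like the following sketch. The entries mirror 2322, 2323, and 2324 as given in the text; the `Entry` type, the `search` and `structure_match` functions, and the use of a plain set of in-scope identifiers are simplifications assumed for illustration. A real index would also track which document each entry belongs to, which is omitted here.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    chain: str
    first_id: int   # search unit identifier of the chain's 1st character
    second_id: int  # search unit identifier of the chain's 2nd character
    pos: int        # character position number

INDEX = [
    Entry("検索", 101, 101, 4),  # corresponds to 2322
    Entry("索す", 101, 102, 5),  # corresponds to 2323 (extended entry)
    Entry("する", 102, 102, 6),  # corresponds to 2324
]

def search(index, query):
    """Return (first-character id, last-character id) pairs for each hit."""
    if len(query) < 2:
        return []
    chains = [query[i:i + 2] for i in range(len(query) - 1)]
    hits = []
    for start in (e for e in index if e.chain == chains[0]):
        prev, ok = start, True
        for chain in chains[1:]:
            # The next chain must sit at the next position and start in the
            # element where the previous chain ended.
            nxt = next((e for e in index
                        if e.chain == chain
                        and e.pos == prev.pos + 1
                        and e.first_id == prev.second_id), None)
            if nxt is None:
                ok = False
                break
            prev = nxt
        if ok:
            hits.append((start.first_id, prev.second_id))
    return hits

def structure_match(hit, in_scope_ids):
    """A hit survives only if both identifiers satisfy the structure specification."""
    head, tail = hit
    return head in in_scope_ids and tail in in_scope_ids

hits = search(INDEX, "検索する")
print(hits)  # [(101, 102)]
```

With this check, a search restricted to the keyword element alone would reject the hit, since the last character lies in the paragraph (102).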

[0119] The search result creation and display processing in Embodiment 3 is the same as in Embodiments 1 and 2, so its description is omitted.

[0120] As described above, in this embodiment, when a structured document being registered contains Mixed Content, character string index entries are also created for character chains that span the Mixed Content (extended character string index entries, which store two search unit identifiers, one for the first character of the chain and one for the second). This makes character strings that span Mixed Content searchable. It also makes it possible to run a search that specifies an element that is itself Mixed Content (the "keyword" element in the description above).

[0121] In the description of Embodiment 3, character chains are extracted two characters at a time and a character string index entry is created for each, but the chains need not be two characters long. In that case, the same effect can be obtained by replacing the "search unit identifier of the first character of the chain" in Embodiment 3 with the "search unit identifier of the leading character of the chain", and the "search unit identifier of the second character of the chain" with the "search unit identifier of the last character of the chain".

[0122] Furthermore, although Embodiment 3 has been described as implemented in a device, it can also be realized by installing, on a general-purpose computer, a program that functions as the structured document management device disclosed in this embodiment.

[0123] (Embodiment 4) Embodiment 4 of the present invention is described below. FIG. 25 is a configuration diagram of the structured document management device in Embodiment 4. It differs from FIG. 1, the configuration diagram of Embodiment 1, in that the search engine 105 additionally includes a numerical index creation means 2401 and a numerical index search means 2402, and the data storage unit 106 additionally includes a numerical type setting storage means 2403 and a numerical index storage means 2404.

[0124] The numerical index creation means 2401 creates an index for numerical range searches over the content of elements whose tag names have been configured in advance.

[0125] The numerical index search means 2402 performs numerical range searches using the numerical index created by the numerical index creation means 2401.

[0126] The numerical type setting storage means 2403 stores the set of tag names, configured in advance, of the elements for which a numerical index is to be created.

[0127] The numerical index storage means 2404 stores the numerical index created by the numerical index creation means 2401.

[0128] The flow of registration processing in Embodiment 4 is described here using a concrete example. In Embodiment 4, assume that, before any document is registered in the system, the tag name "価格" ("price") has been set in the numerical type setting storage means 2403 as the tag name of an element for which a numerical index is to be created. The case of registering a document such as the one in FIG. 26 is described. The processing of the structured document input means 102, the structure analysis means 107, the structure information creation means 108, and the character string index creation means 109 is the same as in Embodiments 1 and 2, so its description is omitted.

[0129] FIG. 27 shows the processing flow of the numerical index creation means 2401 in Embodiment 4. First, the structure-analyzed data of the registered document is read in step 2601. Next, the element currently being referenced is checked to see whether it is one for which the numerical type setting storage means 2403 specifies that a numerical index be created (step 2602); if it is not, processing proceeds to step 2606. If it is, the search unit identifier assigned to the element in step 406 of the structure analysis means 107 is obtained. Next, in step 2604, the entity (character string) of the element is converted to numerical data. If the character string contains character data other than digits, such as a unit, only the digit portion is extracted and converted to numerical data. A record consisting of the element's search unit identifier and the numerical data is then added to the numerical index. A separate numerical index is created for each name ID of the tag names set in the numerical type setting storage means 2403 (step 2605). Next, step 2606 checks whether steps 2602 to 2605 have been completed for all elements of the document being registered; if unprocessed elements remain, processing is repeated from step 2602. Once steps 2602 to 2605 have been completed for all elements, the numerical index created here is finally added to the numerical index storage means 2404 (step 2607).

[0130] In this example, the element for which a numerical index is created is the one shown at 2501 in FIG. 26. If the search unit identifier of that element is "201", the resulting numerical index is as shown at 2710 in FIG. 28. In FIG. 28 the numerical data is stored as Long integers, but it may instead be stored as, for example, Double floating-point numbers. However, the type must be the same throughout each numerical index, one of which is created per name ID.
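The registration loop of steps 2602 to 2605 can be sketched as below, under several stated assumptions: elements arrive as `(search_unit_id, tag_name, text)` tuples, the index is keyed by tag name rather than by name ID for readability, the digit portion is taken to be the first run of digits in the text, and the sample content "1600円" is hypothetical (the actual value in FIG. 26 is not given in the text).

```python
import re
from collections import defaultdict

NUMERIC_TAGS = {"価格"}  # tag names configured in advance (role of means 2403)

def to_number(text):
    """Extract the first run of digits and convert it to an integer.

    Assumption: separators such as "1,600" are not handled in this sketch.
    """
    m = re.search(r"\d+", text)
    return int(m.group()) if m else None

def build_numeric_index(elements):
    """elements: iterable of (search_unit_id, tag_name, text) tuples."""
    index = defaultdict(list)  # one list of (search_unit_id, value) per tag
    for uid, tag, text in elements:
        if tag not in NUMERIC_TAGS:      # corresponds to the check in step 2602
            continue
        value = to_number(text)          # corresponds to step 2604
        if value is not None:
            index[tag].append((uid, value))  # corresponds to step 2605
    return index

elements = [(200, "書名", "XML入門"), (201, "価格", "1600円")]
numeric_index = build_numeric_index(elements)
print(dict(numeric_index))
```

Keeping every value in one index as the same Python type (here `int`) reflects the requirement that the type be uniform within each numerical index.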

[0131] Next, document search processing in Embodiment 4 is described. In Embodiment 4, a numerical index is created for elements whose tag names are set in the numerical type setting storage means 2403, so numerical range searches become possible in addition to the structure-specified character string searches described in Embodiments 1 and 2.

[0132] As an example, suppose that the condition "documents in which the content of the element with tag name "価格" ("price") is between 1500 yen and 1700 yen" is given from the terminal 101 through the search condition input means 103. The processing of the search condition analysis means 110 in this case is the same as in Case 1 of Embodiment 1, so its description is omitted.

[0133] Next, because the search condition specifies a numerical range, the processing is performed by the numerical index search means 2402 rather than the character string index search means 111. In this example, if the numerical index created for the name ID of the "価格" ("price") tag holds data such as that shown at 2720 in FIG. 28, three entries whose numerical data is at least 1500 and at most 1700 are extracted: 2721 (search unit identifier: 54), 2722 (search unit identifier: 201), and 2723 (search unit identifier: 545).
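One possible way to realize the inclusive range lookup described above is a value-sorted list searched with binary search. This is only a sketch: the text does not prescribe the index structure, and the numeric values below are hypothetical, since the text gives the search unit identifiers (54, 201, 545) but not the stored values in FIG. 28.

```python
from bisect import bisect_left, bisect_right

def range_search(index, low, high):
    """Return search unit identifiers whose value lies in [low, high].

    index: list of (value, search_unit_id) pairs sorted by value.
    """
    values = [v for v, _ in index]
    lo = bisect_left(values, low)    # first entry with value >= low
    hi = bisect_right(values, high)  # one past the last entry with value <= high
    return [uid for _, uid in index[lo:hi]]

# Hypothetical contents of the "価格" index; only the identifiers come from the text.
price_index = sorted([(1200, 7), (1500, 54), (1600, 201), (1700, 545), (2300, 88)])
print(range_search(price_index, 1500, 1700))  # [54, 201, 545]
```

The identifiers returned here would then go through structure collation, as the following paragraph describes.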

[0134] Next, the structure collation means 112 checks the search unit identifiers extracted by the numerical index search means 2402 against the structure specification of the search condition. The processing of the structure collation means 112 in this example is the same as in Embodiment 1, so its description is omitted. In Embodiment 4, the structure specification for a numerical range search can be not only Case 1 of Embodiment 1 but also Case 2, Case 3, or Case 4. The processing of the search condition analysis means 110 and the structure collation means 112 in each case is the same as in Embodiment 1, so its description is omitted.

[0135] The search result creation and display processing in Embodiment 4 is the same as in Embodiment 1, so its description is omitted.

[0136] As described above, in this embodiment, the numerical index creation means 2401 creates a numerical index for elements whose tag names have been set in advance in the numerical type setting storage means 2403, which makes possible numerical range searches that treat element content as numerical data.

[0137] The numerical index in Embodiment 4 has been described as having a structure like 2720 in FIG. 28, but it may have any structure that allows the search unit identifiers falling within a specified numerical range to be extracted. Also, Embodiment 4 has been described as running the numerical index creation means 2401 after the processing of the character string index creation means 109. Alternatively, when an element entity is encountered in step 405 of FIG. 4, the processing procedure of the character string index creation means 109, steps 2602 to 2605 of FIG. 27, the processing procedure of the numerical index creation means 2401, may be performed in parallel with steps 406 and 407.

[0138] Furthermore, although Embodiment 4 has been described as implemented in a device, it can also be realized by installing, on a general-purpose computer, a program that functions as the structured document management device disclosed in this embodiment.

[0139] (Embodiment 5) Embodiment 5 of the present invention is described below. FIG. 29 is a configuration diagram of the structured document management device in Embodiment 5.

[0140] This embodiment is characterized in that the functions of the structured document management device are distributed over a network.

[0141] The structured document registration unit 3001 reads and analyzes a structured document and generates its tree structure. The character string index creation unit 3002 generates search indexes for the structured document analyzed by the structured document registration unit 3001. The character string search unit 3003 reads a search condition and searches for element entities that contain a character string matching it. The result display unit 3004 displays the search results obtained by the character string search unit 3003 on the terminal 101. The terminal 101 and the data storage unit 106 have the same functions as described in Embodiment 1; the data storage unit 106 receives over the network, and stores, the analyzed structured documents, character string indexes, search results, and other data created by the functional blocks above. The terminal 101 sends the search condition specified by the user to the character string search unit 3003 over the network. It also receives over the network the search results held in the result display unit 3004 and displays them. Each functional block is described below.

[0142] The structured document registration unit 3001 consists of the structured document input means 102, the structure analysis means 107, and the structure information creation means 108, each of which has the same function as described in Embodiment 1. However, besides the format of FIG. 8 described in Embodiment 1, the element management table created by the structure information creation means 108 may take the format of FIG. 31, which maps search unit identifiers to path name IDs and path hierarchy IDs, or the format of FIG. 32, which maps search unit identifiers to name IDs.

[0143] The same functions as those of the structured document registration unit 3001 can be implemented as a program; installing this program on a general-purpose computer from a portable medium on which it is recorded provides the same functions as the structured document registration unit 3001.

[0144] The structured document registration unit 3001 can also serve as a device in its own right.

[0145] The character string index creation unit 3002 consists of the character string index creation means 109 and the numerical index creation means 2401. The character string index creation means 109 has the same function as described in Embodiment 1, and the numerical index creation means 2401 has the same function as described in Embodiment 4. However, the numerical index creation means 2401 is needed only when the search condition is to find character strings that fall within a specific numerical range; if search conditions do not include a numerical range, the numerical index creation means 2401 is unnecessary.

[0146] The same functions as those of the character string index creation unit 3002 can be implemented as a program; installing this program on a general-purpose computer from a portable medium on which it is recorded provides the same functions as the character string index creation unit 3002.

[0147] The character string index creation unit 3002 can also serve as a device in its own right.

[0148] The character string search unit 3003 consists of the search condition input means 103, the search condition analysis means 110, the character string index search means 111, the numerical index search means 2402, and the structure collation means 112. The search condition input means 103, the search condition analysis means 110, the character string index search means 111, and the structure collation means 112 have the same functions as described in Embodiment 1. However, when the element management table created by the structure information creation means 108 is in the format of FIG. 31, a tag name cannot be specified as a search condition; a path name or a path hierarchy can be specified instead. When the element management table is in the format of FIG. 32, only a tag name can be specified as a search condition.

[0149] The same functions as those of the character string search unit 3003 can be implemented as a program; installing this program on a general-purpose computer from a portable medium on which it is recorded provides the same functions as the character string search unit 3003.

[0150] The character string search unit 3003 can also serve as a device in its own right.

[0151] The numerical index search means 2402 has the same function as described in Embodiment 4. However, the numerical index search means 2402 is needed only when the search condition is to find character strings that fall within a specific numerical range; if the search condition does not include a numerical range, the numerical index search means 2402 is unnecessary.

[0152] The same functions as those of the numerical index search means 2402 can be implemented as a program; installing this program on a general-purpose computer from a portable medium on which it is recorded provides the same functions as the numerical index search means 2402.

[0153] FIG. 30 is a flowchart showing the processing flow of the character string search unit 3003.

[0154] First, the search condition specified by the user is read (step 3005). Next, the search condition is converted into the corresponding name ID, path name ID, or path hierarchy ID (hereinafter, ID1) (step 3006). Which of the three IDs results depends on the user's search condition, as shown in FIG. 11, and which search conditions are available is constrained by the format of the element management table shown in FIGS. 8, 31, and 32. Next, all search unit identifiers (hereinafter, ID2) that have a character string matching the search condition are identified (step 3007). The element management table is then referenced with each ID2 to identify the corresponding name ID, path name ID, or path hierarchy ID (hereinafter, ID3) (step 3008). Finally, the search unit identifiers for which ID1 and ID3 match are identified (step 3009).
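As an illustration only, steps 3006 to 3009 can be sketched in Python as follows. The three dictionaries, their contents, and the function name `search` are hypothetical stand-ins for the path name index, the character string index, and the element management table; they are not taken from the figures, and only the path name ID case is shown.

```python
from typing import Dict, List, Set

# Hypothetical data, not taken from the patent figures.
# Path name index: path name -> path name ID.
PATH_NAME_INDEX: Dict[str, int] = {"/doc/title": 10, "/doc/body/para": 11}

# Character string index: search term -> search unit identifiers (ID2 candidates).
STRING_INDEX: Dict[str, Set[int]] = {
    "search": {1, 3, 4},
    "index": {2, 3},
}

# Element management table: search unit identifier -> path name ID.
ELEMENT_TABLE: Dict[int, int] = {1: 10, 2: 10, 3: 11, 4: 11}


def search(path_name: str, term: str) -> List[int]:
    """Return search unit identifiers that contain `term` under `path_name`."""
    id1 = PATH_NAME_INDEX[path_name]            # step 3006: condition -> ID1
    candidates = STRING_INDEX.get(term, set())  # step 3007: term -> ID2 set
    hits = []
    for id2 in sorted(candidates):
        id3 = ELEMENT_TABLE[id2]                # step 3008: ID2 -> ID3
        if id3 == id1:                          # step 3009: keep only matches
            hits.append(id2)
    return hits
```

Because the character string index in this sketch holds no structure information, the structural check reduces to one table lookup and one integer comparison per candidate.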

[0155] The result display unit 3004 comprises the result creation means 113 and the result display means 104, which have the same functions as described in Embodiment 1.

[0156] Embodiment 5 has been described as implemented in a device, but it can also be implemented by installing, on a general-purpose computer, a program that functions as the structured document management device disclosed in this embodiment.

[0157]

[Effects of the Invention] As described above, according to the present invention, in a structured document management device that supports searches specifying various logical structures of a structured document, excluding logical-structure information from the character string index reduces the size of the character string index. In addition, the amount of processing required to add, change, or delete the content of a specific element of a document is greatly reduced.

[0158] In addition, because the ID that identifies the logical-structure position of a node is managed as two separate IDs, a path name ID and a path hierarchy ID, the total number of IDs needed to identify structure can be kept small even when the logical structure becomes complex and very large.
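The split into a path name ID and a path hierarchy ID can be illustrated with a small Python sketch. The tuple-based tree, the slash-joined string forms, and the name `assign_ids` are assumptions made for this example and do not reproduce the formats of FIGS. 6 to 8.

```python
from typing import Dict, List, Tuple

# Hypothetical tree: each node is (tag name, list of child nodes).
TREE = ("doc", [
    ("title", []),
    ("sec", [("head", []), ("para", []), ("para", [])]),
    ("sec", [("head", []), ("para", []), ("para", [])]),
])


def assign_ids(root):
    """Assign a path name ID and a path hierarchy ID to every node."""
    path_name_ids: Dict[str, int] = {}
    path_hier_ids: Dict[str, int] = {}
    table: List[Tuple[str, str, int, int]] = []

    def visit(node, names, orders):
        _tag, children = node
        name = "/".join(names)              # tag names from the root down
        hier = "/".join(map(str, orders))   # appearance order at each level
        name_id = path_name_ids.setdefault(name, len(path_name_ids) + 1)
        hier_id = path_hier_ids.setdefault(hier, len(path_hier_ids) + 1)
        table.append((name, hier, name_id, hier_id))
        seen: Dict[str, int] = {}           # per-parent count of each tag name
        for child in children:
            seen[child[0]] = seen.get(child[0], 0) + 1
            visit(child, names + [child[0]], orders + [seen[child[0]]])

    visit(root, [root[0]], [1])
    return path_name_ids, path_hier_ids, table


NAMES, HIERS, TABLE = assign_ids(TREE)
```

In this sketch all four `para` nodes share one path name ID, and a path hierarchy such as `1/1` is shared by `doc/title` and the first `doc/sec`, while each (path name ID, path hierarchy ID) pair still identifies exactly one node.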

[0159] In addition, a path name ID collation table, which stores whether each path name ID falls within the structure range specified in the search condition, and a path hierarchy ID collation table, which stores whether each path hierarchy ID falls within that range, are created and used for structure matching. This enables high-speed searches that specify an intermediate node and everything below it.
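One way to picture a collation table is the following Python sketch, which assumes that each path name ID knows the ID of its parent path. The `PARENT` mapping and the function name `build_collation_table` are hypothetical. The sketch walks upward from each ID and reuses verdicts that are already recorded, in the spirit of the procedure of FIG. 21, but it is not a transcription of that flowchart.

```python
from typing import Dict, Optional

# Hypothetical path name index: path name ID -> parent path name ID (None = root).
PARENT: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 1, 4: 3, 5: 3, 6: 5}


def build_collation_table(target: int) -> Dict[int, bool]:
    """Map each path name ID to True if it is `target` or lies below it."""
    table: Dict[int, bool] = {}
    for start in PARENT:
        trail = []                # IDs visited on the way up from `start`
        node: Optional[int] = start
        verdict = False           # climbing past the root means "outside"
        while node is not None:
            if node in table:     # already decided on an earlier walk
                verdict = table[node]
                break
            trail.append(node)
            if node == target:    # reached the specified intermediate node
                verdict = True
                break
            node = PARENT[node]
        for visited in trail:     # every ID on the trail shares the verdict
            table[visited] = verdict
    return table
```

With `target` set to 3, the IDs 3, 4, 5, and 6 are flagged as inside the range and 1 and 2 as outside, so structure matching at search time needs only one table lookup per candidate.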

[0160] As noted above, in the conventional technique, when an intermediate node and everything below it is specified as the search range, nodes that have the same parent node and the same tag name are still assigned different context identifiers. An OR search is therefore needed to check whether each of them satisfies the search condition, which increases search time. In the present invention, nodes with the same parent node and the same tag name are given the same identifier even when there are several of them, so no OR search is needed and search time is shortened.

[0161] In addition, creating an extended character string index for character sequences that span mixed content (MixedContent) makes it possible to search for character strings that span mixed content and to perform searches that specify an element that is mixed content.
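A minimal Python sketch of such an extended index follows, assuming that the index holds only the character n-grams that straddle a tag boundary. The input format (runs of text tagged with a search unit identifier), the entry layout, and the name `extended_index` are assumptions for this example and are not the layout of FIG. 24.

```python
from typing import List, Tuple

# Hypothetical mixed-content element, already split into (search unit ID, text) runs.
# It stands for <p>abc<b>de</b>fg</p>, where unit 1 is <p> and unit 2 is <b>.
RUNS: List[Tuple[int, str]] = [(1, "abc"), (2, "de"), (1, "fg")]


def extended_index(runs, n=2):
    """Return (gram, unit ID of each character, offset in tag-free text) entries."""
    chars = [(unit, ch) for unit, text in runs for ch in text]
    entries = []
    for pos in range(len(chars) - n + 1):
        window = chars[pos:pos + n]
        units = tuple(unit for unit, _ in window)
        if len(set(units)) > 1:   # keep only grams that straddle a tag boundary
            gram = "".join(ch for _, ch in window)
            entries.append((gram, units, pos))
    return entries
```

For the sample element, the bigrams `cd` and `ef` cross the `<b>` boundary and are therefore recorded together with the search unit identifier of each character.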

[0162] In addition, creating a numerical index for elements whose tag names have been set in advance makes it possible to perform numerical range searches that treat element content as numerical data.
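The idea can be sketched in Python as a sorted list of (value, search unit identifier) pairs that is queried by binary search. The tag set, the sample records, and the function names are hypothetical, and the actual layout of the numerical index in FIG. 28 may differ.

```python
import bisect
from typing import List, Tuple

NUMERIC_TAGS = {"price"}  # hypothetical tag names configured in advance as numeric

# Hypothetical (search unit ID, tag name, element content) records.
ELEMENTS = [(1, "name", "pen"), (2, "price", "120"), (3, "price", "80"), (4, "price", "300")]


def build_numeric_index(elements) -> List[Tuple[float, int]]:
    """Sorted (value, search unit ID) pairs for elements whose tag is numeric."""
    return sorted((float(text), unit) for unit, tag, text in elements if tag in NUMERIC_TAGS)


def range_search(index, low: float, high: float) -> List[int]:
    """Search unit IDs whose value v satisfies low <= v <= high."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    stop = bisect.bisect_right(index, (high, float("inf")))
    return [unit for _, unit in index[start:stop]]


INDEX = build_numeric_index(ELEMENTS)
```

Comparing converted numbers rather than text is what lets a query such as "100 to 300" match the content `120` and `300` but not `80`.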

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the structured document management device according to Embodiment 1 of the present invention.

FIG. 2 is a diagram showing an example of a structured document in Embodiment 1 of the present invention.

FIG. 3 is a diagram showing an example of the tree structure obtained by structure analysis in Embodiment 1 of the present invention.

FIG. 4 is a diagram showing the processing procedure of the structure information creating means in Embodiment 1 of the present invention.

FIG. 5 is a diagram showing an example of assigned name IDs in Embodiment 1 of the present invention.

FIG. 6 is a diagram showing an example of the path name index in Embodiment 1 of the present invention.

FIG. 7 is a diagram showing an example of the path hierarchy index in Embodiment 1 of the present invention.

FIG. 8 is a diagram showing an example of the element management table in Embodiment 1 of the present invention.

FIG. 9 is a diagram showing the processing procedure of the character string index creation means in Embodiment 1 of the present invention.

FIG. 10 is a diagram showing an example of the character string index in Embodiment 1 of the present invention.

FIG. 11 is a diagram showing the processing procedure of the search condition analysis means in Embodiment 1 of the present invention.

FIG. 12 is a diagram showing details of search processing using the character string index in Embodiment 1 of the present invention.

FIG. 13 is a diagram showing the processing procedure of the structure matching means in Embodiment 1 of the present invention.

FIG. 14 is a diagram showing details of structure matching processing in Embodiment 1 of the present invention.

FIG. 15 is a diagram showing an example of a tree structure to which a node group has been added in Embodiment 1 of the present invention.

FIG. 16 is a configuration diagram of the structured document management device according to Embodiment 2 of the present invention.

FIG. 17 is a diagram showing the processing procedure of the structural condition analysis means in Embodiment 2 of the present invention.

FIG. 18 is a diagram showing an example of the path name ID collation table in Embodiment 2 of the present invention.

FIG. 19 is a diagram showing the processing procedure of the structure matching means in Embodiment 2 of the present invention.

FIG. 20 is a diagram showing details of structure matching processing in Embodiment 2 of the present invention.

FIG. 21 is a diagram showing the processing procedure by which the structure matching means in Embodiment 2 of the present invention identifies the nodes that fall within the search range when an intermediate node is specified.

FIG. 22 is a diagram showing an example of the path hierarchy ID collation table in Embodiment 2 of the present invention.

FIG. 23 is a diagram showing the processing procedure of the character string index creation means in Embodiment 3 of the present invention.

FIG. 24 is a diagram showing an example of the extended character string index in Embodiment 3 of the present invention.

FIG. 25 is a configuration diagram of the structured document management device according to Embodiment 4 of the present invention.

FIG. 26 is a diagram showing an example of a structured document in Embodiment 4 of the present invention.

FIG. 27 is a diagram showing the processing procedure of the numerical index creation means in Embodiment 4 of the present invention.

FIG. 28 is a diagram showing an example of the numerical index in Embodiment 4 of the present invention.

FIG. 29 is a configuration diagram of the structured document management device according to Embodiment 5 of the present invention.

FIG. 30 is a diagram showing the processing procedure of the character string search unit in Embodiment 5 of the present invention.

FIG. 31 is a diagram showing an example of the element management table in Embodiment 5 of the present invention.

FIG. 32 is a diagram showing an example of the element management table in Embodiment 5 of the present invention.

FIG. 33 is a diagram showing the configuration of a document registration system in the conventional art.

FIG. 34 is a diagram showing the process of generating a structure index in the conventional art.

FIG. 35 is a diagram showing an example of a character string index in the conventional art.

FIG. 36 is a diagram showing a method of updating a structure index in the conventional art.

[Explanation of symbols]

101: Terminal
102: Structured document input means
103: Search condition input means
104: Result display means
105: Search engine
106: Data storage unit
107: Structure analysis means
108: Structure information creating means
109: Character string index creation means
110: Search condition analysis means
111: Character string index search means
112: Structure matching means
113: Result creation means
114: Structure-analyzed data storage means
115: Element management table storage means
116: Path name index storage means
117: Path hierarchy index storage means
118: Name ID table storage means
119: Character string index storage means
120: Entity data storage means
121: List data storage means
1601: Path name ID collation table storage means
1602: Path hierarchy ID collation table storage means
2401: Numerical index creation means
2402: Numerical index search means
2403: Numeric type setting storage means
2404: Numerical index storage means
3001: Structured document registration unit
3002: Character string index creation unit
3003: Character string search unit
3004: Result display unit

Continuation of front page
(72) Inventor: Takeshi Tsurubayashi, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Osamu Katayama, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Shinichi Nakai, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
F-term (reference): 5B075 ND35 NK43

Claims

[Claims]

1. A structured document management device for handling structured documents, comprising: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document received by the structured document input means and generating a tree structure of the structured document; structure information creating means for creating, for the structured document expressed as a tree structure by the structure analysis means, a search unit identifier identifying each element entity, an element entity position identifier expressing the position of each element entity in the tree structure, and an element management table that associates at least each search unit identifier with its element entity position identifier so that the element entity position identifier can be identified from the search unit identifier; character string index creation means for creating a character string index used for character string search; search condition input means for inputting a search condition; search condition analysis means for identifying the element entity position identifier corresponding to the search condition input by the search condition input means; character string index search means for identifying, using the character string index created by the character string index creation means, the search unit identifier of each element entity having a character string that satisfies the search condition; and structure matching means for obtaining the corresponding element entity position identifier by referring to the element management table based on the search unit identifier identified by the character string index search means, and extracting only those search unit identifiers whose element entity position identifier matches the element entity position identifier obtained by the search condition analysis means.

2. A structured document management device for handling structured documents, comprising: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document received by the structured document input means and generating a tree structure of the structured document; structure information creating means for creating, for the structured document expressed as a tree structure by the structure analysis means, a search unit identifier identifying each element entity, a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, a path hierarchy ID identifying a path hierarchy in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order, and an element management table that associates at least each search unit identifier with its path name ID and path hierarchy ID so that the path name ID and the path hierarchy ID can be identified from the search unit identifier; character string index creation means for creating a character string index used for character string search; search condition input means for inputting a search condition; search condition analysis means for identifying at least one of the path name ID and the path hierarchy ID corresponding to the search condition input by the search condition input means; character string index search means for identifying, using the character string index created by the character string index creation means, the search unit identifier of each element entity having a character string that satisfies the search condition; and structure matching means for obtaining the corresponding path name ID or path hierarchy ID by referring to the element management table based on the search unit identifier identified by the character string index search means, and extracting only those search unit identifiers whose path name ID or path hierarchy ID matches the path name ID or path hierarchy ID obtained by the search condition analysis means.

3. A structured document management device for handling structured documents, comprising: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document received by the structured document input means and generating a tree structure of the structured document; structure information creating means for creating, from the tree structure generated by the structure analysis means, a name ID identifying each tag name, a search unit identifier identifying each element entity, and an element management table that associates at least each search unit identifier with its name ID so that the name ID can be identified from the search unit identifier; character string index creation means for creating a character string index used for character string search; search condition input means for inputting a search condition; search condition analysis means for identifying the name ID corresponding to the search condition input by the search condition input means; character string index search means for identifying, using the character string index created by the character string index creation means, the search unit identifier of each element entity having a character string that satisfies the search condition; and structure matching means for obtaining the corresponding name ID by referring to the element management table based on the search unit identifier identified by the character string index search means, and extracting only those search unit identifiers whose name ID matches the name ID obtained by the search condition analysis means.

4. The structured document management device according to claim 1, further comprising: result creation means for creating a list of character string search results and data for displaying each element entity; and result display means for displaying, on a terminal, the search results created by the result creation means.

5. A structured document registration device comprising: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document received by the structured document input means and generating a tree structure of the structured document; and structure information creating means for creating, for the structured document expressed as a tree structure by the structure analysis means, a search unit identifier identifying each element entity, a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, a path hierarchy ID identifying a path hierarchy in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order, and an element management table that associates at least each search unit identifier with its path name ID and path hierarchy ID so that the path name ID and the path hierarchy ID can be identified from the search unit identifier.

6. The structured document management device according to claim 2, wherein, when the tree structure of a structured document changes, those of the path name IDs and path hierarchy IDs recorded in the element management table that need to be changed are updated.

7. The structured document registration device according to claim 5, wherein, when the tree structure of a structured document changes, those of the path name IDs and path hierarchy IDs recorded in the element management table that need to be changed are updated.

8. A structured document registration device comprising: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document received by the structured document input means and generating a tree structure of the structured document; and structure information creating means for creating, from the tree structure generated by the structure analysis means, a name ID identifying each tag name, a search unit identifier identifying each element entity, and an element management table that associates at least each search unit identifier with its name ID so that the name ID can be identified from the search unit identifier.

9. A character string index creation device for creating an index of a structured document that includes, inside an element entity, a further element entity enclosed by tags (a child element), wherein, when a character string extracted from each element entity in units of a predetermined number of characters straddles a tag, the device obtains a unique search unit identifier identifying the child element, and generates a search character string index comprising the character string, the search unit identifier identifying the element entity to which each character of the character string belongs, and a character position identifier indicating the position of the character string within the element entity with the tags removed.

10. The character string index creation device according to claim 9, further comprising numerical index creation means for, when creating an index of a structured document that includes a character string enclosed by a tag defined in advance as numeric, obtaining a unique search unit identifier identifying the character string enclosed by the tag, converting the character string enclosed by the tag into numerical data, and creating a numerical index that associates the search unit identifier with the numerical data.

11. A character string search device for searching for a character string that satisfies a predetermined condition, comprising: a data storage unit that stores a name ID identifying a tag name, a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, a path hierarchy ID identifying a path hierarchy in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order, a search unit identifier identifying each element entity, and at least one of an element management table that associates at least each search unit identifier with its name ID so that the name ID can be identified from the search unit identifier and an element management table that associates at least each search unit identifier with its path name ID and path hierarchy ID so that the path name ID and the path hierarchy ID can be identified from the search unit identifier; search condition input means for inputting a search condition; search condition analysis means for identifying, based on the search condition input by the search condition input means, at least one of the name ID, the path name ID, and the path hierarchy ID corresponding to the search condition (ID1); character string index search means for obtaining the search unit identifiers having a character string that satisfies the search condition; and structure matching means for obtaining at least one of the corresponding name ID, path name ID, and path hierarchy ID (ID2) by referring to the element management table based on the search unit identifier identified by the character string index search means, and extracting only those search unit identifiers for which ID2 matches the ID1 obtained by the search condition analysis means.

12. The character string search device according to claim 11, further comprising numerical index search means for, in a numerical range search of a structured document that includes a character string enclosed by a tag defined in advance as numeric, referring to a numerical index that associates a unique search unit identifier identifying the character string enclosed by the tag with numerical data obtained by converting that character string into a numerical value, and extracting the search unit identifiers that satisfy the search condition.

13. A portable medium storing a program for a method of registering a structured document represented by a tree structure, the program comprising: a step of reading the structured document; a step of obtaining a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, and a path hierarchy ID identifying a path hierarchy in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order; a step of determining whether an element entity is present; a step of obtaining a search unit identifier identifying each element entity; and a step of creating an element management table that associates at least each search unit identifier with its path name ID and path hierarchy ID so that the path name ID and the path hierarchy ID can be identified from the search unit identifier.

14. A portable medium storing a program for a method of registering a structured document represented by a tree structure, the program comprising: a step of reading the structured document; a step of obtaining a name ID identifying a tag name; a step of determining whether an element entity is present; a step of obtaining a search unit identifier identifying each element entity; and a step of creating an element management table that associates at least each search unit identifier with its name ID so that the name ID can be identified from the search unit identifier.

15. A portable medium storing a program for a method of generating a character string index of a structured document that has, inside an element entity, a further element entity enclosed by tags (a child element), the program comprising: a step of reading structure-analyzed data; a step of checking whether an element entity is present and obtaining a search unit identifier identifying the element entity; a step of checking whether a child element is present and obtaining a search unit identifier identifying the child element; a step of extracting character strings from the element entity in units of one or more predetermined characters; a step of obtaining the search unit identifier to which each character of the character string belongs; and a step of generating a search character string index comprising the character string, the search unit identifier to which each character of the character string belongs, and a character position identifier indicating the position of the character string within the element entity with the tags removed.

16. A method for generating a numerical search index for a structured document, comprising: a step of reading structure-analyzed data; a step of determining whether a character string is enclosed by a tag defined in advance as numeric; a step of obtaining a search unit identifier identifying the character string enclosed by the tag defined as numeric; a step of converting the character string into a numerical value; and a step of generating a numerical index comprising the search unit identifier and the converted numerical value.

17. A portable medium storing a program for a method of searching a structured document, the program comprising: a step of reading a search condition; a step of identifying at least one ID (hereinafter, ID1) among a name ID identifying a tag name corresponding to the search condition, a path name ID identifying a path name in which the tag names leading to each element entity are concatenated in hierarchical order, and a path hierarchy ID identifying a path hierarchy in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order; a step of identifying a search unit identifier (hereinafter, ID2) identifying each element entity that has a character string satisfying the search condition; a step of obtaining at least one ID (hereinafter, ID3) among the name ID, the path name ID, and the path hierarchy ID corresponding to ID2 by referring to an element management table that associates at least the name ID, the path name ID, and the path hierarchy ID with ID2 so that they can be identified from ID2; and a step of extracting only those search unit identifiers for which ID1 and ID3 match.

18. A method for determining the nodes included in a search range when an intermediate node and everything below it is specified as the search range, wherein the following processing is executed each time a path name, in which the tag names leading to each element entity are concatenated in hierarchical order, or a path hierarchy, in which the orders of appearance, within the same level, of tags having the same parent node and the same name are concatenated in hierarchical order, is traced up by one level from the lowest node: it is determined whether the node at the current position matches the specified intermediate node or has already been determined to be within the search range, and, if either applies, all nodes traced so far are determined to be within the search range; and it is determined whether the node at the current position does not match the specified intermediate node or has already been determined to be outside the search range, and, if so, all nodes traced so far are determined to be outside the search range; the search range being identified by repeating this processing up to the node of the highest level.

19. A structured document management program for causing a general-purpose computer that manages structured documents to function as: structured document input means for inputting a structured document; structure analysis means for analyzing the structured document fetched by the structured document input means to generate a tree structure of the structured document; structure information creating means for assigning, in the structured document represented as the tree structure by the structure analysis means, a search unit identifier for identifying each element entity, a path name ID for identifying a path name in which the tag names leading to each element entity are arranged in hierarchical order, and a path hierarchy ID for identifying a path hierarchy in which the appearance order of tags having the same parent node and the same name in the same hierarchy is concatenated in hierarchical order, and for creating an element management table in which at least the search unit identifier is associated with the path name ID and the path hierarchy ID so that the path name ID and the path hierarchy ID can be specified from the search unit identifier; character string index creating means for creating a character string index for performing a character string search; search condition input means for inputting search conditions; search condition analysis means for specifying at least one of the path name ID and the path hierarchy ID corresponding to the search condition input by the search condition input means; character string index search means for specifying, using the character string index created by the character string index creating means, the search unit identifier of each element entity having a character string corresponding to the search condition; and structure matching means for obtaining the corresponding path name ID or path hierarchy ID by referring to the element management table based on the search unit identifier specified by the character string index search means, and for extracting only those search unit identifiers whose path name ID or path hierarchy ID matches the path name ID or path hierarchy ID obtained by the search condition analysis means.
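
The element management table and the structure matching of claim 19 can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the notation "/doc/chapter/para" for a path name and "/1/2/1" for a path hierarchy, the sequential numbering of identifiers, the function names, and the sample document are all assumptions made for illustration, and a plain substring test stands in for the character string index.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = (
    "<doc><title>XML search</title>"
    "<chapter><para>XML basics</para><para>index design</para></chapter>"
    "<chapter><para>XML query</para></chapter></doc>"
)

def build_element_table(xml_text):
    """Return (table, path_name_ids, path_hierarchy_ids, texts).

    table maps search unit identifier -> (path name ID, path hierarchy ID).
    """
    root = ET.fromstring(xml_text)
    table, texts = {}, {}
    name_ids, hier_ids = {}, {}
    counter = [0]

    def visit(elem, names, orders):
        counter[0] += 1
        uid = counter[0]  # search unit identifier, in document order
        path_name = "/" + "/".join(names)
        path_hier = "/" + "/".join(str(o) for o in orders)
        name_id = name_ids.setdefault(path_name, len(name_ids) + 1)
        hier_id = hier_ids.setdefault(path_hier, len(hier_ids) + 1)
        table[uid] = (name_id, hier_id)
        texts[uid] = (elem.text or "").strip()
        seen = {}  # appearance order per tag name under this parent
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, names + [child.tag], orders + [seen[child.tag]])

    visit(root, [root.tag], [1])
    return table, name_ids, hier_ids, texts

def structure_match(hit_uids, table, name_id=None, hier_id=None):
    """Keep only the hits whose IDs match the IDs given in the condition."""
    result = []
    for uid in hit_uids:
        n, h = table[uid]
        if name_id is not None and n != name_id:
            continue
        if hier_id is not None and h != hier_id:
            continue
        result.append(uid)
    return result

table, name_ids, hier_ids, texts = build_element_table(DOC)
# Stand-in for the character string index search: elements containing "XML".
hits = [uid for uid, text in texts.items() if "XML" in text]
print(structure_match(hits, table, name_id=name_ids["/doc/chapter/para"]))
```

In this sketch the structural condition is resolved to an ID once, so the matching step compares small integers per hit instead of comparing path strings.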

20. A character string index creation program for causing a general-purpose computer, in order to create an index of a structured document that includes element entities (child elements) further enclosed by tags inside each element entity, to function as character string index creating means that extracts character strings of a predetermined number of characters from each element entity, obtains, when an extracted character string spans a tag, the unique search unit identifier for identifying the child element, and creates a search character string index comprising the character string, the search unit identifier for identifying the element entity to which each character of the character string belongs, and a character position identifier indicating the position of the character string in the element entity from which the tags have been removed.
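
The index of claim 20 can be sketched as follows, under stated assumptions: the predetermined number of characters is taken to be 2, identifiers are assigned in document order, only the top-level element entity is indexed, and the function names and the dictionary layout are hypothetical. Each entry records the owning search unit identifier of every character in the extracted string, so a string that spans a tag carries the child element's identifier as well.

```python
import xml.etree.ElementTree as ET

def assign_uids(root):
    """Number elements in document order (assumed identifier scheme)."""
    return {elem: i for i, elem in enumerate(root.iter(), start=1)}

def flatten(elem, uids):
    """Return [(char, uid of the innermost owning element)], tags removed."""
    out = [(c, uids[elem]) for c in (elem.text or "")]
    for child in elem:
        out.extend(flatten(child, uids))
        # Text after a child's closing tag belongs to the parent element.
        out.extend((c, uids[elem]) for c in (child.tail or ""))
    return out

def build_ngram_index(root, n=2):
    """Map each n-character string to [(owner uid per character, position)]."""
    uids = assign_uids(root)
    chars = flatten(root, uids)
    index = {}
    for pos in range(len(chars) - n + 1):
        window = chars[pos:pos + n]
        gram = "".join(c for c, _ in window)
        owners = tuple(u for _, u in window)
        index.setdefault(gram, []).append((owners, pos))
    return index

root = ET.fromstring("<p>ab<b>cd</b>ef</p>")
index = build_ngram_index(root)
print(index["bc"])  # the string "bc" spans the opening <b> tag
```

Here `index["bc"]` holds the owners `(1, 2)` at position 1: the first character belongs to the outer element and the second to the child element, while the position counts characters in the tag-free text.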

21. A string indexing program for causing a general-purpose computer, in order to create an index of a structured document that includes a character string enclosed by a tag defined in advance as indicating that the character string is a numerical value, to function as numerical index creating means that obtains the search unit identifier of the element entity enclosed by the tag, converts the character string enclosed by the tag into numerical data, and creates a numerical index in which the search unit identifier is associated with the numerical data.
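
A numerical index of the kind described in claim 21 might look like the sketch below. The set of numeric tags (`price`, `year`), the sorted list layout, the skipping of text that does not parse as a number, and the `range_search` helper are assumptions added for illustration; the claim itself only requires that the search unit identifier be associated with the converted numerical data.

```python
import bisect
import xml.etree.ElementTree as ET

# Tags declared in advance as holding numeric values (hypothetical names).
NUMERIC_TAGS = {"price", "year"}

def build_numeric_index(xml_text, numeric_tags=NUMERIC_TAGS):
    """Return a sorted list of (numeric value, search unit identifier)."""
    root = ET.fromstring(xml_text)
    index = []
    for uid, elem in enumerate(root.iter(), start=1):
        if elem.tag not in numeric_tags:
            continue
        try:
            value = float((elem.text or "").strip())
        except ValueError:
            continue  # assumption: non-numeric text is left out of the index
        index.append((value, uid))
    index.sort()
    return index

def range_search(index, low, high):
    """Return identifiers whose value v satisfies low <= v <= high."""
    lo = bisect.bisect_left(index, (low, 0))
    hi = bisect.bisect_right(index, (high, float("inf")))
    return [uid for _, uid in index[lo:hi]]

ITEMS = (
    "<items><item><name>pen</name><price>120</price></item>"
    "<item><name>ink</name><price>480</price></item>"
    "<item><name>pad</name><price>abc</price></item></items>"
)
numeric_index = build_numeric_index(ITEMS)
print(range_search(numeric_index, 100, 200))
```

Storing the values as numbers rather than strings lets comparisons such as "between 100 and 200" be answered by binary search, which a character string index cannot do directly.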

22. A program for specifying the nodes included in a search range when, in data expressed by a tree structure, the data below a predetermined node is designated as the search range, the program specifying the search range by comprising: a first step of initializing a collation table that stores, for each node, a collation flag indicating whether the node is within the search range, outside the search range, or undecided; a second step of determining, based on the collation table, whether the node being referred to is within the search range, outside the search range, or undecided; a third step of setting in the collation table, when the second step determines that the node is within the search range, a collation flag indicating within the search range for the node being referred to; a fourth step of setting in the collation table, when the second step determines that the node is outside the search range, a collation flag indicating outside the search range for the node being referred to; a fifth step of setting in the collation table, when the second step determines that the node is undecided and a node referred to thereafter matches the specified node or is already within the search range, a collation flag indicating within the search range for all the nodes traversed so far; a sixth step of setting in the collation table, when the second step determines that the node is undecided and the node being referred to is already outside the search range, a collation flag indicating outside the search range for all the nodes traversed so far; a seventh step of moving up one hierarchy level from the node currently being referred to when neither the fifth step nor the sixth step applies, and of setting in the collation table, when the node reached is the root node, a collation flag indicating outside the search range for all the nodes traversed so far; and an eighth step of returning to the fifth step when the node reached by moving up one hierarchy level in the seventh step is other than the root node.
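
The steps of claim 22 can be sketched in Python as below. The flag constants, the `parent` dictionary, the function names, and the sample tree are assumptions for illustration. The sketch also reorders the seventh step slightly: it compares the root node itself against the specified node before declaring the traversed nodes out of range, so that a search range rooted at the root node is handled too.

```python
UNDECIDED, IN_RANGE, OUT_OF_RANGE = 0, 1, 2

def make_collation_table(nodes):
    """Step 1: every node starts as undecided."""
    return {node: UNDECIDED for node in nodes}

def in_search_range(node, target, parent, table):
    """Decide whether node lies at or below target, caching flags in table.

    parent maps each node to its parent; the root node maps to None.
    """
    if table[node] != UNDECIDED:              # steps 2 to 4: already decided
        return table[node] == IN_RANGE
    visited = []
    current = node
    while True:
        visited.append(current)
        if current == target or table[current] == IN_RANGE:   # step 5
            verdict = IN_RANGE
            break
        if table[current] == OUT_OF_RANGE:                    # step 6
            verdict = OUT_OF_RANGE
            break
        if parent[current] is None:           # step 7: root reached, no match
            verdict = OUT_OF_RANGE
            break
        current = parent[current]             # step 8: go up and repeat
    for n in visited:                         # flag every node traversed
        table[n] = verdict
    return verdict == IN_RANGE

# Hypothetical tree: root -> a, b; a -> c, d; c -> e. Search range: below "a".
PARENT = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "c"}
flags = make_collation_table(PARENT)
print([n for n in PARENT if in_search_range(n, "a", PARENT, flags)])
```

Because every node traversed on the way up receives the same flag, a later query that passes through any of those nodes stops at the first flagged ancestor instead of climbing to the root again.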

23. A structured document management apparatus for managing a structured document represented by a tree structure, comprising: structure information creating means for assigning a search unit identifier for identifying an element entity; means for storing, in order to specify an element entity separately from the search unit identifier, a path hierarchy in which the appearance order of tags having the same parent node and the same name in the tree structure is concatenated level by level; means for storing a path name in which the tag names in the tree structure are concatenated level by level; means for storing an element management table that associates the path hierarchy and the path name with the search unit identifier; character string index search means for extracting the search unit identifier of an element entity including a character string of a search condition; and structure matching means for referring to the element management table from the search unit identifier extracted by the character string index search means and searching for a document satisfying a path hierarchy or a path name specified as a search condition.

24. A data management apparatus for managing data having a data structure that can be expressed by a tree structure, characterized in that, to identify an entity element of the data, it uses means for storing a path hierarchy in which the appearance order of tags having the same parent node and the same name in the tree structure is concatenated level by level.

25. The data management apparatus according to claim 24, further comprising means for storing a path name in which the tag names of the data expressed in the tree structure are concatenated level by level, wherein the means for storing the path hierarchy and the means for storing the path name are used to uniquely specify an entity element of the data in the tree structure.

26. The data management apparatus according to claim 25, wherein, when there are a plurality of entity elements having the same parent node and the same tag name, those entity elements are expressed by the same path name.
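
How a path name and a path hierarchy together single out one entity element, as in claims 24 to 26, can be shown with a short Python sketch. The slash-separated notation, the `locate` function, and the sample data are assumptions for illustration only. Sibling elements with the same tag share one path name and differ only in the path hierarchy.

```python
import xml.etree.ElementTree as ET

def locate(root, path_name, path_hierarchy):
    """Return the element addressed by the pair, or None if there is none.

    path_name lists tag names from the top level down ("/doc/chapter/para");
    path_hierarchy lists the appearance order at each level ("/1/2/1").
    """
    names = path_name.strip("/").split("/")
    orders = [int(o) for o in path_hierarchy.strip("/").split("/")]
    if len(names) != len(orders) or names[0] != root.tag or orders[0] != 1:
        return None
    node = root
    for tag, order in zip(names[1:], orders[1:]):
        # Children with this tag name under the same parent, in order.
        same_name = [child for child in node if child.tag == tag]
        if order < 1 or order > len(same_name):
            return None
        node = same_name[order - 1]
    return node

TREE = ET.fromstring(
    "<doc><chapter><para>XML basics</para><para>index design</para></chapter>"
    "<chapter><para>XML query</para></chapter></doc>"
)
print(locate(TREE, "/doc/chapter/para", "/1/2/1").text)
```

All three `para` elements share the path name "/doc/chapter/para"; the path hierarchies "/1/1/1", "/1/1/2" and "/1/2/1" tell them apart.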