JP2960936B2 - Dependency analyzer - Google Patents
- Publication number
- JP2960936B2 (other identifiers: JP62173011A, JP17301187A)
- Authority
- JP
- Japan
- Prior art keywords
- dependency
- word
- candidate
- relationship
- connection
- Legal status
- Expired - Lifetime
Landscapes
- Machine Translation (AREA)
Description
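[Detailed Description of the Invention]
[Field of Industrial Application]
The present invention relates to a dependency analyzer that carries out a method of analyzing the dependency structure of sentences in natural language processing such as machine translation, natural language understanding, and automatic index extraction.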
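[Prior Art]
Dependency analysis of sentences is a preprocessing stage of natural language processing, and the accuracy of the dependencies strongly affects the performance of the processing as a whole. A highly accurate dependency analysis method is therefore strongly desired.
In general, Japanese dependencies follow the rule that the receiving word (受け語, the receiver) is placed as close as possible to the dependent word (係り語, the dependent). Not every dependency relation obeys this rule, however, so dependency relations become ambiguous. Various methods have therefore been proposed for determining dependency relations accurately.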
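Kinukawa et al. (Kinukawa and Kimura, "An automatic indexing method based on Japanese sentence structure analysis", Transactions of the Information Processing Society of Japan, vol. 21, no. 3, 1980) proposed a method that uses a sentence-pattern table of verb case relations annotated with semantic information.
FIG. 6 shows an example of the verb case-relation sentence-pattern table used in this method. The method focuses on the verbs in a sentence, specifies for each verb the nouns and case particles attached to it together with the semantic class of the noun phrase, and performs dependency analysis using these relations. However, it requires the laborious work of describing the verb-noun case relations for every verb. Furthermore, it has difficulty determining dependencies in which a noun phrase modifies another noun phrase, or in which a predicate modifies a noun phrase adnominally.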
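Takamatsu, Nishida et al. (Takamatsu, Kusaka, and Nishida, "Automatic extraction of relational information from technical abstracts", Transactions of the Information Processing Society of Japan, vol. 25, no. 2, 1984) remedy the problems of Kinukawa's sentence-pattern approach by using domain knowledge. For technical abstracts such as patent claims, their method combines verb case-structure patterns with separately described knowledge of the specialized field, which makes it possible to resolve ambiguous dependencies that case-structure relations alone cannot. The method is explained below with a concrete example.
First, the case structure of the input sentence is analyzed and the access table to the knowledge tables shown in FIG. 7 is consulted. Next, based on the access table, the set of case labels corresponding to the sentence's case structure is looked up in the knowledge table. The case-relation instances written in the knowledge table are then taken as the maximum-likelihood dependency candidates, and the dependency is judged.
FIG. 8 shows an example knowledge table for semiconductor devices, and the sentence in FIG. 9 illustrates how the method analyzes dependencies. In this example two interpretations, (a) and (b), are possible: the verb "含む" (contain) relates either to "シリコン基板上の" (on the silicon substrate) or to "絶縁層" (insulating layer). From the access table in FIG. 7(a) it is determined that a "含む"-type verb takes OBJ and PARTIC as its cases and that the concept relation "COMPOSITION" holds between them.
Next, the "COMPOSITION" knowledge table for semiconductor devices in FIG. 8 is searched for instances of the OBJ-PARTIC case relation. In practice, the instance pairing "チャンネル領域" (channel region, PARTIC) with "シリコン基板" (silicon substrate, OBJ) is extracted from the knowledge table, yielding the knowledge that a channel region is contained in a silicon substrate whereas an insulating layer is not. In this example, dependency (a) is therefore judged correct.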
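[Problems to Be Solved by the Invention]
This method can analyze adnominal-modification dependencies accurately, but it has the drawback that various knowledge tables, such as tables of constituents, and detailed knowledge tables for each specialized field must be built.
The object of the present invention is to provide a dependency analysis method that, when parsing text with many repeated expressions such as patent claims, resolves dependency ambiguity using information contained in the text itself, without detailed case relations or specialized knowledge.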
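[Means for Solving the Problems]
The dependency analyzer according to the present invention has a dependency candidate storage unit consisting of a dependency relation table. The table comprises a connection table, which records connection relations (each a pair of a dependent word and a receiving word), and a semantic-category link list, which records semantic-category links (each a pair of the dependent word's semantic category number and the receiving word's semantic category number); each pair of a connection relation and its semantic-category link is recorded as one dependency relation.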
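The storage unit just described can be sketched as a small in-memory Python class. This is a minimal sketch under stated assumptions, not the patent's implementation: `Relation`, `CandidateStore`, and the method names are hypothetical, and category numbers are kept as strings (for example `"1.386"`) so they compare exactly. Each row keeps a connection relation together with its semantic-category link, and the connection table and link list are derived views of those rows. The sample entry is only modeled on the relation the text later derives from 「作成中の文中に表示する手段と」; the actual contents of FIG. 4 are not reproduced.

```python
from __future__ import annotations
from typing import NamedTuple


class Relation(NamedTuple):
    """One row of the dependency relation table (hypothetical field names)."""
    dep_word: str   # dependent word
    dep_cat: str    # its semantic category number, kept as a string
    recv_word: str  # receiving word
    recv_cat: str   # its semantic category number


class CandidateStore:
    """Sketch of the dependency candidate storage unit."""

    def __init__(self) -> None:
        self.relations: list[Relation] = []

    def register(self, rel: Relation) -> None:
        # Record each (connection relation, category link) pair only once.
        if rel not in self.relations:
            self.relations.append(rel)

    def connection_table(self) -> set[tuple[str, str]]:
        # Connection relations: (dependent word, receiving word).
        return {(r.dep_word, r.recv_word) for r in self.relations}

    def category_link_list(self) -> set[tuple[str, str]]:
        # Semantic-category links: (dependent category, receiving category).
        return {(r.dep_cat, r.recv_cat) for r in self.relations}


store = CandidateStore()
store.register(Relation("作成", "1.386", "文", "1.3154"))
print(store.connection_table())  # {('作成', '文')}
```

Keeping the two views derived from one list preserves the pairing the text requires: every link in the list can be traced back to the word pair it came from.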
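The analyzer further has:
a phrase segmentation unit that divides an input sentence into phrases (bunsetsu); a phrase extraction unit that extracts the divided phrases one by one from the beginning of the input sentence; a dependency candidate extraction unit that extracts inter-phrase dependency candidates, each a pair whose dependent is the extracted phrase and whose receiver is one of the remaining phrases; a compound word splitting unit that, when the content word of the extracted phrase is a compound of two or more words, splits the compound into words and removes prefixes and suffixes; a compound word dependency analysis unit that, for the word-to-word dependencies within the compound, takes each word and the immediately following word it depends on as a connection relation, and the pair of their semantic categories as a semantic-category link; and a dependency relation registration unit that registers these word-to-word dependency relations in the dependency candidate storage unit and, when there is only one inter-phrase dependency candidate, newly registers that candidate in the storage unit.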
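It further has a dependency candidate search unit and a dependency judgment unit. When there are two or more inter-phrase dependency candidates, the search unit looks up the candidates' connection relations in the connection table; if matches are found, the matching candidate with the shortest distance between dependent and receiver is judged correct. If no answer can be determined this way, it looks up the candidates' semantic-category links in the semantic-category link list; if matches are found, the matching candidate with the shortest dependent-receiver distance is judged correct. When the search unit cannot determine the answer, the judgment unit checks whether there is a candidate whose receiver has the same semantic category number as the dependent; if one exists, it is taken as correct, and otherwise the inter-phrase dependency judgment is deferred.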
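It further has a dependency relation decision unit that, once dependency analysis of all phrases is finished, takes as correct, for each inter-phrase dependency whose answer could not be determined, the candidate with the shortest distance between dependent and receiver.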
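The dependency candidate search unit is further characterized in that, for candidates whose dependency judgment was deferred, it judges correct the candidate whose connection relation matches a newly registered connection relation and, if no answer is determined in this way, the candidate whose semantic-category link matches a newly registered semantic-category link.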
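[Operation]
In the dependency analyzer according to the present invention, the dependency candidate storage unit holds, as its dependency relation table, a connection table recording connection relations (pairs of dependent and receiving words) and a semantic-category link list recording semantic-category links (pairs of the dependent's and the receiver's semantic category numbers), and it records each pair of a connection relation and a semantic-category link as one dependency relation.
When a sentence is input, the phrase segmentation unit divides it into phrases. The phrase extraction unit extracts the divided phrases one by one from the beginning of the sentence, and the dependency candidate extraction unit extracts inter-phrase dependency candidates, each pairing the extracted phrase as dependent with one of the remaining phrases as receiver.
When the content word of the extracted phrase is a compound of two or more words, the compound word splitting unit splits it into words and removes prefixes and suffixes. The compound word dependency analysis unit then forms the word-to-word dependency relations: each word and the immediately following word it depends on form a connection relation, and the pair of their semantic categories forms a semantic-category link.
The dependency relation registration unit newly registers these word-to-word dependency relations in the dependency candidate storage unit and, when there is only one inter-phrase dependency candidate, newly registers that candidate there as well.
When there are two or more inter-phrase dependency candidates, the dependency candidate search unit searches the connection table for entries matching the candidates' connection relations and, if any are found, judges correct the matching candidate with the shortest dependent-receiver distance. If no answer is determined, it searches the semantic-category link list for entries matching the candidates' semantic-category links and, if any are found, judges correct the matching candidate with the shortest distance.
When the search unit cannot determine the answer, the dependency judgment unit checks whether there is a candidate whose receiver has the same semantic category number as the dependent; if so, that candidate is taken as correct, and if not, the inter-phrase dependency judgment is deferred.
Once dependency analysis of all phrases is finished, the dependency relation decision unit takes as correct, among the inter-phrase candidates whose answer could not be determined, the one with the shortest dependent-receiver distance.
For candidates whose judgment was deferred, the dependency candidate search unit judges correct the candidate whose connection relation matches a newly registered connection relation and, failing that, the candidate whose semantic-category link matches a newly registered link.
The dependency analyzer of the present invention can thus fix the dependency relations among the phrases of the input sentence to a single interpretation. During analysis, the word-to-word dependency relations within compound words and the inter-phrase dependency candidates already fixed to a single interpretation are registered in the dependency candidate storage unit and used in subsequent analysis.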
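[Embodiment]
An embodiment of the present invention is described below.
In this embodiment, both the dependent and the receiver of a dependency relation are phrases. As dependency relations, the embodiment uses the semantic-category link between dependent and receiver and the connection relation between dependent and receiver. A semantic category represents a semantic concept shared by words, and each word is assigned a semantic category number corresponding to that concept. The semantic category numbers used in this embodiment are those listed in the Bunrui Goi Hyo (Word List by Semantic Principles) published by the National Language Research Institute (National Language Research Institute Materials No. 6, Bunrui Goi Hyo, Shuei Shuppan, 1964).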
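FIG. 1 is a flowchart of the dependency analysis processing used in the dependency analyzer of the invention, and FIG. 2 is a block diagram of one embodiment of the dependency analyzer. The configuration in FIG. 2 is described first, followed by its operation with reference to FIG. 1.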
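In FIG. 2, 1 is the phrase segmentation unit, which divides the input sentence into phrases. 2 is the phrase extraction unit, which extracts the divided phrases one by one from the beginning of the input sentence. 3 is the dependency candidate extraction unit, which extracts inter-phrase dependency candidates for the pairs in which the extracted phrase is the dependent and one of the remaining phrases is the receiver. 4 is the compound word splitting unit, which splits the content word of the extracted phrase into words when it is a compound of two or more words. 5 is the compound word dependency analysis unit, which analyzes the dependency relations between words. 6 is the dependency candidate storage unit, consisting of a dependency relation table that comprises a connection table recording connection relations (pairs of dependent and receiving words) and a semantic-category link list recording semantic-category links (pairs of the dependent's and receiver's semantic category numbers), and that records each pair of a connection relation and a semantic-category link as a dependency relation. 7 is the dependency relation registration unit, which newly registers word-to-word dependency relations in the storage unit 6 and, when there is exactly one inter-phrase dependency candidate, newly registers that candidate in the storage unit 6. 8 is the dependency candidate search unit: when there are two or more inter-phrase dependency candidates, it searches the connection table for entries matching the candidates' connection relations and judges correct the matching candidate with the shortest dependent-receiver distance; if no answer can be determined, it searches the semantic-category link list for entries matching the candidates' semantic-category links and judges correct the matching candidate with the shortest distance.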
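9 is the dependency judgment unit: when the search unit 8 cannot determine the answer, it checks whether there is a candidate whose receiver has the same semantic category number as the dependent. If so, that candidate is taken as correct; otherwise judgment of the inter-phrase dependency relation is deferred. For candidates whose judgment was deferred, the search unit 8 judges correct the candidate whose connection relation matches a newly registered connection relation and, failing that, the candidate whose semantic-category link matches a newly registered link.
When dependency analysis of all phrases is finished, the dependency relation decision unit 10 takes as correct, among the inter-phrase candidates whose answer could not be determined, the one with the shortest distance between dependent and receiver.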
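The numbers (1) to (16) in FIG. 2 indicate the dependency analysis steps executed by each component and correspond to steps (1) to (16) in FIG. 1. The steps proceed as follows. In step (1), the phrase segmentation unit 1 divides the input sentence into phrases. In step (2), the phrase extraction unit 2 extracts the divided phrases one by one from the beginning of the sentence. In step (3), the dependency candidate extraction unit 3 takes the extracted phrase as the dependent and, based on the part of speech and conjugated form of the content word in that phrase and the type of function word attached to it, extracts every phrase that could serve as its receiver, i.e., the inter-phrase dependency candidates. In step (4), the compound word splitting unit 4 determines whether the content word forming the dependent is a compound of two or more words; if it is, processing proceeds to step (5), and otherwise to step (8).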
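In step (5), the compound is split into words and prefixes and suffixes are removed. In step (6), the compound word dependency analysis unit 5 analyzes the word-to-word dependency relations: the connection relation formed by each word and the immediately following word it depends on, and the semantic-category link formed by their semantic category numbers. In step (7), the dependency relation registration unit 7 records these word-to-word dependency relations in the dependency candidate storage unit 6. In step (8), if there is only one inter-phrase dependency candidate, step (9) is executed, in which the registration unit 7 newly records that candidate in the storage unit 6. Through this processing, inter-phrase dependency analysis can draw not only on inter-phrase dependency information but also on word-to-word dependency information.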
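Steps (5) and (6) can be sketched as a short function that pairs each remaining word with the word right after it. This is a sketch under stated assumptions: segmenting the compound into words is taken as already done, `compound_relations` is a hypothetical name, treating 「時」 as a suffix is purely illustrative, and the category labels are placeholders because the text does not give the Bunrui Goi Hyo numbers for these words.

```python
def compound_relations(words, affixes, category_of):
    """Sketch of steps (5)-(6): drop affixes, then pair each word with the
    word immediately after it as (dependent, its category, receiver, its category).

    words       -- the compound, already segmented into words (assumed input)
    affixes     -- hypothetical set of prefixes/suffixes to discard
    category_of -- hypothetical mapping from word to semantic category number
    """
    core = [w for w in words if w not in affixes]
    return [(a, category_of[a], b, category_of[b]) for a, b in zip(core, core[1:])]


# Placeholder category labels, not real Bunrui Goi Hyo numbers.
cats = {"カナ": "cat-kana", "漢字": "cat-kanji", "変換": "cat-conversion"}
for rel in compound_relations(["カナ", "漢字", "変換", "時"], {"時"}, cats):
    print(rel)  # step (7) would register each of these tuples
```

For 「カナ漢字変換時」 this yields the two word-level relations カナ→漢字 and 漢字→変換, matching the first two entries the text attributes to FIG. 4.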
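When there are two or more inter-phrase dependency candidates, step (12) is executed. In step (12), the dependency candidate search unit 8 searches the connection table for entries matching the candidates' connection relations and judges correct the matching candidate with the shortest dependent-receiver distance; if no answer can be determined, it searches the semantic-category link list for entries matching the candidates' semantic-category links and judges correct the matching candidate with the shortest distance. In step (13), processing proceeds to step (17) if the search unit 8 determined the answer, and to step (14) otherwise. In step (14), the dependency judgment unit 9 checks whether there is a candidate whose receiver has the same semantic category number as the dependent. If one exists, step (15) judges it correct and proceeds to step (17); otherwise processing goes to step (16), which defers the dependency judgment and then proceeds to step (17). In step (17), the dependent under consideration advances to the next phrase.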
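Steps (12) to (16) for a single dependent can be sketched as a three-pass search over the candidates ordered by distance. This is a minimal sketch, not the patent's implementation: `Phrase`, `choose_receiver`, and the placeholder categories `"A"` to `"D"` are hypothetical, and candidate extraction (step 3) is assumed to have produced `cands` already. The text does not say which candidate wins when several share the dependent's category in step (14); taking the nearest is an assumption.

```python
from __future__ import annotations
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    pos: int               # position of the phrase (bunsetsu) in the sentence
    word: str              # head content word
    cat: str               # semantic category number used for the phrase
    part_cats: tuple = ()  # categories of the words inside a compound, if any


def choose_receiver(dep: Phrase, cands: list[Phrase],
                    connections: set[tuple[str, str]],
                    category_links: set[tuple[str, str]]) -> Optional[Phrase]:
    """Steps (12)-(16) for a phrase with two or more candidates.
    Returns the chosen receiver, or None when the judgment is deferred."""
    nearest_first = sorted(cands, key=lambda r: r.pos - dep.pos)
    # Step (12), first pass: nearest candidate with a recorded connection relation.
    for r in nearest_first:
        if (dep.word, r.word) in connections:
            return r
    # Step (12), second pass: nearest candidate with a recorded category link.
    for r in nearest_first:
        if (dep.cat, r.cat) in category_links:
            return r
    # Steps (14)-(15): a receiver carrying the dependent's own category number,
    # as its head or inside a compound (nearest-first order is an assumption).
    for r in nearest_first:
        if dep.cat == r.cat or dep.cat in r.part_cats:
            return r
    return None  # Step (16): defer; step (17) moves on to the next phrase.


dep = Phrase(0, "w0", "A")
near = Phrase(1, "w1", "B")
far = Phrase(2, "w2", "C", ("D", "A"))
print(choose_receiver(dep, [near, far], set(), set()).word)  # w2
```

Here neither table has an entry, so the decision falls to step (14), which picks the farther phrase because its compound contains a word in category `"A"`.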
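For dependency candidates whose judgment was deferred, in step (10) the dependency candidate search unit 8 decides the inter-phrase dependency relation by taking as correct the candidate whose connection relation matches a connection relation newly registered in steps (7) and (9). If no answer is determined this way, the candidate whose semantic-category link matches a newly registered link is taken as correct.
In step (11), when dependency analysis of all phrases is finished, the dependency relation decision unit 10 decides, for the dependency candidates whose answer could not be determined, on the candidate with the shortest distance between dependent and receiver.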
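Steps (10) and (11) can be sketched as a final pass over the deferred dependents. This is a sketch under stated assumptions: the tuple shape `(position, head word, category number)`, the name `settle_deferred`, and the placeholder categories are hypothetical, and taking the nearest match when several candidates match is an assumption the text does not state.

```python
from __future__ import annotations

# Hypothetical shape of a phrase: (position, head word, category number).
Phrase = tuple[int, str, str]


def settle_deferred(deferred: dict[Phrase, list[Phrase]],
                    new_connections: set[tuple[str, str]],
                    new_links: set[tuple[str, str]]) -> dict[Phrase, Phrase]:
    """Steps (10)-(11): re-check deferred phrases, then fall back to distance."""
    decided: dict[Phrase, Phrase] = {}
    for dep, cands in deferred.items():
        pos, word, cat = dep
        ordered = sorted(cands, key=lambda r: r[0] - pos)
        # Step (10): a candidate matching a newly registered connection relation...
        choice = next((r for r in ordered if (word, r[1]) in new_connections), None)
        if choice is None:
            # ...otherwise one matching a newly registered category link.
            choice = next((r for r in ordered if (cat, r[2]) in new_links), None)
        if choice is None:
            # Step (11): the shortest dependent-receiver distance wins.
            choice = ordered[0]
        decided[dep] = choice
    return decided


dep = (0, "w0", "A")
near, far = (1, "w1", "B"), (2, "w2", "C")
print(settle_deferred({dep: [far, near]}, set(), {("A", "C")})[dep])  # (2, 'w2', 'C')
```

Passing the whole store instead of only the new entries should give the same result: a phrase is deferred only after its candidates failed to match everything recorded up to that point.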
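The dependency analysis used in the analyzer of this invention is now explained using the analysis of a patent claim as an example. FIG. 3 shows an example claim sentence. Analyzing this sentence yields dependency candidates that cannot be narrowed to a single interpretation. In example [1], for instance, if the phrase 「作成中の」 ("being created") is the dependent, four dependency relations are possible, with each of the following phrases 「文章中に」 ("in the text"), 「変換結果の」 ("of the conversion result"), 「同音語を」 ("homophones"), and 「手段と」 ("means") as receiver. The dependency relation table of FIG. 4 contains no connection relation pairing 「作成中の」 as dependent with a receiver, so connection relations cannot narrow the candidates. The dependency is therefore decided using semantic-category links, i.e., pairs of the dependent's and receiver's semantic category numbers.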
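FIG. 4 shows an example of the dependency relation table obtained by running dependency analysis up to the phrase 「作成中の」. Here, 12 indicates the dependent and its semantic category number, 13 the receiver and its semantic category number, and 14 the semantic-category link list recording links, i.e., pairs of the dependent's and receiver's semantic category numbers. Because 12 and 13 together pair each dependent with its receiver, they express the connection relations and can collectively be regarded as the connection table. Arranging the dependency relation table this way records each dependency relation, that is, each connection relation together with its corresponding link. The first two entries, 「カナ」-「漢字」 (kana-kanji) and 「漢字」-「変換」 (kanji-conversion), are dependency relations extracted from the compound 「カナ漢字変換」 (kana-to-kanji conversion) shown in FIG. 5, and the third entry, 「カナ漢字変換」-「変換結果」, is the relation obtained by analyzing the dependent 「カナ漢字変換時の」 with the receiver 「変換結果の」. As the semantic category number of a compound, the number of the word within the compound that most clearly expresses its meaning is normally used, rather than a category number for the compound as a whole. In patent claims, a compound's meaning is frequently carried by its last word (excluding suffixes), so FIG. 4 uses the category number of the last word as the compound's category number. The last three entries in FIG. 4 are dependency relations extracted from the clause 「作成中の文中に表示する手段と」 (first line of FIG. 3). The following shows how the relations built up to this point are used to analyze the sentence in [1].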
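The semantic-category links of candidates (c), (d), (e), and (f) in [1] are (1.386, 1.3154), (1.386, 1.1112), (1.386, 1.3112), and (1.386, 1.1113), respectively. Of these, the link (1.386, 1.3154) of candidate (c) matches a link recorded in the semantic-category link list of FIG. 4 and is retrieved. That is, the dependency relation whose dependent is a production-class word (category number 1.386) and whose receiver is a text-class word (category number 1.3154) is selected. This shows that there is a dependency relation in which 「作成」 (create), a production-related word, is the dependent and 「文章」 (text), a text-related word, is a candidate receiver, and it follows that 「作成中の」 correctly depends on 「文章中に」.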
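The category-link pass for example [1] can be replayed with the numbers given above. This is an illustrative sketch: only the one FIG. 4 link the text names is included in `recorded_links`, the variable names are hypothetical, and the candidates are listed nearest first so the first match is also the nearest.

```python
# Candidates for the dependent 「作成中の」 in example [1], nearest first,
# with the category-number links stated in the text.
candidates = [
    ("c", "文章中に", ("1.386", "1.3154")),
    ("d", "変換結果の", ("1.386", "1.1112")),
    ("e", "同音語を", ("1.386", "1.3112")),
    ("f", "手段と", ("1.386", "1.1113")),
]
# Only the FIG. 4 link the text mentions; the rest of the list is not reproduced.
recorded_links = {("1.386", "1.3154")}

# No connection relation matches, so the category-link pass decides.
matches = [c for c in candidates if c[2] in recorded_links]
label, receiver, _ = matches[0]  # nearest matching candidate
print(label, receiver)  # c 文章中に
```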
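As this example shows, the link (1.386, 1.3154) indicates that the verb 「作成する」 (to create, category number 1.386) may take words of category number 1.3154 (文章 "text", 論文 "paper", 文 "sentence", ...) as a case argument. Consequently, even when the same words are not actually used in a dependency, words that share these semantic category numbers can be judged to enter into the same kind of dependency relation.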
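For example, the link can also be used to judge the dependency between the phrases 「文を」 and 「作る」 in the sentence 「文を作るとき…」 ("when making a sentence..."). Moreover, by consulting the semantic-category link list without distinguishing dependent from receiver within a link, a dependency relation with dependent and receiver reversed, such as 「作成された文章の…」 ("of the created text..."), can also be judged.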
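Ignoring direction amounts to also checking the reversed pair. In this sketch, `link_recorded` and its `ignore_direction` flag are hypothetical; the text states only that the link list can be consulted without distinguishing dependent from receiver.

```python
def link_recorded(dep_cat, recv_cat, links, ignore_direction=False):
    """Check a category-number link, optionally ignoring which side depends."""
    if (dep_cat, recv_cat) in links:
        return True
    return ignore_direction and (recv_cat, dep_cat) in links


links = {("1.386", "1.3154")}  # production-class word -> text-class word
print(link_recorded("1.3154", "1.386", links))                         # False
print(link_recorded("1.3154", "1.386", links, ignore_direction=True))  # True
```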
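Example [3] in FIG. 3 illustrates the processing of a dependency candidate for which no matching connection relation or link is recorded in the dependency relation table. Suppose that the phrase 「選択させる」 ("to have ... selected"), being in adnominal form, could modify either noun (phrase), 「同音語表示選択手段を」 (homophone display selection means) or 「同音語出力方式」 (homophone output scheme), and that neither the connection relations nor the links of these candidates match anything in the dependency table. Step (14) is then executed: from all dependency candidates, those are kept whose dependent has category number 1.3063, the number of the word 「選択」 (selection), and whose receiver contains a word with the same category number 1.3063. Because 「選択」 in the compound 「同音語表示選択手段」 has category number 1.3063, the dependency relation with 「選択させる」 as dependent and 「同音語表示選択手段」 as receiver is identified.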
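The step (14) filter for example [3] can be sketched as below. This is a sketch under stated assumptions: `same_category_receivers` is a hypothetical name, and each receiver lists only category numbers the text itself states (同音語 1.3112, 選択 1.3063, 手段 1.1113); the numbers of 「表示」, 「出力」, and 「方式」 are not given and are omitted.

```python
def same_category_receivers(dep_cat, receivers):
    """Step (14) sketch: keep receivers containing a word whose category number
    equals the dependent's. `receivers` maps each receiver phrase to the
    category numbers of its constituent words."""
    return [phrase for phrase, cats in receivers.items() if dep_cat in cats]


receivers = {
    "同音語表示選択手段を": ["1.3112", "1.3063", "1.1113"],  # 同音語, 選択, 手段
    "同音語出力方式": ["1.3112"],                            # 同音語
}
print(same_category_receivers("1.3063", receivers))  # ['同音語表示選択手段を']
```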
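Even so, as in example [2] of FIG. 3, two or more dependency candidates may remain whose dependent and receiver category numbers do not match. Here, when the phrase 「変換結果の」 is the dependent, 「同音語を」 (g) and 「手段と」 (h) are each possible receivers. All dependency candidates with the phrase 「変換結果」 as dependent are therefore saved, their judgment is deferred, and analysis proceeds to the next phrase. After the other phrases have been analyzed, the dependency relation with 「変換結果」 as dependent can be decided using the analysis results for the range marked by the wavy line 11 on the second line of FIG. 3: within the wavy-line span 11, the dependent 「変換結果の」 has the single receiver 「同音語を」. This connection relation is recorded in the connection table, and the link formed by the category number 1.1112 of 「変換結果」 and the category number 1.3112 of 「同音語」 is recorded in the semantic-category link list. The dependency of the phrase 「変換結果の」 on the phrase 「同音語を」 can therefore be decided from the newly recorded connection relation or link. In example [2] of FIG. 3, analyzing the later phrases thus establishes that relation (g), whose dependent is the earlier phrase, is correct.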
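Example [2] can be traced in a few lines. This is an illustrative sketch using only the words and category numbers stated above; the variable names and the head words chosen for each candidate are hypothetical simplifications.

```python
# Example [2]: 「変換結果の」 was deferred with two candidate receivers, nearest first.
dep_word, dep_cat = "変換結果", "1.1112"
deferred = [("g", "同音語を", "同音語", "1.3112"), ("h", "手段と", "手段", "1.1113")]

# Later, inside wavy-line span 11, 「変換結果の」 -> 「同音語を」 is the only candidate,
# so its connection relation and category link are registered.
connections = {("変換結果", "同音語")}
links = {("1.1112", "1.3112")}

# Step (10): prefer a matching connection relation, then a matching category link.
by_connection = [c for c in deferred if (dep_word, c[2]) in connections]
by_link = [c for c in deferred if (dep_cat, c[3]) in links]
chosen = (by_connection or by_link)[0]
print(chosen[0], chosen[1])  # g 同音語を
```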
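In example [4] of FIG. 3, when the phrase 「同音語を」 is the dependent, there are four dependency candidates, with (k) 「表示して」 (displaying), (l) 「選択させる」 (having ... selected), (m) 「備えたことを」 (comprising), and (n) 「特徴とする」 (characterized by) as the respective receivers. No link in the semantic-category link list of FIG. 4 matches the links of these candidates. However, by splitting the compound 「同音語表示選択手段」 into words and determining the dependent-receiver links between its words, such as the relation with 「同音語」 as dependent and 「表示」 as receiver, it can be decided that the phrase 「同音語を」 depends on phrase (k) 「表示して」.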
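The way a word-level relation from a compound settles example [4] can be sketched as follows. This is a sketch under stated assumptions: the head word attached to each candidate is a hypothetical simplification (for example 「備える」 for 「備えたことを」), and only the connection-table pass is shown.

```python
# Word-level connections from splitting 「同音語表示選択手段」 (each word -> next word).
words = ["同音語", "表示", "選択", "手段"]
connections = set(zip(words, words[1:]))

# Candidates (k)-(n) for the dependent 「同音語を」, nearest first.
# The head words are simplified for illustration.
candidates = [("k", "表示して", "表示"), ("l", "選択させる", "選択"),
              ("m", "備えたことを", "備える"), ("n", "特徴とする", "特徴")]

chosen = next(c for c in candidates if ("同音語", c[2]) in connections)
print(chosen[0], chosen[1])  # k 表示して
```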
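[Effects of the Invention]
The dependency analyzer of the invention successively stores in the dependency candidate storage unit those dependency relations that can be determined uniquely and, when judging a dependency that cannot be determined uniquely, consults the storage unit and adopts dependency relations present there in preference to those absent from it. It therefore achieves highly accurate dependency analysis by extracting and using information found within each text, without special domain knowledge. When dependency relations are expressed as links between semantic categories, dependencies based on verb case relations and modification by noun phrases can be handled uniformly, and dependency relations can be determined uniquely even in text containing paraphrases. The invention reduces the cost of building domain knowledge for each field. Furthermore, whether it is implemented in hardware or in software, it needs no area for keeping domain knowledge resident, so the system can be made very small.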
したがって、この発明の係り受け解析装置は、特許文
や法律文のような同じ表現を繰り返し用いている文章の
解析に有効である。さらに、本解析装置は文章の自動分
類、自動索引抽出、自動要約などの自動文書処理に応用
することができる。また、日本語の文章に限らず英文に
対しても適用することが可能である等の優れた利点があ
る。Description: BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a dependency analysis apparatus for performing a sentence dependency analysis method in natural language processing such as machine translation, natural language understanding, and automatic index extraction. Things. [Related Art] Dependency analysis of a sentence is a pre-process of natural language processing, and the accuracy of the dependency greatly affects the performance of the entire processing. Therefore, a highly accurate dependency analysis method is strongly desired. In general, there is a rule in Japanese language dependency that a received word is placed as close to the word as possible. But,
Not all of the dependency relationships fall under the above rules,
Ambiguous dependency relationships occur. Therefore, various methods have been conventionally proposed in order to accurately determine the dependency relationship. Kinukawa et al. (Kinukawa, Kimura: Automatic indexing method by Japanese sentence structure analysis, IPSJ Transactions, vo1.21, no.
3, 1980) has proposed. FIG. 6 is an example of a sentence pattern table of verb case relations used in the above dependency analysis method. Focus on verbs in sentences,
The noun and case particle connected to the verb and the semantic classification of the noun phrase are defined, and dependency analysis is performed using the relation. However, this method requires a difficult action to describe the case relation between the verb and the noun for all verbs.
Furthermore, in the case of dependency from a noun phrase to a noun phrase or dependency on a noun phrase by modifying a continuation of a word, there is a problem that it is difficult to determine the dependency. Takamatsu, Nishida et al. (Takamatsu, Kusaka, Nishida: Automatic extraction of related information from technical abstracts, IPSJ Transactions vo1.25, no.2,
1984) supplements the problem of the dependency analysis method based on the sentence pattern table of Kinukawa et al. This method combines a case structure pattern of a verb with knowledge of a specialized field described individually in a technical abstract such as a claim sentence, and cannot be analyzed by case structure relation alone. The analysis can be determined. The method will be described using a specific example. First, the case structure of the input sentence is analyzed, and then the access table to the knowledge table in FIG. 7 is referred to. Next, based on the access table, a set of case labels corresponding to the case structure of the sentence is searched from the knowledge table. Then, the case related to the case described in the knowledge table is set as the maximum likelihood dependency candidate, and the dependency is determined. FIG. 8 is an example of a knowledge table regarding a semiconductor device. Ninth
An example of dependency analysis of the above method is shown in the sentence example in the figure. In this example, two interpretations (a) and (b) hold whether the verb “contain” relates to “on the silicon substrate” or “insulating layer”. OBJ and PARTIC are taken as the case of the "contain" type verb,
The seventh is that there is a conceptual relationship of "COMPOSITION" between them.
It is determined from the access table in FIG. Next, a knowledge table of “COMPOSITION” in the semiconductor device of FIG. 8 is searched to check an example of a case relationship between OBJ and PARTIC. Actually, case examples of "channel region" (PARTIC) and "silicon substrate" (OBJ) are extracted from the knowledge table, and the channel region is included in the silicon substrate, but the insulating layer is not included in the silicon substrate. Knowledge can be extracted,
In this example, it is determined that the dependency of (a) is correct. [Problems to be solved by the invention] Dependency analysis on association modification can be performed accurately by this method, but unless various knowledge tables such as constituents and detailed knowledge tables for each specialized field are created, There is a problem that it does not. An object of the present invention is to eliminate the ambiguity of dependency by using information in a sentence without using detailed case relations or specialized knowledge when parsing a sentence having a lot of repeated expressions such as a claim sentence. Another object of the present invention is to provide a dependency analysis method. [Means for Solving the Problems] A dependency analyzing apparatus according to the present invention provides a connection table for recording a connection relationship consisting of a pair of a dependency word and a received word, a meaning category number of the dependency word, and a meaning of the received word. A semantic category number connection relation list that records a semantic category number connection relation consisting of a pair with a category number, and a dependency in which a pair of the connection relation and the semantic category number connection relation is recorded as a dependency relation. It has a dependency candidate storage unit composed of a relation table. 
Further, a phrase unit dividing unit that divides the input sentence into phrases, a phrase extracting unit that extracts the phrases that are sequentially divided from the beginning of the input sentence, and a phrase that receives the remaining phrases as the related words and receives the remaining phrases as words A dependency candidate extracting unit for extracting inter-phrase dependency candidates composed of pairs of words, and a prefix that divides the compound word into words when the independent word of the extracted phrase is a compound word composed of two or more words. A compound word division unit for removing a word and a suffix, and a word-to-word dependency relationship between the compound words as a connection relationship, and a meaning category of the word and a meaning of the word. A compound word dependency analyzing unit for linking a category pair to a semantic category number connection, and the inter-word dependency relationship is registered in the dependency candidate storage unit, and when the inter-phrase dependency candidate is one, Save the candidate Having a dependency relationship registration unit that registers the new part. When the inter-phrase dependency candidates are two or more, when a candidate that matches the connection relation of the candidates can be retrieved from the connection table, the candidate with the closest distance between the dependency word and the received word is determined to be the correct answer. However, if the correct answer cannot be determined, and a candidate that matches the connection relation of the candidates can be retrieved from the semantic category connection relation list, the candidate with the closest distance between the dependent word and the received word is determined to be the correct answer. 
A dependency candidate search unit that determines whether a candidate has the same semantic category number as a dependency word in the candidate when the correct answer cannot be determined in the dependency candidate search unit. A dependency determination unit that determines whether the candidate is correct if the word exists, and suspends the determination of inter-phrase dependency if the word does not exist. Dependency relation determination unit that, for the inter-phrase dependency candidates for which the correct answer could not be determined at the time of completion of the dependency analysis of all phrases, accurately determines the inter-phrase dependency candidate with the shortest distance between the dependency word and the received word. And Note that the dependency candidate search unit determines that a candidate whose connection relationship with the candidate for which the determination of the dependency has been suspended matches the connection relationship registered in the above is determined to be a correct answer. Is determined as a correct answer if the connection relationship of the candidate for which the reservation is suspended coincides with the newly registered connection relationship. [Operation] In the dependency analyzing apparatus according to the present invention, the dependency candidate storage unit stores, as a dependency relationship table, a connection table that records a connection relationship composed of a pair of a dependency word and a reception word, a semantic category number of the dependency word, and the like. A semantic category number concatenation relation list that records concatenation relations of semantic category numbers composed of pairs with semantic category numbers of received words, and records pairs of connection relations and semantic category number concatenation relations as dependency relations . 
When a sentence is input, the input sentence is divided into phrases by a phrase unit dividing unit. The phrase extraction unit extracts phrases that are sequentially divided from the beginning of the input sentence, and the dependency candidate extraction unit extracts inter-phrase dependency candidates consisting of pairs in which the extracted phrases are used as the dependent words and the remaining phrases are used as the received words. Extract. When the extracted independent word of the phrase is a compound word composed of two or more words, the compound word dividing unit divides the compound word into words and removes the prefix and the suffix. The compound dependency analysis unit connects the pair of the dependency word and the received word relating to the immediately succeeding word as the inter-word dependency relationship, and connects the pair of the meaning category of the dependency word and the meaning category of the received word to the semantic category number. Relationship. The dependency relation registration unit newly registers the dependency relation between words in the dependency candidate storage unit, and newly registers the candidate in the dependency candidate storage unit when there is one inter-phrase dependency candidate. When the number of inter-phrase dependency candidates is two or more, the dependency candidate search unit searches the connection table for a candidate that matches the connection relationship of the candidate, and the distance between the dependency word and the received word is the shortest. If the candidate is determined to be correct, and the correct answer cannot be determined, the candidate that has the closest relationship between the dependent word and the received word is found when the candidate that matches the connection relationship of the candidate can be searched from the semantic category connection relationship list. Is determined to be the correct answer. 
The dependency determination unit determines whether there is a candidate having the same semantic category number of the dependency word and the semantic category number of the received word in the candidate when the correct candidate cannot be determined by the dependency candidate search unit. If it exists, the candidate is regarded as the correct answer, and if it does not exist, the determination of inter-phrase dependency is suspended. The dependency relation determination unit determines that the inter-phrase dependency candidate with the shortest distance between the dependency word and the received word among the inter-clause dependency candidates for which the correct answer could not be determined at the time when the dependency analysis of all the clauses has been completed is determined to be the correct answer. I do. Note that the dependency candidate search unit determines that the candidate whose connection relation has been suspended and whose match is identical to the newly registered connection relation is a correct answer. If the correct answer cannot be determined, the dependency Is determined as a correct answer if the connection relationship of the candidate for which is suspended is coincident with the newly registered connection relationship. Therefore, the dependency analyzing apparatus of the present invention can determine the dependency relationship between the phrases obtained by dividing the input sentence in one way. Here, at the time of the dependency analysis, the dependency relationship between words constituting a compound word and the inter-segment dependency candidate defined as one are registered in the dependency candidate storage unit and used for the subsequent dependency analysis. [Example] Hereinafter, an example of the present invention will be described. In the embodiment of the present invention, both the dependency word and the dependency word in the dependency relationship are in terms of a phrase. 
As the dependency relationship, a connection relationship between the dependency category and the semantic category of the dependency word and a connection relationship between the dependency word and the dependency word are used. Here, the semantic category represents a common semantic concept of a word, and a semantic category number or the like corresponding to the concept is assigned to each word. The semantic category numbers used in this example are classified vocabulary tables published by the National Institute of Japanese Language and Linguistics (National Institute for Japanese Language Linguistics 6 Classified vocabulary tables, Hideei Shuppan, 1964)
Use the one described in. FIG. 1 is a flowchart of a dependency analysis process used in the dependency analysis device of the present invention, and FIG. 2 is a block diagram showing an embodiment of the dependency analysis device according to the present invention. First,
The configuration shown in FIG. 2 will be described, and then the operation will be described with reference to FIG. In FIG. 2, reference numeral 1 denotes a phrase unit dividing unit, which divides an input sentence into phrases. Reference numeral 2 denotes a phrase extraction unit which extracts phrases sequentially divided from the beginning of the input sentence. Reference numeral 3 denotes a dependency candidate extraction unit that extracts inter-phrase dependency candidates for pairs in which the extracted phrase is used as a dependency word and the remaining phrases are used as receiving words. Reference numeral 4 denotes a compound word division unit that divides an independent word into words when the extracted independent word of the phrase is a compound word composed of two or more words. Reference numeral 5 denotes a compound word dependency analysis unit for analyzing the dependency relationship between words. Reference numeral 6 denotes a dependency candidate storage unit, which stores a connection table that records a connection relationship consisting of a pair of a dependency word and a reception word, and a semantic category number consisting of a pair of a meaning category number of the dependency word and a meaning category number of the reception word. It is composed of a connection list of semantic category numbers for recording connection relations, and is a dependency relation table for recording pairs of connection relations and connection relations of semantic category numbers as dependency relations. 7
Is a dependency relation registration unit, which newly registers the inter-word dependency candidate in the dependency candidate storage unit 6 and, when there is only one inter-phrase dependency candidate, newly registers the candidate in the dependency candidate storage unit 6 I do. Reference numeral 8 denotes a dependency candidate search unit. When there are two or more inter-phrase dependency candidates, a connection table that matches the connection relation of the candidate is searched from the connection table, and the distance between the dependency word and the received word is the shortest. A close candidate is determined as a correct answer. If the correct answer cannot be determined, a candidate that matches the related relationship of the candidate is searched from the semantic category connection relationship list, and a candidate having the closest distance between the dependent word and the received word is determined to be the correct answer. Reference numeral 9 denotes a dependency determination unit. If the correct candidate cannot be determined by the dependency candidate search unit 8, whether or not there is a candidate having the same semantic category number of the dependency word and the semantic category number of the received word in the candidate Is determined. If it exists, the candidate is regarded as the correct answer. If it does not exist, the determination of the inter-phrase dependency relation is suspended. For the candidate for which the determination of the dependency relationship has been suspended, the dependency candidate search unit 8 determines that the candidate whose connection relationship matches the newly registered connection relationship is a correct answer. A candidate whose connection relationship matches the newly registered connection relationship is determined to be correct. 
When the dependency analysis of all the clauses is completed, the dependency relation determination unit 10 determines the inter-clause dependency candidate in which the distance between the dependency word and the subject word is the shortest among the inter-clause dependency candidates for which the correct answer could not be determined. Correct answer. The numbers (1) to (16) in FIG. 2 indicate the steps of the dependency analysis performed by each component, and correspond to the steps (1) to (16) in FIG. Each step is performed in the following procedure. First, as a step (1), the phrase unit dividing unit 1 divides an input sentence into phrases. In step (2), the phrase extraction unit 2 extracts phrases sequentially divided from the beginning of the input sentence. In step (3), the dependency candidate extraction unit 3 uses the extracted phrase as a dependency, and determines all the phrase candidates that can be received words according to the part of speech of the independent word in the target phrase, the inflected form, and the type of the auxiliary word to be used, ie, Extract inter-phrase dependency candidates. In step (4), the compound word dividing unit 4 determines whether or not the independent word constituting the hang-up word is a compound word composed of two or more words. When there is a certain time, the process proceeds to step (5), and when not, the process proceeds to step (8). In step (5), the compound word is divided into words, and prefixes and suffixes are removed. In step (6), the compound word dependency analysis unit 5 performs a connection relation consisting of a pair of the dependency word and the immediately following word, and a meaning consisting of a pair of the meaning category number of the dependency word and the meaning category number of the received word. The connection relationship between category numbers, that is, the dependency relationship between words is analyzed. 
In step (7), the dependency relation registration unit 7 records the inter-word dependency relations in the dependency candidate storage unit 6. In step (8), if there is exactly one inter-phrase dependency candidate, step (9) is executed: the dependency relation registration unit 7 newly records that candidate in the dependency candidate storage unit 6.
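The storage unit and the registration steps (7) to (9) can be sketched as below. This is an illustrative data structure under stated assumptions, not the patent's own: the class and function names are hypothetical, and category numbers are kept as strings because they are hierarchical codes rather than quantities. The example registers the unique dependency "conversion result" → "homophone" (categories 1.1112 and 1.3112) that the text later derives from example [2].

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Set, Tuple

Pair = Tuple[str, str]


class Candidate(NamedTuple):
    dep: str      # dependency word
    rec: str      # receiving word
    dep_cat: str  # semantic category number of the dependency word
    rec_cat: str  # semantic category number of the receiving word


@dataclass
class DependencyStore:
    """Sketch of storage unit 6 (the FIG. 4 dependency relation table)."""
    rows: List[Tuple[Pair, Pair]] = field(default_factory=list)  # associated pairs
    connections: Set[Pair] = field(default_factory=set)          # connection table
    concatenations: Set[Pair] = field(default_factory=set)       # category list

    def register(self, c: Candidate) -> None:
        """Record a dependency as an associated connection/concatenation pair."""
        conn, concat = (c.dep, c.rec), (c.dep_cat, c.rec_cat)
        self.rows.append((conn, concat))
        self.connections.add(conn)
        self.concatenations.add(concat)


def register_step(store: DependencyStore, inter_word: List[Candidate],
                  inter_phrase: List[Candidate]) -> bool:
    """Steps (7)-(9). Returns True if the phrase dependency was decided here."""
    for rel in inter_word:        # step (7): inter-word relations from a compound
        store.register(rel)
    if len(inter_phrase) == 1:    # steps (8)-(9): a unique phrase candidate
        store.register(inter_phrase[0])
        return True
    return False                  # two or more candidates: continue at step (12)


store = DependencyStore()
decided = register_step(
    store, [], [Candidate("conversion result", "homophone", "1.1112", "1.3112")])
print(decided, store.concatenations)  # True {('1.1112', '1.3112')}
```

Keeping the sets alongside the associated rows is a design choice for fast lookup in the later search steps; the text only requires that the two relations be recorded in association.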
Through these steps, the inter-phrase dependency analysis can draw on inter-word dependency information as well as inter-phrase dependency information. When there are two or more inter-phrase dependency candidates, step (12) is executed. In step (12), the dependency candidate search unit 8 searches the connection table for entries matching the candidates' connection relations and judges as correct the matching candidate whose dependency word and receiving word are closest. If no correct answer can be determined this way, it searches the semantic category concatenation relation list for entries matching the candidates' concatenation relations and judges as correct the matching candidate whose dependency word and receiving word are closest. In step (13), the process proceeds to step (17) if the dependency candidate search unit 8 determined the correct answer, and to step (14) otherwise. In step (14), the dependency judgment unit 9 checks whether any candidate has a dependency word and a receiving word with the same semantic category number. In step (15), if such a candidate exists, it is judged correct and the process proceeds to step (17). If not, the process proceeds to step (16), where the dependency judgment is suspended, and then to step (17). In step (17), the dependency word is advanced to the next phrase.

For dependency candidates whose judgment has been suspended, in step (10) the dependency candidate search unit 8 judges as correct the candidate whose connection relation matches a connection relation newly registered in step (7) or step (9), thereby determining the inter-phrase dependency. If no correct answer is found this way, the candidate whose concatenation relation matches a newly registered concatenation relation is judged correct. In step (11), when dependency analysis of all phrases is complete, the dependency relation determination unit 10 takes as correct, for each set of candidates still undetermined, the candidate whose dependency word and receiving word are closest.
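The decision cascade of steps (12) to (16), followed by the deferred steps (10) and (11), can be sketched as below. This is a simplified reading, not the patent's code: distance is assumed to be a phrase count, step (14) picks the nearest candidate if several have equal categories (the text does not say), and the retry in step (10) checks everything in the store rather than tracking which relations are "newly" registered. The usage mirrors example [3]: "select" (1.3063) matches the word "selection" (1.3063) inside "homophone display selection means"; the other receiving word's category and both distances are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

Pair = Tuple[str, str]


class Candidate(NamedTuple):
    dep: str       # dependency word
    rec: str       # receiving word
    dep_cat: str   # semantic category number of the dependency word
    rec_cat: str   # semantic category number of the receiving word
    distance: int  # distance between the two phrases (assumed: phrase count)


def nearest(cands: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate whose dependency and receiving words are closest."""
    return min(cands, key=lambda c: c.distance) if cands else None


def search(cands: List[Candidate], connections: Set[Pair],
           concatenations: Set[Pair]) -> Optional[Candidate]:
    """Unit 8: try the connection table first, then the category list."""
    hit = nearest([c for c in cands if (c.dep, c.rec) in connections])
    if hit is None:
        hit = nearest([c for c in cands
                       if (c.dep_cat, c.rec_cat) in concatenations])
    return hit


def judge(cands: List[Candidate], connections: Set[Pair],
          concatenations: Set[Pair]) -> Optional[Candidate]:
    """Steps (12)-(16): return the chosen candidate, or None to suspend."""
    hit = search(cands, connections, concatenations)
    if hit is None:
        # Step (14): both words share a semantic category number.
        hit = nearest([c for c in cands if c.dep_cat == c.rec_cat])
    return hit


def finalize(suspended: List[List[Candidate]], connections: Set[Pair],
             concatenations: Set[Pair]) -> List[Optional[Candidate]]:
    """Steps (10) and (11) for every suspended candidate set."""
    decided: List[Optional[Candidate]] = []
    for cands in suspended:
        hit = search(cands, connections, concatenations)            # step (10)
        decided.append(hit if hit is not None else nearest(cands))  # step (11)
    return decided


cands = [
    Candidate("select", "homophone display selection means", "1.3063", "1.3063", 1),
    Candidate("select", "homophone output method", "1.3063", "X.HYPO", 2),
]
print(judge(cands, set(), set()).rec)  # homophone display selection means
```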
The dependency analysis performed by the dependency analyzer of the present invention is described below, taking the analysis of patent claims as an example. FIG. 3 shows an example claim sentence. When this sentence is subjected to dependency analysis, some dependency candidates cannot be narrowed down to a single reading. For example, in example [1] of the example sentence, if the phrase "being created" is taken as the dependency word, four dependency relations are possible, with the following phrases "in the sentence", "conversion result", "homophone", and "means" as receiving words.
In the dependency relation table shown in FIG. 4, no connection relation pairs "being created" as a dependency word with any receiving word, so the connection relations cannot be used to narrow down the dependency candidates. The dependency is therefore determined using the concatenation relations of semantic category numbers, that is, pairs of the semantic category number of the dependency word and that of the receiving word. FIG. 4 shows an example of the dependency relation table obtained by running the dependency analysis up to the phrase "being created". Here, 12 is the dependency word and its semantic category number, 13 is the receiving word and its semantic category number, and 14 is the concatenation relation list of semantic category numbers, which records the concatenation relations, each a pair of the semantic category number of a dependency word and that of a receiving word. Because 12 and 13 together give the connection relation pairing each dependency word with its receiving word, they can be regarded collectively as the connection table. Since the dependency relation table is laid out this way, each dependency relation, that is, a connection relation and its concatenation relation, is recorded in association. The first two pairs, "kana"-"kanji" and "kanji"-"conversion", are dependency relations extracted from the compound word "kana-kanji conversion" shown in FIG. 5.
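The concatenation-list lookup that example [1] relies on can be sketched as follows, using the semantic category numbers the text gives for candidates (c) to (f). Only the list entry the text cites, (1.386, 1.3154), is included; the other rows of FIG. 4 are omitted, so this is an excerpt rather than the full table. The pairing of each label with its receiving word follows the order in which the text lists them.

```python
# Excerpt of the FIG. 4 concatenation relation list (only the cited entry).
concatenation_list = {("1.386", "1.3154")}

dep_word, dep_cat = "being created", "1.386"
candidates = {  # label: (receiving word, its semantic category number)
    "c": ("in the sentence", "1.3154"),
    "d": ("conversion result", "1.1112"),
    "e": ("homophone", "1.3112"),
    "f": ("means", "1.1113"),
}

# No connection relation exists for "being created", so match on the
# concatenation relations of semantic category numbers instead.
matches = [label for label, (_, rec_cat) in candidates.items()
           if (dep_cat, rec_cat) in concatenation_list]
print(matches)  # ['c']
```

Because the match is on category numbers, any receiving word in category 1.3154 (the text names text, paper, and sentence) would be licensed by the same entry.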
The second relation, between "kana-kanji conversion" and "conversion result", was obtained by dependency analysis of the dependency word "at the time of kana-kanji conversion" and the receiving word "conversion result". For the semantic category number of a compound, the number of the word within the compound that most clearly conveys its meaning is used, rather than a number for the compound as a whole. Compounds used in patent claims are usually characterized by their last word (excluding suffixes), so in the example of FIG. 4 the semantic category number of the last word is used as that of the compound. The last three entries in FIG. 4 are dependency relations extracted from the phrase "having means to display in the sentence being created" (first line of FIG. 3).

An example of dependency analysis of the sentence in [1], using the dependency relations created up to this point, follows. The concatenation relations of dependency candidates (c), (d), (e), and (f) are (1.386, 1.3154), (1.386, 1.1112), (1.386, 1.3112), and (1.386, 1.1113), respectively. Of these, the one that matches a relation recorded in the concatenation relation list of semantic category numbers in FIG. 4 is that of (c), (1.386, 1.3154). In other words, the candidate is selected in which a creation-related word (semantic category number 1.386) is the dependency word and a sentence-related word (semantic category number 1.3154) is the receiving word. Since this is the candidate in which the creation-related word "create" is the dependency word and the sentence-related word "sentence" is the receiving word, candidate (c) is judged correct. As this example shows, the concatenation relation (1.386, 1.3154) indicates that the verb "create" (semantic category number 1.386) can take a word with semantic category number 1.3154 (text, paper, sentence, ...) as a case argument. Therefore, even when the dependency does not involve the same words, the same dependency relation can be judged to hold between words with the same semantic category numbers. For example, the relation can also be used to determine the dependency between the phrases "a sentence" and "make" in "when you make a sentence ...". Moreover, because the concatenation relations do not distinguish which side is the dependency word and which the receiving word, the concatenation relation list of semantic category numbers can also determine the reversed dependency, in which the verb modifies the noun, as in "of the created text ...".

Example [3] in FIG. 3 shows how a dependency candidate is processed when neither a matching connection relation nor a matching concatenation relation is recorded in the dependency relation table. Because the phrase "selected" is in the attributive (adnominal) form, it may modify either noun phrase, "homophone display selection means" or "homophone output method". Assuming that neither the connection relations nor the concatenation relations of these candidates match the dependency relation table, step (14) is executed: from all dependency candidates, those whose receiving word has the same semantic category number, 1.3063, as the dependency word "select" are retained. Here, the word "selection" in the compound "homophone display selection means" has semantic category number 1.3063, so with "select" as the dependency word, the dependency on "homophone display selection means" is identified.

When no candidate has matching semantic category numbers for its dependency word and receiving word, two or more dependency candidates may remain, as in example [2] in FIG. 3. Here, with the phrase "of the conversion result" as the dependency word, "homophone" (g) and "means" (h) are candidate receiving words. All dependency candidates having "of the conversion result" as the dependency word are therefore stored, the dependency judgment is suspended, and dependency analysis proceeds to the next phrase. After the other phrases have been analyzed, the dependency with "of the conversion result" as the dependency word can be determined from the analysis result for the range marked by wavy line 11 in the second line of FIG. 3. That is, in the wavy-line portion 11, the receiving word "homophone" is determined uniquely for the dependency word "of the conversion result".
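The suspension and later resolution in example [2] can be sketched as follows. This is an illustration under stated assumptions: the helper name `retry` is hypothetical, it returns the first match rather than the nearest (only one candidate matches here), and the category 1.1113 for "means" in (h) is assumed to be the same number the text gives for "means" in example [1]. The other numbers (1.1112 for "conversion result", 1.3112 for "homophone") come from the text.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]
# (dependency word, receiving word, dep. category, rec. category)
Cand = Tuple[str, str, str, str]


def retry(cands: List[Cand], connections: Set[Pair],
          concatenations: Set[Pair]) -> Optional[str]:
    """Re-check a suspended candidate set against the stored relations."""
    for dep, rec, _, _ in cands:
        if (dep, rec) in connections:
            return rec
    for _, rec, dep_cat, rec_cat in cands:
        if (dep_cat, rec_cat) in concatenations:
            return rec
    return None


# Example [2]: two candidates remain, so the judgment is suspended.
suspended: List[Cand] = [
    ("conversion result", "homophone", "1.1112", "1.3112"),  # (g)
    ("conversion result", "means", "1.1112", "1.1113"),      # (h), 1.1113 assumed
]
connections: Set[Pair] = set()
concatenations: Set[Pair] = set()
print(retry(suspended, connections, concatenations))  # None (still undecided)

# Later, in the wavy-line region 11, the same dependency word has a single
# receiving word, so that relation is registered (steps (8)-(9)).
connections.add(("conversion result", "homophone"))
concatenations.add(("1.1112", "1.3112"))
print(retry(suspended, connections, concatenations))  # homophone
```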
The connection relation between "conversion result" and "homophone" is recorded in the connection table, and the concatenation relation pairing semantic category number 1.1112 of "conversion result" with semantic category number 1.3112 of "homophone" is recorded in the concatenation relation list of semantic category numbers. The dependency of the phrase "of the conversion result" on the phrase "homophone" can therefore be determined from the newly recorded connection relation or concatenation relation. That is, in example [2] in FIG. 3, the dependency (g), whose dependency word is the earlier phrase, is determined to be correct by analyzing the later phrase. In example [4] in FIG. 3, when the phrase "homophone" is the dependency word, there are four dependency candidates, as shown in the figure, with (k) "display", (l) "select", (m) "provided with", and (n) "characterized" as receiving words. Neither connection relations nor concatenation relations of semantic category numbers corresponding to these dependencies are found in FIG. 4. However, after the compound word "homophone display selection means" is divided into words, inter-word dependencies such as that between the dependency word "homophone" and the receiving word "display" are determined, and from these it can be determined that the phrase "homophone" depends on the phrase (k) "display".

[Effect of the Invention]
The dependency analyzer of the present invention successively stores dependency relations that can be determined uniquely in the dependency candidate storage unit and, when a dependency cannot be determined uniquely, consults that storage unit, adopting dependency relations present there in preference to those absent from it. Because it extracts and uses information within each sentence instead of relying on expert knowledge, it achieves highly accurate dependency analysis. When dependency relations are expressed as concatenations of semantic categories, modification based on the case relations of verbs and modification by noun phrases can be handled uniformly, and dependencies can be determined uniquely even in sentences containing paraphrases. The present invention thus reduces the cost of building specialized knowledge for each field.
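Example [4] above, where an inter-word relation taken from a compound decides a phrase-level dependency, can be sketched as follows. The helper names are hypothetical; the sketch assumes phrases are compared by their independent words (so the phrase "display" matches the word "display" inside the compound) and that candidates are listed in sentence order, so the first match is also the nearest.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]


def connections_from_compound(words: List[str]) -> Set[Pair]:
    """Pair each word of a divided compound with the word right after it."""
    return set(zip(words, words[1:]))


def choose(dep: str, receivers: List[str], connections: Set[Pair]) -> Optional[str]:
    """Pick the first receiving word (sentence order) backed by a stored connection."""
    for rec in receivers:
        if (dep, rec) in connections:
            return rec
    return None


# Relations stored after dividing "homophone display selection means".
stored = connections_from_compound(["homophone", "display", "selection", "means"])
# Candidates (k)-(n) for the dependency word "homophone", in sentence order.
receivers = ["display", "select", "provided with", "characterized"]
print(choose("homophone", receivers, stored))  # display
```

This illustrates the preference described under [Effect of the Invention]: a relation already present in storage wins over candidates with no stored support, without any field-specific knowledge base.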
In addition, whether the method is implemented in hardware or software, a very small-scale system suffices, because no storage area for expert knowledge is needed. The dependency analyzer according to the present invention is therefore effective for analyzing texts that repeat the same expressions, such as patent documents and legal texts. The analyzer can also be applied to automatic document processing such as automatic text classification, automatic index extraction, and automatic summarization. A further advantage is that it can be applied not only to Japanese sentences but also to English sentences.
[Brief Description of the Drawings]
FIG. 1 is a flowchart of the processing procedure of the dependency analyzer of the present invention; FIG. 2 is a block diagram showing one embodiment of the dependency analyzer of the present invention; FIG. 3 shows example claim sentences used to describe the embodiment; FIG. 4 is the dependency relation table created in the embodiment; FIG. 5 shows an example of compound word analysis in the embodiment; FIG. 6 shows an example of the case relation table used in a conventional dependency analysis method; FIGS. 7 and 8 show an example of an access table to a knowledge table, and the knowledge table, used in a conventional dependency analysis method; and FIG. 9 shows an example patent sentence used to explain an execution example of conventional dependency analysis.
In the figures, 1 is a phrase unit dividing unit, 2 is a phrase extraction unit, 3 is a dependency candidate extraction unit, 4 is a compound word dividing unit, 5 is a compound word dependency analysis unit, 6 is a dependency candidate storage unit, 7 is a dependency relation registration unit, 8 is a dependency candidate search unit, 9 is a dependency judgment unit, and 10 is a dependency relation determination unit.
Continuation of front page
(72) Inventor: Fumihiko Kobashi, c/o Combined Communication Research Laboratories, Nippon Telegraph and Telephone Corporation, 1-2356 Take, Yokosuka-shi, Kanagawa
(56) References: JP-A-61-260366 (JP, A); JP-A-62-197076 (JP, A)
Claims (1)
(57) [Claims]
A dependency analyzer comprising:
a dependency candidate storage unit consisting of a dependency relation table that comprises a connection table recording connection relations, each a pair of a dependency word and a receiving word, and a concatenation relation list of semantic category numbers recording concatenation relations, each a pair of the semantic category number of a dependency word and the semantic category number of a receiving word, the dependency relation table recording each pair of a connection relation and the corresponding concatenation relation of semantic category numbers as a dependency relation;
a phrase unit dividing unit that divides an input sentence into phrases;
a phrase extraction unit that extracts the divided phrases in order from the beginning of the input sentence;
a dependency candidate extraction unit that extracts inter-phrase dependency candidates, each a pair in which the extracted phrase is the dependency word and one of the remaining phrases is the receiving word;
a compound word dividing unit that, when the independent word of the extracted phrase is a compound word composed of two or more words, divides the compound word into words and removes prefixes and suffixes;
a compound word dependency analysis unit that, for the inter-word dependency relations of the compound word, takes the pair of a dependency word and the receiving word immediately following it as a connection relation, and the pair of the semantic category of the dependency word and the semantic category of the receiving word as a concatenation relation of semantic category numbers;
a dependency relation registration unit that newly registers the inter-word dependency relations in the dependency candidate storage unit and, when there is one inter-phrase dependency candidate, newly registers that candidate in the dependency candidate storage unit;
a dependency candidate search unit that, when there are two or more inter-phrase dependency candidates and entries matching the candidates' connection relations are found in the connection table, judges as correct the matching candidate whose dependency word and receiving word are closest, and, if no correct answer can be judged, when entries matching the candidates' concatenation relations are found in the semantic category concatenation relation list, judges as correct the matching candidate whose dependency word and receiving word are closest;
a dependency judgment unit that, when the dependency candidate search unit cannot judge a correct answer, determines whether any candidate has a dependency word and a receiving word with equal semantic category numbers, takes that candidate as correct if one exists, and otherwise suspends judgment of the inter-phrase dependency; and
a dependency relation determination unit that, when dependency analysis of all phrases is complete, takes as correct, among the inter-phrase dependency candidates for which no correct answer could be determined, the inter-phrase dependency candidate whose dependency word and receiving word are closest;
wherein the dependency candidate search unit judges as correct, among the candidates whose dependency judgment was suspended, a candidate whose connection relation matches a newly registered connection relation, and, if no correct answer can be judged, a candidate whose concatenation relation matches a newly registered concatenation relation.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP62173011A JP2960936B2 (en) | 1987-07-13 | 1987-07-13 | Dependency analyzer |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP62173011A JP2960936B2 (en) | 1987-07-13 | 1987-07-13 | Dependency analyzer |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS6417152A JPS6417152A (en) | 1989-01-20 |
| JP2960936B2 true JP2960936B2 (en) | 1999-10-12 |
Family
ID=15952542
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP62173011A Expired - Lifetime JP2960936B2 (en) | 1987-07-13 | 1987-07-13 | Dependency analyzer |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP2960936B2 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0628057B2 (en) * | 1989-12-07 | 1994-04-13 | キヤノン株式会社 | Character processor |
| JP3293619B2 (en) * | 1990-08-20 | 2002-06-17 | 株式会社シーエスケイ | Japanese parsing system |
| JP2666549B2 (en) * | 1990-09-27 | 1997-10-22 | 日本電気株式会社 | Semiconductor memory device and method of manufacturing the same |
| JP2855409B2 (en) * | 1994-11-17 | 1999-02-10 | 日本アイ・ビー・エム株式会社 | Natural language processing method and system |
| JP2002354941A (en) * | 2001-06-01 | 2002-12-10 | Toray Ind Inc | Sheet for raising sugar content |
| DE602006004754D1 (en) * | 2005-07-29 | 2009-02-26 | Fiberweb Inc | LIQUID, NON-FLUID LUBRICANT FROM BICOMPONENT FILAMENTS |
| JP2008017729A (en) * | 2006-07-11 | 2008-01-31 | Mkv Platech Co Ltd | Coating material for preventing adhesion of flying pesticides |
| JP7082333B2 (en) * | 2017-11-30 | 2022-06-08 | 学校法人酪農学園 | Question automatic generation program and question automatic generation device |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS61260366A (en) * | 1985-05-14 | 1986-11-18 | Sharp Corp | Mechanical translating system having learning function |
| JPS62139076A (en) * | 1985-12-13 | 1987-06-22 | Agency Of Ind Science & Technol | Language analysis system |
- 1987-07-13: JP application JP62173011A, patent JP2960936B2, Expired - Lifetime
Also Published As
| Publication number | Publication date |
|---|---|
| JPS6417152A (en) | 1989-01-20 |
Similar Documents
| Publication | Title |
|---|---|
| US5794177A (en) | Method and apparatus for morphological analysis and generation of natural language text |
| US6169999B1 (en) | Dictionary and index creating system and document retrieval system |
| US8027966B2 (en) | Method and system for searching a multi-lingual database |
| WO1997004405A9 (en) | Method and apparatus for automated search and retrieval processing |
| US20060047691A1 (en) | Creating a document index from a flex- and Yacc-generated named entity recognizer |
| KR20160060253A (en) | Natural Language Question-Answering System and method |
| US20060047690A1 (en) | Integration of Flex and Yacc into a linguistic services platform for named entity recognition |
| JP2960936B2 (en) | Dependency analyzer |
| Khoo et al. | Using statistical and contextual information to identify two‐and three‐character words in Chinese text |
| Bhat | Morpheme segmentation for kannada standing on the shoulder of giants |
| CN118132668A (en) | Rule-based component specification model custom word segmentation method |
| KR20050064574A (en) | System for target word selection using sense vectors and korean local context information for english-korean machine translation and thereof |
| JPH06149887A (en) | Text type database device |
| JP2003303194A (en) | Idiom dictionary creation device, search index creation device, document search device, their methods, programs and recording media |
| KR100617319B1 (en) | Apparatus for selecting target word for noun/verb using verb patterns and sense vectors for English-Korean machine translation and method thereof |
| Kumar et al. | TelStem: An unsupervised telugu stemmer with heuristic improvements and normalized signatures |
| JPH01243116A (en) | Method for processing japanese sentence |
| Takahasi et al. | Keyboard logs as natural annotations for word segmentation |
| Rosset et al. | The LIMSI Qast systems: comparison between human and automatic rules generation for question-answering on speech transcriptions |
| JP2655711B2 (en) | Homomorphic reading system |
| KR20020054244A (en) | Apparatus and method of long sentence translation using partial sentence frame |
| JPS63228326A (en) | Automatic key word extracting system |
| JP3139624B2 (en) | Morphological analyzer |
| JPH0320866A (en) | Text-based search method |
| JP2002342325A (en) | Device, method and program for applying translation probability |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | EXPY | Cancellation because of completion of term | |