
JP5564898B2 - Information search program and information search apparatus - Google Patents


Info

Publication number
JP5564898B2
Authority
JP
Japan
Prior art keywords
sentence
question
independent word
independent
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2009253926A
Other languages
Japanese (ja)
Other versions
JP2011100258A (en)
Inventor
博 増市
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujifilm Business Innovation Corp
Original Assignee
Fuji Xerox Co Ltd
Fujifilm Business Innovation Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuji Xerox Co Ltd, Fujifilm Business Innovation Corp
Priority to JP2009253926A
Publication of JP2011100258A
Application granted
Publication of JP5564898B2
Current legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

The present invention relates to an information search program and an information search apparatus.

Techniques have been proposed for retrieving documents in response to a question sentence entered in natural language.

As a related technique, Patent Document 1 discloses the following: when a document search is requested by a question sentence written in natural language, independent words and attached words are extracted from the question sentence by linguistic analysis, each attached word is converted into one of a plurality of preset operators, and each operator is combined with its corresponding independent word to generate a search condition.

Patent Document 1: JP-A-8-339383

An object of the present invention is to provide an information search program and an information search apparatus that give an appropriate answer even when the sentences in a document that would appropriately answer a question sentence do not contain the independent words extracted from that question sentence.

[1] An information search program for causing a computer to function as:
independent word extraction means for extracting first independent words from a received question sentence;
sentence search means for searching the documents stored in storage means for sentences that contain the first independent words extracted from the question sentence by the independent word extraction means;
specifying means for extracting second independent words contained in the sentences retrieved by the sentence search means and specifying, from among the second independent words, a third independent word that is used characteristically;
question type specifying means for specifying, from among the first independent words extracted by the independent word extraction means, an independent word that matches an independent word included in question type information as the independent word indicating the question type of the question sentence; and
answer extraction means for, when the question type of the question sentence is a predetermined type, searching for documents that contain the third independent word and a related word associated with the independent word indicating the question type and extracting the matching documents as the answer to the question sentence, and, when the question type of the question sentence is not the predetermined type, extracting the sentences retrieved by the sentence search means as the answer.

[2] The information search program according to [1], wherein the specifying means specifies, as the third independent word, a second independent word that is used with high frequency in the retrieved sentences and with low frequency in the sentences of the stored documents as a whole.

[3] The information search program according to [1] or [2], wherein the answer extraction means extracts the sentences retrieved by the sentence search means as the answer when the specifying means cannot specify the third independent word from the second independent words.

[4] An information search apparatus comprising:
independent word extraction means for extracting first independent words from a received question sentence;
sentence search means for searching the documents stored in storage means for sentences that contain the first independent words extracted from the question sentence by the independent word extraction means;
specifying means for extracting second independent words contained in the sentences retrieved by the sentence search means and specifying, from among the second independent words, a third independent word that is used characteristically;
question type specifying means for specifying, from among the first independent words extracted by the independent word extraction means, an independent word that matches an independent word included in question type information as the independent word indicating the question type of the question sentence; and
answer extraction means for, when the question type of the question sentence is a predetermined type, searching for documents that contain the third independent word and a related word associated with the independent word indicating the question type and extracting the matching documents as the answer to the question sentence, and, when the question type of the question sentence is not the predetermined type, extracting the sentences retrieved by the sentence search means as the answer.

According to the inventions of claims 1 and 4, when the sentences in a document that would appropriately answer a question sentence do not contain the independent words extracted from that question sentence, and the question is of a predetermined question type, the independent word indicating the question type and the related words of that independent word can be reflected in the answer, so that an appropriate answer can be given. When the question is not of a predetermined question type, sentences containing the independent words extracted from the question sentence can be extracted as the answer.

According to the invention of claim 2, an answer can be extracted using an independent word specified on the basis of appearance frequency.

According to the invention of claim 3, when the related sentences contain no feature word, sentences containing the independent words extracted from the question sentence can be extracted as the answer.

FIG. 1 is a schematic diagram showing a configuration example of the information search system of the present invention.
FIG. 2 is a schematic diagram showing a configuration example of the information search apparatus.
FIG. 3 is a schematic diagram showing a configuration example of the question type information.
FIG. 4 is a schematic diagram showing a configuration example of the product specific information.
FIGS. 5(a) and 5(b) are schematic diagrams showing a configuration example of a question sentence.
FIGS. 6(a) to 6(c) are schematic diagrams showing configuration examples of related sentences related to the question sentence.
FIGS. 7(a) to 7(c) are schematic diagrams showing configuration examples of feature words extracted from the related sentences and of answers extracted from the feature words.
FIG. 8 is a flowchart showing an operation example of the information search system.

(Configuration of information search system)
FIG. 1 is a schematic diagram showing a configuration example of the information search system of the present invention.

The information search system 5 is configured by connecting an information search apparatus 1, a document management server device 2, and a terminal device 3 via a network 4 so that they can communicate with one another.

The information search apparatus 1 is an information processing apparatus that has electronic components such as a CPU (Central Processing Unit) and a storage unit, and that provides functions for searching for information in response to a question sentence.

The document management server device 2 is connected to a document database 2A that stores documents. It is an information processing apparatus that has electronic components such as a CPU and a storage unit, and that provides functions for management operations on the documents stored in the document database 2A, such as saving, deleting, writing, moving, and copying.

The terminal device 3 is a device for accessing the information search apparatus 1 to enter a question sentence and for displaying the search results for that question sentence. It has an operation unit for operation input, a display unit such as a liquid crystal display, and a control unit that includes electronic components such as a CPU and a storage unit. The terminal device 3 may be, for example, a personal computer, a PDA (Personal Digital Assistant), or a mobile phone. Although FIG. 1 shows one terminal device 3, there may be more than one.

The network 4 may be a LAN (Local Area Network), the Internet, or the like, and may be wired or wireless.

Here, a "question sentence" is text written in natural language, of any length and any number of sentences. A "search result" is a word or sentence in natural language, a document containing such a sentence, a link to that document, or the like. The present embodiment describes, as an example, a case in which a set of independent words describing the properties of a product or goods is used as the question sentence and the name of a specific product or goods is retrieved as the search result.

FIG. 2 is a schematic diagram showing a configuration example of the information search apparatus 1.

The information search apparatus 1 has a control unit 10 that consists of a CPU and the like, controls each unit, and executes various programs; a storage unit 11 that consists of a storage medium such as an HDD (Hard Disk Drive) or flash memory and stores information; and a communication unit 12 that communicates with the outside via the network 4.

By executing the information search program 11A shown in FIG. 2 and described later, the control unit 10 functions as independent word extraction means 10A, question type specifying means 10B, related sentence search means 10C, feature word specifying means 10D, answer extraction means 10E, output means 10F, and the like.

The independent word extraction means 10A receives the question sentence entered at the terminal device 3 and extracts independent words from it.

The question type specifying means 10B specifies the type of the question (hereinafter also called the "question type") from the independent words contained in the question sentence. Here, the "question type" is synonymous with the type of answer that the question sentence requests. For example, when the requested answer is the product name of a video camera, the name of a politician, or the name of a capital, the question sentence uses an independent word as the question type, as in "... video camera?", "... politician?", or "... capital?". Specifically, the question type is specified from the contents registered in advance in the question type information 11B described later.

The related sentence search means 10C searches the sentences that make up the documents stored in the document database 2A for sentences that are highly relevant to the content of the question sentence.

The feature word specifying means 10D specifies, from among the independent words that make up the sentences retrieved by the related sentence search means 10C, feature words that are used characteristically, and stores them in the storage unit 11 as feature word information 11D described later.

The answer extraction means 10E extracts answers associated with the feature words specified by the feature word specifying means 10D and stores them in the storage unit 11 as answer information 11E described later.

The output means 10F outputs the answer information 11E in a format that the terminal device 3 can display.

The storage unit 11 stores the information search program 11A, which causes the control unit 10 to operate as each of the means described above; the question type information 11B, which associates specific question types with their affiliation categories; the product specific information 11C, which associates specific question types with specific product names; the feature word information 11D specified by the feature word specifying means 10D; and the answer information 11E extracted by the answer extraction means 10E.

FIG. 3 is a schematic diagram showing a configuration example of the question type information 11B.

The question type information 11B has a question type column 110B that lists the question types contained in question sentences and an affiliation column 111B that lists the affiliation category corresponding to each question type.

FIG. 4 is a schematic diagram showing a configuration example of the product specific information 11C.

The product specific information 11C has a question type column 110C that lists the question types contained in question sentences and a product name column 111C that lists the specific product names corresponding to each question type.
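The two tables described above can be pictured as simple lookup structures. The following is a minimal sketch, not the patented implementation: the dictionary names, the function name `identify_question_type`, and every table entry are assumptions chosen for illustration (the category labels and product names echo examples that appear elsewhere in this description).

```python
# Hypothetical in-memory stand-in for question type information 11B:
# question type column 110B -> affiliation column 111B.
QUESTION_TYPE_INFO = {
    "video camera": "product/goods",
    "politician": "person",
    "capital": "place",
}

# Hypothetical stand-in for product specific information 11C:
# question type column 110C -> product name column 111C.
PRODUCT_SPECIFIC_INFO = {
    "video camera": ["Handcam AB15", "Handcam AB25", "Handcam AB35"],
}


def identify_question_type(independent_words):
    """Return (question_type, category) for the first word listed in 11B, else None."""
    for word in independent_words:
        if word in QUESTION_TYPE_INFO:
            return word, QUESTION_TYPE_INFO[word]
    return None


words = ["dark", "beautifully", "shoot", "video camera"]
print(identify_question_type(words))
```

Keeping the two tables separate mirrors the description: the first decides which processing path a question takes, the second restricts which names may be returned as answers.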

(Operation of information search system)
The operation of the information search system 5 is described below with reference to the drawings.

FIG. 8 is a flowchart showing an operation example of the information search system 5.

First, when a user operates the operation unit of the terminal device 3 to send a product or goods search request to the information search apparatus 1, the control unit 10 of the information search apparatus 1 displays, based on the information search program 11A, a screen for accepting questions on the display unit of the terminal device 3.

Next, when the user operates the operation unit to enter a question sentence on that screen, the independent word extraction means 10A receives and accepts the question sentence via the network 4 and the communication unit 12 (S1).

FIGS. 5(a) and 5(b) are schematic diagrams showing a configuration example of a question sentence.

The question sentence 100 shown in FIG. 5(a) consists of, for example, the sentence "What video camera can shoot beautifully in dark places?"

Next, the independent word extraction means 10A extracts the independent words 100a to 100d shown in FIG. 5(b) from the question sentence 100 shown in FIG. 5(a) (S2). The question type specifying means 10B checks whether any of the independent words 100a to 100d exists in the question type column 110B of the question type information 11B shown in FIG. 3. In the present embodiment, the independent word 100d, "video camera", exists in the question type column 110B, so the question type specifying means 10B specifies "video camera" as the question type (S3).

The question type specifying means 10B also determines, from the affiliation column 111B of the question type information 11B, which affiliation category "video camera" belongs to (S4). When the affiliation is "product/goods" (S4; Yes), the related sentence search means 10C searches the documents stored in the document database 2A for related sentences (hereinafter, "related sentences") on the basis of the independent words 100a to 100d (S5). At this time, the related sentence search means 10C searches for related sentences that contain as many of the extracted independent words 100a to 100d as possible, although the search condition is not limited to this.
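One way to read the search condition in step S5 ("as many of the extracted independent words as possible") is as a ranking by the number of matched words. The sketch below is an illustration under that assumption only; the function name `search_related_sentences`, the `top_k` parameter, and the use of plain substring matching in place of morphological analysis are all hypothetical.

```python
def search_related_sentences(query_words, sentences, top_k=3):
    """Rank sentences by how many distinct query words they contain."""
    scored = []
    for sentence in sentences:
        # Substring matching is a simplification of matching on extracted independent words.
        hits = sum(1 for word in set(query_words) if word in sentence)
        if hits > 0:
            scored.append((hits, sentence))
    # list.sort is stable, so equally scored sentences keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:top_k]]


corpus = [
    "a dark night",
    "shoot beautifully in a dark room",
    "bright sunny day",
]
print(search_related_sentences(["dark", "beautifully", "shoot"], corpus))
```

Sentences that match none of the query words are dropped, so an empty result corresponds to the case in which no related sentence is found.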

FIGS. 6(a) to 6(c) are schematic diagrams showing configuration examples of related sentences related to the question sentence.

The related sentences 102a to 102c shown in FIGS. 6(a) to 6(c) are sentences retrieved using the independent words 100a to 100d shown in FIG. 5(b). Each of the related sentences 102a to 102c contains the independent words 100a to 100c of the question sentence 100, such as "dark", "beautifully", and "shoot".

Next, the feature word specifying means 10D extracts independent words from the retrieved related sentences 102a to 102c (S6) and specifies feature words from among those independent words (S7). The feature words are specified using, for example, the TF-IDF method described below.

First, let $F(t_i)$ be the appearance frequency of an independent word $t_i$ in one of the related sentences. Then

$$\mathrm{TF}(t_i) = \frac{F(t_i)}{\sum_{j} F(t_j)}$$

takes a large value for independent words that appear frequently. Next, let $M(t_i)$ be the number of sentences that contain the independent word $t_i$, and let $M$ be the total number of sentences. Then

$$\mathrm{IDF}(t_i) = \log \frac{M}{M(t_i)}$$

takes a large value for rare independent words. It follows that

$$\mathrm{TFIDF}(t_i) = \mathrm{TF}(t_i) \cdot \mathrm{IDF}(t_i)$$

takes a large value for independent words that appear characteristically often, so a word $t_i$ with a large $\mathrm{TFIDF}(t_i)$ is taken as a feature word. As an example, an independent word whose $\mathrm{TFIDF}(t_i)$ exceeds a predetermined threshold is specified as a feature word. (The equation images are not reproduced in this text; the expressions shown are the standard TF-IDF forms consistent with the definitions of $F(t_i)$, $M(t_i)$, and $M$.)
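The threshold rule in step S7 can be sketched as follows. This is a hedged illustration that assumes the standard TF-IDF definition (term frequency normalized by the sentence length, natural-log inverse sentence frequency); the function name `feature_words`, the sample data, and the threshold value are assumptions, and each sentence is represented as a list of already extracted independent words.

```python
import math
from collections import Counter


def feature_words(related_sentence, all_sentences, threshold):
    """Return the words of related_sentence whose TF-IDF score exceeds threshold."""
    counts = Counter(related_sentence)      # F(t_i) for each word
    total = sum(counts.values())            # sum of F(t_j) over all words
    m = len(all_sentences)                  # M: total number of sentences
    result = []
    for word, freq in counts.items():
        containing = sum(1 for s in all_sentences if word in s)  # M(t_i)
        if containing == 0:
            continue  # word never seen in the collection; IDF is undefined
        tf = freq / total
        idf = math.log(m / containing)
        if tf * idf > threshold:
            result.append(word)
    return result


related = ["sensor", "dark", "sensor"]
collection = [related, ["dark", "night"], ["dark", "room"], ["bright", "day"]]
print(feature_words(related, collection, threshold=0.5))
```

In this sample, "sensor" is frequent in the related sentence and rare in the collection, while "dark" appears in most sentences, so only "sensor" clears the threshold.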

FIGS. 7(a) to 7(c) are schematic diagrams showing configuration examples of feature words extracted from the related sentences and of answers extracted from the feature words.

The feature word 103a shown in FIG. 7(a) is the feature word specified from the related sentence 102a shown in FIG. 6(a). Likewise, the feature word 103b shown in FIG. 7(b) is the feature word specified from the related sentence 102b shown in FIG. 6(b), and the feature word 103c shown in FIG. 7(c) is the feature word specified from the related sentence 102c shown in FIG. 6(c).

When feature words are specified as shown in FIGS. 7(a) to 7(c) (S7; Yes), the answer extraction means 10E takes the product and goods names that appear together with the feature words 103a to 103c in the same documents and extracts, as answers, those that are listed in the product name column 111C of the product specific information 11C shown in FIG. 4 (S8).

For example, as shown in FIG. 7(a), the answer 104a, "Handcam AB15, Handcam AB25, Handcam AB35", is extracted on the basis of the feature word 103a, "Seymour Grace Sensor". The answers 104b and 104c are extracted in the same way. The answer extraction means 10E stores the extracted feature words 103a to 103c and answers 104a to 104c in the storage unit 11 as the feature word information 11D and the answer information 11E, respectively.
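Step S8 amounts to a co-occurrence filter: keep the known product names that appear in the same document as a feature word. The following is a minimal sketch under that reading; the function name `extract_answers`, the sample documents, and the substring matching are assumptions made for illustration.

```python
def extract_answers(feature_words, documents, product_names):
    """Collect known product names that co-occur with any feature word in a document."""
    answers = []
    for text in documents:
        # Skip documents that mention none of the feature words.
        if not any(word in text for word in feature_words):
            continue
        for name in product_names:
            if name in text and name not in answers:
                answers.append(name)
    return answers


documents = [
    "The Seymour Grace Sensor is built into Handcam AB15 and Handcam AB25.",
    "Handcam AB35 has a long battery life.",
]
names = ["Handcam AB15", "Handcam AB25", "Handcam AB35"]
print(extract_answers(["Seymour Grace Sensor"], documents, names))
```

Restricting candidates to the names registered for the question type keeps unrelated proper nouns in the same document from being returned as answers.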

Next, the output means 10F reads the answer information 11E from the storage unit 11, transmits it to the terminal device 3, and displays the answers on the display screen of the terminal device 3 (S9). The answers 104a to 104c may be displayed as character strings; alternatively, the entire contents of the documents that contained the answers 104a to 104c, or links to those documents, may be displayed.

When the question type is something other than "product/goods" in step S4 (S4; No), or when no feature word is extracted in step S7 (S7; No), the question is processed by a conventional, general question answering system (S10). A general question answering system is, for example, a system that extracts independent words from the question sentence, searches for related sentences on the basis of the extracted independent words, and displays those related sentences. Such a system may also return, from the related sentences, a person name, place name, or the like that corresponds to a product name.
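The branching among steps S4, S7, S8, and S10 can be summarized in a few lines. This is only a sketch of the control flow as described; the function name `answer_question`, its parameters, and the `find_products` callback are hypothetical, and the fallback is reduced to returning the related sentences unchanged.

```python
def answer_question(category, related_sentences, feature_words, find_products):
    """Choose between the product path (S8) and the general fallback (S10)."""
    if category != "product/goods":
        return related_sentences          # S4; No -> S10
    if not feature_words:
        return related_sentences          # S7; No -> S10
    return find_products(feature_words)   # S7; Yes -> S8


def lookup(words):
    # Hypothetical stand-in for the product name extraction of step S8.
    return ["Handcam AB15"] if "Seymour Grace Sensor" in words else []


related = ["Shoots beautifully even in dark places."]
print(answer_question("product/goods", related, ["Seymour Grace Sensor"], lookup))
print(answer_question("person", related, ["Seymour Grace Sensor"], lookup))
```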

[Other embodiments]
The present invention is not limited to the embodiment described above, and various modifications are possible without departing from the gist of the invention. For example, the invention may be applied to things other than products and goods. If, for the question sentence "Who is currently the most popular politician?", the slogan "Let's change Japan, now is the time" is specified as a feature word, a search may be executed using that feature word and "Hanako Fuji" may be extracted as the answer.

The independent word extraction means 10A, question type specifying means 10B, related sentence search means 10C, feature word specifying means 10D, answer extraction means 10E, and output means 10F used in the above embodiment may be read into a storage unit in the apparatus from a storage medium such as a CD-ROM, or may be downloaded into a storage unit in the apparatus from a server device or the like connected to a network such as the Internet. Some or all of the means used in the above embodiment may also be implemented in hardware such as an ASIC.

Description of symbols
1: information search apparatus, 2: document management server device, 2A: document database, 3: terminal device, 4: network, 5: information search system, 10: control unit, 10A: independent word extraction means, 10B: question type specifying means, 10C: related sentence search means, 10D: feature word specifying means, 10E: answer extraction means, 10F: output means, 11: storage unit, 11A: information search program, 11B: question type information, 11C: product specific information, 11D: feature word information, 11E: answer information, 12: communication unit, 100: question sentence, 100a-100d: independent words, 102a-102c: related sentences, 103a-103c: feature words, 104a-104c: answers, 110B: question type column, 111B: affiliation column, 110C: question type column, 111C: product name column

Claims (4)

コンピュータを、
受け付けた質問文から第1の自立語を抽出する自立語抽出手段と、
前記自立語抽出手段によって前記質問文から抽出された前記第1の自立語を含む、記憶手段に記憶された文書に含まれる文章を検索する文章検索手段と、
前記文章検索手段が検索した前記文章に含まれる第2の自立語を抽出し、当該第2の自立語から特徴的に用いられる第3の自立語を特定する特定手段と、
前記自立語抽出手段が抽出した第1の自立語から、質問種別情報に含まれる自立語と一致する自立語を前記質問文の質問の種別を示す自立語として特定する質問種別特定手段と、
前記質問文の質問の種別が予め定めた種別である場合、前記第3の自立語と前記質問の種別を示す自立語に関連する関連語とを含む文書を検索して該当する文書を、前記質問文に対する回答として抽出し、前記質問文の質問の種別が予め定めた種別でない場合、前記文章検索手段が検索した前記文章を回答として抽出する回答抽出手段として機能させるための情報検索プログラム。
Computer
An independent word extracting means for extracting the first independent word from the accepted question sentence;
A sentence search means for searching for a sentence contained in a document stored in a storage means, including the first independent word extracted from the question sentence by the independent word extraction means;
Identifying means for extracting a second independent word contained in the sentence searched by the sentence searching means and identifying a third independent word used characteristically from the second independent word;
From the first independent word extracted by the independent word extracting means, a question type specifying means for specifying an independent word that matches the independent word included in the question type information as an independent word indicating the question type of the question sentence;
When the question type of the question sentence is a predetermined type, a document including the third independent word and a related word related to the independent word indicating the question type is searched, and the corresponding document is An information search program for extracting as a reply to a question sentence and functioning as an answer extraction means for extracting the sentence searched by the sentence search means as an answer when the question type of the question sentence is not a predetermined type.
The information search program according to claim 1, wherein the identifying means identifies, as the third independent word, an independent word that, among the second independent words contained in the sentence, is used with high frequency in the sentence and with low frequency in the sentences contained in the stored documents as a whole.
The information search program according to claim 1 or 2, wherein the answer extraction means extracts the sentence found by the sentence search means as the answer when the identifying means cannot identify the third independent word from the second independent words.
An information search apparatus comprising:
An independent word extracting means for extracting a first independent word from a received question sentence;
A sentence search means for searching the documents stored in a storage means for a sentence that includes the first independent word extracted from the question sentence by the independent word extracting means;
An identifying means for extracting second independent words contained in the sentence found by the sentence search means and identifying, from among the second independent words, a third independent word that is used characteristically;
A question type specifying means for specifying, from among the first independent words extracted by the independent word extracting means, an independent word that matches an independent word included in question type information as the independent word indicating the question type of the question sentence; and
An answer extraction means for, when the question type of the question sentence is a predetermined type, searching for a document that includes the third independent word and a related word related to the independent word indicating the question type and extracting the matching document as an answer to the question sentence, and, when the question type of the question sentence is not a predetermined type, extracting the sentence found by the sentence search means as the answer.
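The frequency criterion in claim 2 (used often in the retrieved sentence, used rarely across the sentences of the stored documents) resembles TF-IDF weighting. The sketch below uses a smoothed TF-IDF score as a stand-in; the claims do not name a formula, so the scoring, the `top_k` and `min_score` parameters, and the function name are all assumptions made for illustration.

```python
import math
from collections import Counter


def identify_characteristic_words(sentence_words, corpus_sentences, top_k=1, min_score=0.0):
    """Pick "third independent words" from the second independent words of one sentence.

    sentence_words:   independent words of the retrieved sentence (a list of str).
    corpus_sentences: independent words of every sentence in the stored documents
                      (a list of lists of str).
    Returns up to top_k words whose score exceeds min_score; an empty list means
    no characteristic word could be identified (the fallback case in claim 3).
    """
    term_counts = Counter(sentence_words)
    total_sentences = len(corpus_sentences)
    scores = {}
    for word, count in term_counts.items():
        # Term frequency: share of the retrieved sentence taken up by this word.
        tf = count / len(sentence_words)
        # Sentence frequency: number of stored sentences that contain the word.
        df = sum(1 for words in corpus_sentences if word in words)
        # Smoothed inverse frequency (an assumed formula): 0 when the word
        # occurs in every stored sentence, larger when it is rare.
        idf = math.log((1 + total_sentences) / (1 + df))
        scores[word] = tf * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, score in ranked[:top_k] if score > min_score]
```

A word that appears in every stored sentence scores zero and is never selected, which gives a natural way to detect the case where no third independent word can be identified.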

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009253926A JP5564898B2 (en) 2009-11-05 2009-11-05 Information search program and information search apparatus

Publications (2)

Publication Number Publication Date
JP2011100258A (en) 2011-05-19
JP5564898B2 (en) 2014-08-06

Family

ID=44191389

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2009253926A Expired - Fee Related JP5564898B2 (en) 2009-11-05 2009-11-05 Information search program and information search apparatus

Country Status (1)

Country Link
JP (1) JP5564898B2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5519608A (en) * 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
JP2002132812A (en) * 2000-10-19 2002-05-10 Nippon Telegr & Teleph Corp <Ntt> Question answering method, question answering system and recording medium recording question answering program
JP4181818B2 (en) * 2002-08-08 2008-11-19 株式会社リコー SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM
CN101339551B (en) * 2007-07-05 2013-01-30 日电(中国)有限公司 Natural language query requirement expansion equipment and method thereof

Also Published As

Publication number Publication date
JP2011100258A (en) 2011-05-19

Similar Documents

Publication Publication Date Title
US8874590B2 (en) Apparatus and method for supporting keyword input
US9298813B1 (en) Automatic document classification via content analysis at storage time
US20110219299A1 (en) Method and system of providing completion suggestion to a partial linguistic element
JP2937520B2 (en) Document search device
JP2005128872A (en) Document retrieving system and document retrieving program
JP5564898B2 (en) Information search program and information search apparatus
US11150871B2 (en) Information density of documents
KR102340403B1 (en) Method and apparatus for managing travel recommended items using language units
JP2005234772A (en) Documentation management system and method
JP6800478B2 (en) Evaluation program for component keywords that make up a Web page
JP4449539B2 (en) Information display control device and information display control program
JPH06348756A (en) Index preparing device and index utilizing device
JP7522885B1 (en) Information processing device, information processing system, and program
JP3056810B2 (en) Document search method and apparatus
JPS63175965A (en) document processing device
JP2004334690A (en) Character data input / output device, character data input / output method, character data input / output program, and computer-readable recording medium
JP3498635B2 (en) Information retrieval method and apparatus, and computer-readable recording medium
JP2005044071A (en) Electronic dictionary
JP2002312363A (en) Information distribution method and information distribution device
JP5588901B2 (en) Document processing apparatus and program
JP2006172029A (en) Search result presentation method
JPH08153112A (en) Document creating apparatus and document creating method
CN120705119A (en) Information processing system, program product, and information processing method
JP2003150546A (en) Data input device and input method to data input form
JP5344649B2 (en) Character string conversion apparatus, character string conversion method, program, and recording medium

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20121023

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20130827

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20130903

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20131105

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20140107

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20140218

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20140325

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20140425

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20140520

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20140602

R150 Certificate of patent or registration of utility model

Ref document number: 5564898

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

LAPS Cancellation because of no payment of annual fees