JP3588510B2

JP3588510B2 - Information filtering device

Info

Publication number: JP3588510B2
Application number: JP31330195A
Authority: JP
Inventors: 哲也酒井; 誠司三池; 一男住田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-11-30
Filing date: 1995-11-30
Publication date: 2004-11-10
Anticipated expiration: 2015-11-30
Also published as: JPH09153063A

Description

【０００１】
【発明の属する技術分野】
この発明は、膨大な数のテキスト情報からユーザの要求・興味にあったものを選出してユーザに提示する情報フィルタリング装置に関する。
【０００２】
【従来の技術】
近年、ワードプロセッサや電子計算機の普及、インターネットなどの計算機ネットワークを介した電子メールや電子ニュースの普及に伴ない、文書の電子化が加速的に進みつつある。電子出版という言葉が示すように、今後は新聞、雑誌や本の情報も電子的に提供されることが一般的になると考えられる。これにより、個人にとってリアルタイムで入手可能となるテキスト情報の量は膨大になっていくと予測される。
【０００３】
これに伴ない、新聞や雑誌などの膨大なテキスト情報からユーザの要求・興味にあったものを選出して随時ユーザに提供する情報フィルタリングシステムの需要が高まりつつある。
【０００４】
これまでに実現されている情報フィルタリングシステムには、提示したテキストの適合性をユーザに評価させ、プロファイルと呼ばれるユーザの興味にあったテキストを検索するための検索条件にその結果をフィードバックさせることによって、個々のユーザに対するテキストの適合性を高めていくというレレバンスフィードバック機能を実現しているものがある。
【０００５】
しかし、今後、情報フィルタリングシステムは、たとえば研究所の特定のテーマをもった研究グループのように、同一の分野に興味をもった複数の人間にも活用されるようになると考えられる。従来の情報フィルタリングシステムにおけるレレバンスフィードバックは、あくまでも個々のユーザに対応させるためのものであり、このような複数のユーザの要求を総括的に分析してフィードバックを行なうことはできなかった。また、個々のユーザが行なうテキストの適合性判定は、一貫性および信頼性に乏しい場合があるため、フィードバックにより必ずしもテキストの適合性が高まるとは限らず、より信頼性の高いレレバンスフィードバック機能の実現が望まれている。
【０００６】
また、従来の情報フィルタリングシステムでは、レレバンスフィードバックにより新しい検索語をプロファイルに追加することはできても、もはやテキスト中で使用されないようになり時代おくれとなった検索語を自動的に削除することはできなかった。したがって、言葉のはやりすたりや、話題の移りかわりに追従できるような情報フィルタリングシステムの実現が望まれている。
【０００７】
【発明が解決しようとする課題】
前述したように、従来の情報フィルタリングシステムにおけるレレバンスフィードバックは、あくまでも個々のユーザに対応させるためのものであり、このような複数のユーザの要求を総括的に分析してフィードバックを行なうことができないといった問題があった。
【０００８】
また、従来の情報フィルタリングシステムでは、レレバンスフィードバックにより新しい検索語をプロファイルに追加することはできても、もはやテキスト中で使用されないようになり時代おくれとなった検索語を自動的に削除することはできないといった問題があった。
【０００９】
この発明はこのような実情に鑑みてなされたものであり、同一の分野に興味をもった人間のグループが情報フィルタリングシステムを共有している場合に、グループのメンバが個々に行なったレレバンスフィードバック情報を共通のプロファイルに反映させることを可能にすることにより、信頼性の高いレレバンスフィードバックを実現し、共通のプロファイルの更新と、個々のユーザーのプロファイルの更新とを効率的に行ない、さらに、プロファイル中で古くなった検索条件および検索語を自動的に削除することにより、時代に即した知識のみを用いた検索を実現する情報フィルタリングを提供することを目的とする。
【００１２】
【課題を解決するための手段】
この発明は、複数のテキスト情報の中から所望のテキスト情報を選出してユーザに提示する情報フィルタリング装置において、複数のユーザによって構成されるグループ毎の検索条件を保持する第１の保持手段と、ユーザ毎の検索条件を保持する第２の保持手段と、前記第１の保持手段および前記第２の保持手段に保持された検索条件に合致する前記テキスト情報を選出する手段と、この手段で選出された前記テキスト情報を前記グループを構成するユーザに提示する手段と、この手段で提示された前記テキスト情報に対するユーザの評価結果であるレレバンスフィードバック情報を収集する手段と、この手段で収集された全ユーザについての前記レレバンスフィードバック情報を解析して前記第１の保持手段または前記第２の保持手段に保持された検索条件に反映させるべき語であるフィードバック情報を抽出し、この抽出したフィードバック情報を前記第１の保持手段に保持された検索条件に反映させるべきものと前記第２の保持手段に保持された検索条件に反映させるべきものに振り分ける手段と、この手段で抽出されて前記第１の保持手段に保持された検索条件に反映させるべきものとして振り分けられたフィードバック情報をもとに前記第１の保持手段に保持された検索条件を修正すると共に、前記第２の保持手段に保持された検索条件に反映させるべきものとして振り分けられたフィードバック情報をもとに前記第２の保持手段に保持された検索条件を修正する手段とを具備したことを特徴とする。
【００１３】
この発明においては、個々のユーザのフィードバック情報を共通プロファイル更新用の情報とユーザ毎のプロファイル更新用の情報とに振り分けてフィードバックを行なうため、グループとして信頼性の高いレレバンスフィードバック機能を実現しつつ、メンバに共通な情報はなるべく共通プロファイルの更新に利用し、それ以外のメンバに固有な情報をメンバ毎のプロファイルの更新に利用することにより、メンバそれぞれに対しては、より適合した情報フィルタリングが実現でき、かつフィードバックのためのシステムの処理量および情報フィルタリングのための記憶容量を大幅に節約できる。
【００１８】
【発明の実施の形態】
以下、図面を参照してこの発明の実施の形態について説明する。
【００１９】
（第１実施形態）
まず、図１を参照して第１実施形態の情報フィルタリングシステムの利用形態について説明する。
【００２０】
図１に示すように、本実施形態では、共通の興味をもった５人のユーザ３（ユーザＡ，Ｂ，Ｃ，Ｄ，Ｅ）が、情報フィルタリングシステム１を共有している。そして、この情報フィルタリングシステム１には、テキスト情報源２から随時テキスト情報が到着する。たとえば、ユーザＡ〜Ｅが半導体の研究を行なっているグループである場合に、情報フィルタリングシステム１に、「われわれは半導体に関するテキスト情報に興味がある」と登録すると、これがプロファイルという検索条件に変換される。以後、情報フィルタリングシステム１は、新着情報の中から自動的に半導体に関するテキスト情報のみを抽出し、ユーザＡ〜Ｅにこのテキスト情報を提示する。
【００２１】
本実施形態においては、情報フィルタリングシステム１は、ユーザＡ〜Ｅという一つのグループに対して一つの共通プロファイル１０をもっており、この共通プロファイル１０を検索条件に用いて情報の絞りこみを行なうので、図中のユーザＡ〜Ｅに提示されるテキスト情報は同じものとなる。
【００２２】
（システムの構成）
図２には、本実施形態の情報フィルタリングシステム１の機能構成が示されている。図中、実線の矢印はデータの流れを表している。
【００２３】
情報フィルタリングシステム１は、図示のように、テキスト情報解析部１６、テキスト情報記憶部１７、テキスト情報検索部１４、テキスト情報出力部１５、ユーザ情報入力部１１、ユーザ情報解析部１２およびユーザ情報記憶部１３から構成されている。これら構成要素のうち、鎖線で囲まれているテキスト情報解析部１６、テキスト情報検索部１４およびユーザ情報解析部１２は、計算機の中央処理装置によって実行されるソフトウエアによって実現でき、またテキスト情報記憶部１７およびユーザ情報記憶部１３は、計算機の主記憶装置やハードディスク装置などによって実現できる。さらにテキスト情報出力部１５は、ユーザ３にテキスト情報を提示するためのＣＲＴディスプレイなどから構成され、ユーザ情報入力部１１は、ユーザ３が興味のあるトピックやレレバンスフィードバック情報を入力するためのキーボードやマウスなどから構成される。
【００２４】
図３に、テキスト情報解析部１６の処理の流れの一例を示す。
【００２５】
テキスト情報解析部１６は、はじめにテキスト情報源２からテキスト情報を取り込む（ステップＡ１）。ここで、テキスト情報源２とは、新聞社や出版社のようにテキスト情報を生成して情報フィルタリングシステムに提供してくれる機関や、電子メールシステムや文書検索システムのようにテキスト情報を扱う別個のシステムや、計算機ネットワーク上でテキスト情報を一般公開しているサイトなどを指す。
【００２６】
テキスト情報解析部１６は、入力されたテキスト情報に対して形態素解析、構文解析、意味解析および書式解析などを行ない、単語、句、文および段落などのテキスト構成要素に関する頻度情報や位置情報、テキストの主題や５Ｗ１Ｈ的な情報を抽出する（ステップＡ２）。
【００２７】
そして、この抽出した情報により個々のテキストを表現する（ステップＡ３）。続いて、テキスト情報から抽出した情報をテキスト情報検索部１４が検索できる形式に変換し（ステップＡ４）、これらをテキスト情報記憶部１７に格納する（ステップＡ５）。これは、通常の情報検索におけるインデキシング処理に相当する。
【００２８】
図４には、テキスト情報解析部１６により表現されたテキスト情報の一例が示されている。
【００２９】
この図は、「○○社と△△社が今月１７日に、□□県××市に半導体の合弁会社を設立する」という内容の新聞記事を○×新聞社から受信した場合に得られるテキスト情報の表現例である。
【００３０】
図５に、テキスト情報検索部１４の処理の流れの一例を示す。
【００３１】
テキスト情報検索部１４は、はじめに、ユーザの興味を表現した検索条件であるプロファイルをユーザ情報記憶部１３から取り出す（ステップＢ１）。複数のユーザ３が「半導体に関する情報に興味がある」と情報フィルタリングシステム１に登録していたとすると、これらのユーザ３に対するプロファイル１０には、たとえば図６に示すような検索条件が記述されることになる。
【００３２】
図６には、「半導体」に関係するテキスト情報を検索するための検索条件が列挙されている。「（条件１）」は、テキスト中に「半導体」という語が出現しているか否かを判定するものであり、「（条件２）」は、「メモリ」という語と「半導体」という語がテキスト中に共起しているか否かを検査するものである。これらは従来のキーワードによるブール検索の検索条件に相当するものである。「（条件３）」は、テキストの見出しに特定の半導体会社の社名が出現するテキストを検索するものであり、単語の出現位置の情報と、半導体会社の社名に関する知識を用いた検索条件となっている。「（条件４）」は、単語の頻度情報に基づく検索条件である。「（条件５）」は、「ＤＲＡＭ」や「フラッシュメモリ」といった「半導体」の分野の関連語を利用した検索条件である。
【００３３】
テキスト情報検索部１４は、図６に示したようなプロファイル１０を検索条件に用い、テキスト情報記憶部１７に記憶されたテキストを検索対象にして、テキスト検索を行なう（ステップＢ２）。ここで、テキスト検索とは、たとえば図６に示したような検索条件を満たすテキスト情報を選出することに相当し、具体的には、検索条件を満たすテキストと満たさないテキストに振り分けたり、検索条件を満たす度合いによってテキストの順位付けを行なうことをいう。たとえば、後者の場合、検索により順位付けされた上位のテキストのうち、数件がユーザ提示用に選出され、テキスト情報出力部１５に渡される（ステップＢ３）。テキスト検索の具体的な手法としては、たとえば文献（「ＳＭＡＲＴ情報検索システム」、ジェラルド・サルトン編著、神保健二監訳、企画センタ）に開示されている技術などを採用すればよい。
【００３４】
図７に、テキスト情報出力部１５の処理の流れの一例を示す。
【００３５】
テキスト情報出力部１５は、ユーザ３に提示するテキスト情報をテキスト情報検索部１４から受け取り（ステップＣ１）、これをユーザ３に提示する（ステップＣ２）。
【００３６】
図８に、ユーザ情報入力部１１の処理の流れの一例を示す。
【００３７】
ユーザ情報入力部１１は、ユーザ３からユーザ情報を受け付け（ステップＤ１）、ユーザ情報解析部１２にわたす（ステップＤ２）。このユーザ情報には、以下に示す２種類が存在する。
【００３８】
第１は、たとえば「半導体に関する記事がほしい。」のように、ユーザ３が情報フィルタリングシステム１に対して予め指定する、ユーザ３がどのようなテキスト情報を求めているかに関する情報である。ここでは、この種の情報を初期設定情報と呼ぶことにする。
【００３９】
第２は、システムが提示したテキスト情報の適合性をユーザ３が判定したレレバンスフィードバック情報である。これは、ユーザ３に提示する記事がよりユーザ３の要求に合ったものになるようにプロファイル１０を修正するためのものであり、具体的にはたとえば図９および図１０のような形態の情報が考えられる。
【００４０】
図９は、ユーザ３が提示されたテキスト情報の各々に対して、「要／やや要／不要」の３段階評価を行なった情報の一例である。
【００４１】
この例では、たとえば「要」と判定されている「テキスト１」や「テキスト２」に含まれる単語から有用なものを抽出してプロファイル１０に追加する、などの処理を行なうことにより、次回からは、よりよいフィルタリング結果が得られる可能性がある。
【００４２】
また、図９の変形例として、適合性の判定をテキスト単位ではなく、テキストの構成要素単位で行なってもよい。たとえば、ユーザ３に提示されたテキストの文や段落を抜きだして、「この部分は有用だった」といった情報をシステムにフィードバックすることが考えられる。さらに、図９では３段階評価が行なわれているが、これを拡張して数値により適合性を評価させるようにしてもよい。
【００４３】
図１０は、キーボードなどを介してユーザ３により与えられた自然言語によるレレバンスフィードバック情報の例である。フィルタリング結果の上位に「半導体製造装置」に関する記事が提示されたが、ユーザ３は「半導体製造装置」についてはあまり興味がない場合、図中の（Ａ）のような要望をシステムに返すことにより、次回からは、「半導体製造装置」という語を含むテキストの点数を下げてもらうことが考えられる。また、プロファイル１０中の検索語に重要度が付与されているような検索方式の場合には、図中の（Ｂ）のように、「フラッシュメモリよりもＤＲＡＭを重視せよ」といった要求を出すことにより、プロファイル中の検索語の重要度を変更することが考えられる。
【００４４】
図１１に、ユーザ情報解析部１２の処理の流れの一例を示す。
【００４５】
ユーザ情報解析部１２は、ユーザ情報記憶部１３に既に情報が格納されているか否かを判定し（ステップＥ１，Ｅ２）、この判定結果にしたがって２通りの動作を行なう。ユーザ情報記憶部１３が空である場合は（ステップＥ２のＹ）、初期選択情報解析処理を行ない（ステップＥ３）、空でない場合はレレバンスフィードバック情報解析処理を行なう（ステップＥ４）。
【００４６】
図１２に、初期選択情報解析処理の流れの一例を示す。
【００４７】
初期選択情報解析処理においては、ユーザ情報解析部１２は、予め準備された言語解析に必要な解析用辞書１０１などを参照して初期選択情報を解析し、選択された話題を表す語や表現を特定する（ステップＦ１）。次に、選択された話題に関連する検索語やその同義語などに関する知識を得るために、予め準備されたトピック知識を参照し検索語を決定する（ステップＦ２）。そして、この決定した検索語を用い、図６に示したようなプロファイル１０を記述する。このような共通プロファイル１０の生成は、システムが自動的に行なってもよいし、生成したプロファイル１０をユーザ３に修正させるなどして半自動で行なってもよい。この生成された共通プロファイル１０は、ユーザ情報記憶部１３に記憶する（ステップＦ３）。
【００４８】
図１３に、前述したトピック知識の一例を示す。
【００４９】
図１３（ａ）は、検索語間の関係を記したトピックの知識の例である。たとえば、「半導体メモリ」の下位概念には「ＲＯＭ」や「ＲＡＭ」があることが記されているので、「半導体メモリ」という話題に対するプロファイル１０を記述する際に、「ＲＯＭ」や「ＲＡＭ」などを検索語として用いることができる。また、図１３（ｂ）は、同義語情報に関するトピック知識の例である。このような知識を利用し、プロファイルに「ＲＯＭ」だけでなく「読み出し専用メモリ」という同義語も検索語として登録しておけば、見逃しの少ない検索を行なうことができる。
【００５０】
図１４に、レレバンスフィードバック情報解析処理の流れの一例を示す。
【００５１】
レレバンスフィードバック情報解析処理においては、ユーザ情報解析部１２は、予め準備された言語解析に必要な解析用辞書１０１などを参照して各々のユーザ３のレレバンスフィードバック情報を解析する（ステップＧ１〜ステップＧ４）。次に、これらの情報の中からユーザ３に共通のプロファイル１０に反映させるフィードバック情報を選出する（ステップＧ５）。そして、たとえば文献（「ＳＭＡＲＴ情報検索システム」、ジェラルド・サルトン編著、神保健二監訳、企画センタ）に開示されているようなレレバンスフィードバック手法を用いて共通プロファイルを更新し、これをユーザ情報記憶部１３に格納する（ステップＧ６）。なお、プロファイル更新の際に新しい単語を追加するときなどには、その単語の関連語に関する情報などを得るために、その話題に関するトピック知識１０２を参照してもよい。
【００５２】
従来のレレバンスフィードバック処理と本実施形態におけるレレバンスフィードバック情報解析処理との違いは、前者は単一ユーザから得たフィードバック情報を単一プロファイルに反映させるだけであるのに対して、後者は複数ユーザから得たフィードバック情報の中から共通のプロファイル１０に反映させるべき情報を選出してフィードバックを行なうということである。
【００５３】
図１５に、複数ユーザから得たレレバンスフィードバック情報の例を示す。
【００５４】
この例では、同一の話題に興味をもつ３人のユーザ３が情報フィルタリングシステム１を共有しており、彼らに共通に提示された３つのテキスト情報に対する各々の適合性判定結果が「○」あるいは「×」で示されている。ユーザＡ、Ｂ、Ｃは、同一の話題に興味をもつため、「テキスト１」や「テキスト３」に対する適合性判定結果のように、各ユーザ３の判定結果は一般的には一致すると考えられる。しかしながら、個々のユーザ３の関心の若干の食い違いやその分野に関する知識の違い、または判定時の気分や忙しさなどによって、図中の「テキスト２」の判定結果のように食い違いが出てくることが考えられる。この例では、ユーザＡおよびユーザＢが「テキスト２」を「有用」と判定しているにも関わらず、ユーザＣは「不要」と判定している。このような場合、たとえば、多数決で「テキスト２」は有用であるとしてフィードバック処理を行なえば、信頼性の高いフィードバックが行なえると考えられる。さらに、この変形例として、たとえば図１５のユーザＡ、Ｂ、ＣのうちユーザＡが最も信頼できる適合性判定者であるという情報を予めシステムに与えておけば、ユーザＡのフィードバック情報を重視したレレバンスフィードバックを行なうことも可能である。
【００５５】
図１６に、複数ユーザから得たレレバンスフィードバック情報の変形例を示す。
【００５６】
このようなフィードバック情報は、たとえば、各々のユーザ３が「有用である」と判定したテキストまたはその一部に頻繁に出現する語句を抽出することにより得ることができる。この例では、ユーザＡの指定したテキストあるいはその一部には、「６４メガＤＲＡＭ」および「半導体合弁会社」という語が頻出していたということになる。ユーザＢおよびユーザＣについても同様である。「半導体合弁会社」、「メモリ特許」および「半導体製造装置」などの語は、一人のユーザ３のフィードバック情報にしか含まれていないのに対して、「６４メガＤＲＡＭ」という語は全ユーザのフィードバック情報に含まれている。このような場合、「６４メガＤＲＡＭ」は、全ユーザが有用であると判定したテキストまたはその部分に含まれていた語であるので、共通プロファイル１０に反映させる情報としては最も重要なものであると考えられる。そこで、「６４メガＤＲＡＭ」のみを共通プロファイルに反映させたり、あるいは図１６に示した語すべてを共通プロファイル１０に反映させる際にも、「６４メガＤＲＡＭ」の重みを他の語よりも高くしたりすれば、複数のユーザ３の多数決をもとにした信頼性の高いレレバンスフィードバックを行なうことができる。具体的なプロファイル１０への反映方法としては、たとえば図６における「（条件５）」のところに、「６４メガＤＲＡＭ」を追加したり、「（条件１）」から「（条件５）」のすべてに「６４メガＤＲＡＭ」を追加したりすればよい。
【００５７】
以上では、複数のユーザ３の多数決に基づくレレバンスフィードバックについて説明したが、これ以外の方針によって複数のユーザ３のフィードバック情報から共通プロファイル１０に反映させる情報を決定することも可能である。たとえば、各々のテキスト情報について、それが有用であるというユーザ３がグループ中に一人でもいれば、そのテキスト情報をフィードバックに用いるということが考えられる。この場合、図１５の例では、多数決の場合と同様、「テキスト１」と「テキスト２」とが有用であると判断できる。グループが、全体としてなるべく洩れのないフィルタリングを求めている場合、このような方針が有効となる場合があると考えられる。同様に、各々のテキストについて、それが不要であるというユーザ３が一人でもいれば、そのテキストはレレバンスフィードバックに用いることはしない、などの方針を採用することも考えられる。この場合、図１５の例では、「テキスト１」のみがレレバンスフィードバックに適する情報として採用される。したがって、以上のような方針をユーザ３が逐次指定できるようにし、指定されている方針に応じてレレバンスフィードバックに採用する情報の選出方法を切替えるようにしてもよい。
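The selection policies described in paragraphs 【００５４】 to 【００５７】 (majority vote, "useful if any member says so", and "useful only if no member objects") can be illustrated with a minimal Python sketch. The judgment table follows the discussion of FIG. 15; the function name `select_texts`, the policy labels, and the optional `weights` argument (one possible way to emphasize a trusted judge such as user A) are assumptions made for illustration, not part of the patent.

```python
# Judgments per text: True = useful (circle), False = not needed (cross).
# Values follow the FIG. 15 discussion in the description.
JUDGMENTS = {
    "text1": {"A": True, "B": True, "C": True},
    "text2": {"A": True, "B": True, "C": False},
    "text3": {"A": False, "B": False, "C": False},
}

def select_texts(judgments, policy="majority", weights=None):
    """Return the ids of texts to use for relevance feedback under a policy."""
    weights = weights or {}  # hypothetical per-user trust weights, default 1.0
    selected = []
    for text_id, votes in judgments.items():
        if policy == "majority":
            yes = sum(weights.get(u, 1.0) for u, v in votes.items() if v)
            no = sum(weights.get(u, 1.0) for u, v in votes.items() if not v)
            ok = yes > no
        elif policy == "any":      # one "useful" vote is enough
            ok = any(votes.values())
        elif policy == "all":      # a single "not needed" vote excludes the text
            ok = all(votes.values())
        else:
            raise ValueError(f"unknown policy: {policy}")
        if ok:
            selected.append(text_id)
    return selected

by_majority = select_texts(JUDGMENTS, "majority")
by_any = select_texts(JUDGMENTS, "any")
by_all = select_texts(JUDGMENTS, "all")
```

With this table, the majority and "any" policies both select text1 and text2, while the "all" policy selects only text1, which matches the outcomes stated in the description.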
【００５８】
（第２実施形態）
次に、図１７を参照して第２実施形態の情報フィルタリングシステムの利用形態について説明する。
【００５９】
第１実施形態と本実施形態の違いは、前者が複数ユーザ３に共通の一つのプロファイル１０を有し、これを用いた一つのフィルタリング結果を全ユーザ３に提示するものであるのに対し、後者は共通のプロファイル１０とユーザ３毎のプロファイル１８との両方を有し、最終的にはユーザ３毎にカスタマイズされた情報を個々のユーザに提示することである。本実施形態における共通プロファイル１０は、全ユーザ３に共通な情報要求を反映したものであり、ユーザプロファイル１８は、ユーザ３固有の情報要求を反映したものである。たとえば、図中のユーザＡ〜Ｅが同一テーマの研究を行なっているグループであり、はじめに情報フィルタリングシステム１に「半導体に関するテキスト情報に興味がある」と登録したとしても、時間とともに、また新しいテキスト情報を取り入れていくとともに、個々のユーザ３の要求が細かい点で変わってくる可能性がある。本実施形態では、このようなことに対処するために、複数のユーザ３から得たレレバンスフィードバック情報を共通プロファイル１０に反映させる情報と、ユーザプロファイル１８に反映させる情報とに振り分けるものである。
【００６０】
本実施形態の機器構成は、図２に示した第１実施形態のものと同じである。また、テキスト情報解析部１６およびユーザ情報入力部１１の機能も第１実施形態で説明したものと同じである。ここでは、第１実施形態と異なる点のみについて説明する。
【００６１】
図１８に、本実施形態におけるテキスト情報検索部１４の処理の流れの一例を示す。
【００６２】
第１実施形態のテキスト情報検索部１４の処理の流れと本実施形態との違いは、後者が共通プロファイル１０と、個々のユーザプロファイル１８とを融合したものを検索条件として検索を行ない、個々のユーザ３毎に検索結果を得ることである。すなわち、ユーザＡのためには、共通プロファイル１０とユーザＡ用のユーザプロファイル１８とから検索条件を作成して（ステップＨ４）、この検索条件にしたがって検索を行ない（ステップＨ５）、検索されたテキスト情報をテキスト情報出力部１５に渡す（ステップＨ６）。同様にユーザＢのためには、共通プロファイル１０とユーザＢ用のユーザプロファイル１８から検索条件を作成して検索を行ない、検索されたテキスト情報をテキスト情報出力部１５に渡す。これを全ユーザ３に対して行なう。
【００６３】
図１９に、本実施形態におけるテキスト情報出力部１５の処理の流れの一例を示す。
【００６４】
第１実施形態のテキスト情報出力部１５の処理の流れと、本実施形態との違いは、後者はユーザ３毎にフィルタリング結果を出力することである。
【００６５】
本実施形態におけるユーザ情報解析部１２の処理の流れのうち、レレバンスフィードバック情報解析処理のみが第１実施形態と異なるので、以下にこれを説明する。
【００６６】
図２０に、本実施形態におけるレレバンスフィードバック情報解析処理の流れの一例を示す。
【００６７】
第１実施形態のレレバンスフィードバック情報解析処理の流れと本実施例との違いは、後者においては、複数のユーザ３から得たレレバンスフィードバック情報を共通プロファイル１０に反映する情報と、個々のユーザプロファイル１８に反映する情報とに振り分けてからフィードバックを行なう点である。これを図１６に示したようなフィードバック情報が得られた場合を例にとって説明する。
【００６８】
図１６は、「半導体」に関心をもっている３人のユーザ３が個々にレレバンスフィードバックを行ない、その情報から抽出された「６４メガＤＲＡＭ」や「半導体合弁会社」などの語を表している。この例では、３人のフィードバック情報に共通して「６４メガＤＲＡＭ」という語が出現しているので、この語は個々のユーザ３の細かい嗜好を表す語というよりも、むしろ「半導体」という話題に関する大元の検索条件をより時代の流れに即したものに修正するのに役立つ情報である可能性がある。たとえば、従来のテキストには「１６メガＤＲＡＭ」という語しか出現しなかったが、新たに「６４メガＤＲＡＭ」が開発され、この語がテキスト中で一般に使われるようになってきたような場合である。このような場合に、共通プロファイル１０、すなわち「半導体」に関する一般的なテキスト情報を得るための大元の検索条件に、「６４メガＤＲＡＭ」という語を新規登録する。より一般的には、多くのユーザ３のフィードバック情報に共通に出現した単語は共通プロファイル１０の更新に用いるようにする。たとえば、図１６に示した単語のうち、２人以上のフィードバック情報に出現した単語は共通プロファイル１０へのフィードバックに用い、残りの語は個々のユーザプロファイル１８へのフィードバックに用いることにすると、「６４メガＤＲＡＭ」のみが共通プロファイル１０にフィードバックされることになる（ステップＪ５）。
【００６９】
共通プロファイル１０に関するフィードバックが行われた後に、今度は個々のユーザプロファイル１８に関するフィードバック処理を行なう（ステップＪ７）。図１６に示した例では、ユーザＡに固有のフィードバック情報として、「半導体合弁会社」という語が得られている。そこで、ユーザＡのユーザプロファイル１８に「半導体合弁会社」を登録する。同様にして、ユーザＢのユーザプロファイル１８には「メモリ特許」および「ＳＲＡＭ」を、ユーザＣのユーザプロファイル１８には「半導体製造装置」を登録する。
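The sorting of feedback words between the common profile 10 and the user profiles 18, as described for FIG. 16, can be sketched in Python as follows. The words are English glosses of the terms in the description, and the rule "a word reported by two or more users goes to the common profile" is the example rule given there. The function name `split_feedback` and the `min_users` parameter are illustrative assumptions.

```python
from collections import Counter

# Feedback words per user, following the FIG. 16 example (English glosses).
FEEDBACK = {
    "A": {"64M DRAM", "semiconductor joint venture"},
    "B": {"64M DRAM", "memory patent", "SRAM"},
    "C": {"64M DRAM", "semiconductor manufacturing equipment"},
}

def split_feedback(feedback, min_users=2):
    """Split words into common-profile words and per-user words."""
    # Count in how many users' feedback each word appears.
    counts = Counter(word for words in feedback.values() for word in words)
    common = {word for word, n in counts.items() if n >= min_users}
    # Whatever is not common stays with the user who reported it.
    per_user = {user: words - common for user, words in feedback.items()}
    return common, per_user

common_words, user_words = split_feedback(FEEDBACK)
```

Here only "64M DRAM" reaches the common profile, and each user profile receives just that user's remaining words, so a user profile stores only the difference from the common profile.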
【００７０】
このように、共通プロファイル１０へのフィードバックと、ユーザプロファイル１８へのフィードバックを分けて行なうことにすれば、時間が経つにつれて必要となる、大元の一般的な検索条件の更新と、個々のユーザカスタマイゼイションを一つの枠組で行なうことができる。複数のユーザ３に対してユーザ３毎にカスタマイズされたフィルタリング結果を提示する場合、従来のシステムはユーザ３毎のプロファイル１８のみをもち、これらを個々に更新していたのに対し、本実施形態では、共通なフィードバック情報が一つの共通プロファイル１０に対してのみ反映され、ユーザプロファイル１８には共通プロファイル１０との差分のみを記述すればよいので、処理量および記憶容量の観点からもより効率的である。
【００７１】
（共通プロファイル１０と、ユーザプロファイル１８に関して区分けした検索結果の表示）
本実施形態においては、共通プロファイル１０とユーザプロファイル１８が共存するが、これに関する情報をユーザに提示することも考えられる。
【００７２】
図２１に、共通プロファイル１０と２人のユーザ３に対するユーザプロファイル１８の例を示す。
【００７３】
共通プロファイル１０には、（条件１），（条件２），（条件３），…などの検索条件とそれに対応する検索語とが記されており、ユーザＡのためのユーザプロファイル１８には、（条件Ａ１），（条件Ａ２），（条件Ａ３）などの検索条件とそれに対応する検索語とが記されている。ユーザＡのための検索は、この両者を併用して行なわれる。仮に、ユーザＡのための記事が３件得られたとする。このうち、「記事１」は、図２１の（条件１）に適合した記事であり、「記事２」は、（条件２）および（条件３）に適合した記事であり、「記事３」は、（条件Ａ１）および（条件Ａ３）に適合した記事である場合、図２２に示したような記事の提示方法が考えられる。
【００７４】
検索結果は、図示のように、「共通プロファイルの検索条件に適合した記事」と、「あなたの個人プロファイルの検索条件に適合した記事」とに区分けされており、「記事１」および「記事２」は前者の方で、「記事３」は後者の方で提示されている。これにより、ユーザ３は、提示された記事がグループ共通の興味に適合したものであるのか、または個人的な興味に適合したものであるのかを容易に知ることができる。
【００７５】
図２３は、図２２の変形例である。この例では、各々の記事に、共通プロファイル１０の貢献度と、ユーザプロファイル１８の貢献度の情報が付加されている。たとえば、「記事１」は、共通プロファイル１０の検索条件のうち３つを満たしたために、３０点の部分点を与えられ、ユーザプロファイル１８の検索条件のうち７つを満たしたために、７０点の部分点を与えられたという情報が図示のように表示されている。この例では、「記事１」が最も個人的な興味に適合したものであり、「記事３」が最もグループ共通の興味に適合したものであることがわかる。
【００７６】
図２２や図２３のような検索結果とともに、共通プロファイル１０のサイズとユーザプロファイル１８のサイズの比に関する情報を提供してもよい。ここで、プロファイルのサイズとは、プロファイル中の検索語や検索条件の数、検索語の重みの和などの値をいう。たとえば、「共通プロファイルの語数：ユーザプロファイルの語数」が５０：２０であるようなユーザは、それが５０：３であるユーザよりもはるかに個人的な興味を反映した、グループの他のメンバとは異なる検索結果を得ていることがわかる。
【００７７】
（第３実施形態）
次に、図２４を参照して第３実施形態の情報フィルタリングシステムの利用形態について説明する。
【００７８】
本実施形態は、ユーザ３が必ずしも複数存在する必要がないことを除けば、第１および第２実施形態と同じである。
【００７９】
図２５に、本実施形態における機器構成を示す。
【００８０】
第１および第２実施形態との違いは、ユーザ情報解析部１２とユーザ情報記憶部１３との間にユーザ情報管理部２０を具備している点である。テキスト情報解析部１６、テキスト情報出力部１５、ユーザ情報入力部１１の機能は第１実施形態と同じである。また、ユーザ情報解析部１２の機能は、解析したユーザ情報を直接ユーザ情報記憶部１３に記憶する代わりに、ユーザ情報管理部２０に渡すところのみが図１１、図１２および図１４で示したものと異なる。したがって、ここでは、第１実施形態と異なるテキスト情報検索部１４、ユーザ情報管理部２０の機能のみについて説明する。
【００８１】
図２６に、本実施形態におけるテキスト情報検索部１４の処理の流れの一例を示す。
【００８２】
ここでは、プロファイル１９とテキスト情報との類似度を算出し、これをランキングすることにより検索を行なう検索方式の場合を例にして説明する。テキスト情報検索部１４は、新たに到着したすべてのテキスト情報に対して、以下の処理を行なう。
【００８３】
まず、通常の検索方式にしたがい、プロファイル１９とテキスト情報との類似度を計算する（ステップＫ３）。次に、プロファイル１９中の検索条件のうち前述の類似度計算においてテキスト情報に適合した条件に、現在の時刻を付加する（ステップＫ４）。同様に、プロファイル１９中の検索語のうち前述の類似度計算においてテキスト情報に適合した検索語に、現在の時刻を付加する。検索条件および検索語に付加されるこれらの時刻を、本実施形態では最新適合時刻と呼ぶことにする。テキスト情報検索部１４は、すべてのテキスト情報に対する以上の処理を終えると（ステップＫ６のＹ）、テキスト情報を類似度順にランキングし、この結果をテキスト情報出力部１５に渡す（ステップＫ７）。本実施形態におけるテキスト情報検索部１４の機能と、第１実施形態におけるそれとの違いは、前者がプロファイル１９中の検索条件および検索語に現在の時刻を記入することができる点のみである。
【００８４】
図２７に、最新適合時刻が付加された検索条件および検索語の一例を示す。
【００８５】
具体的な検索条件および検索語としては、たとえば図６に示したようなものが考えられる。ここでは、検索条件は４つあり、各条件に（検索語Ａ）〜（検索語Ｆ）が指定されている。そして、各検索条件および検索語に、最新適合時刻が付与されている。たとえば、あるテキスト情報が（検索語Ａ）および（検索語Ｂ）を含んでおり、これらの検索語が「６日１２時２５分」に（検索条件１）を満足したとすると、（検索条件１）、（検索語Ａ）、（検索語Ｂ）の各々には「６日１２時２５分」という最新適合時刻が付加される。次に、別のテキスト情報が（検索語Ｂ）を含んでおり、これが「９日１１時３０分」に（検索条件１）を満足したとすると、（検索条件１）および（検索語Ｂ）には「９日１１時３０分」という最新適合時刻が付加される。この結果、プロファイル１９は図２７のようになる。以上のように、最新適合時刻は、その検索条件あるいは検索語が最近適合したのはいつかを表している。すなわち、この時刻が古いということは、その検索条件あるいは検索語が最近使われなくなったことを表しており、この時刻が新しいということは、その検索条件あるいは検索語が現在でも検索において有効に使われていることを表している。
【００８６】
図２８に、本実施形態におけるユーザ情報管理部２０の処理の流れの一例を示す。
【００８７】
ユーザ情報管理部２０は、まず図２７に示したような最新適合時刻が付加されたプロファイル１９をユーザ情報解析部１２から受け取る。次に、プロファイル１９中の各検索条件および検索語に付加された最新適合時刻と、現在時刻とを比較する。そして、一定期間どのようなテキスト情報にも適合していない検索条件および検索語をプロファイル１９から削除し、このように修正されたプロファイルをユーザ情報記憶部１３に格納する。たとえば、現在時刻が「２６日１２時３０分」であるとし、ユーザ情報管理部２０は、２０日以上適合していない検索条件および検索語を削除するようにしているとする。このとき、図２７のようなプロファイル１９がユーザ情報管理部２０に渡されると、２０日以上使われていないのは、最新適合時刻が「６日１２時２５分」である（検索条件１）の（検索語Ａ）、および（検索条件２）の（検索語Ｃ）のみである。よってこれらをプロファイル１９から削除すれば、図２９のような更新されたプロファイル１９が得られる。あるいは、この変形例として、プロファイル１９中の各検索条件や検索語に重みが付与されている場合、前述の古い検索語を削除してしまわずに、その重みだけを少なくすることも考えられる。
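The pruning step of the user information management unit 20 can be sketched in Python as follows. The times for search terms A, B, and C, the current time (the 26th, 12:30), and the 20-day limit come from the description. Everything else is an assumption for illustration: the year and month, search term D and its time, the latest-match time of condition 2, the reduction to two conditions, and the names `prune` and `day`.

```python
from datetime import datetime, timedelta

def day(d, hour, minute):
    # Only the day of the month is given in the text; year and month are arbitrary.
    return datetime(1995, 12, d, hour, minute)

def prune(profile, now, max_age=timedelta(days=20)):
    """Drop conditions and terms whose latest match is max_age old or older."""
    kept = {}
    for name, entry in profile.items():
        if now - entry["last_match"] >= max_age:
            continue  # the whole condition has not matched recently
        terms = {t: ts for t, ts in entry["terms"].items() if now - ts < max_age}
        kept[name] = {"last_match": entry["last_match"], "terms": terms}
    return kept

profile = {
    "condition1": {
        "last_match": day(9, 11, 30),
        "terms": {"A": day(6, 12, 25), "B": day(9, 11, 30)},
    },
    "condition2": {
        "last_match": day(15, 9, 0),                          # assumed value
        "terms": {"C": day(6, 12, 25), "D": day(15, 9, 0)},   # D is assumed
    },
}

pruned = prune(profile, now=day(26, 12, 30))
```

In this sketch, terms A and C (last matched on the 6th) are removed and both conditions survive. The weighting variant mentioned in the description would lower a stale term's weight instead of filtering it out.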
【００８８】
以上のような処理により、新しい検索語がレレバンスフィードバックによりプロファイルに追加されていく一方で、検索の役に立たなくなった古い検索条件や検索語はプロファイルから削除されていく。これにより、プロファイルを時代に即したものに保つことが可能であると考えられる。
【００８９】
（第４実施形態）
次に、本発明の第４実施形態について説明する。
【００９０】
本実施形態における情報フィルタリングシステム１の利用形態および機器構成は第３実施形態と同じである。また、テキスト情報解析部１６、テキスト情報検索部１４、テキスト情報出力部１５およびユーザ情報入力部１１の機能は第１実施形態と同じである。したがって、ここではユーザ情報解析部１２およびユーザ情報管理部２０の機能のみについて説明する。
【００９１】
図３０に、本実施形態におけるユーザ情報解析部１２の処理の流れの一例を示す。
【００９２】
これは、第１実施形態における図１１、図１２および図１４で示したものにほぼ対応するが、はじめにプロファイル１９を生成する際、あるいはレレバンスフィードバックによりプロファイル１９を更新する際に、各検索条件および各検索語に現在時刻を付加する点が第１実施形態例と異なる。すなわち、初期選択情報解析処理においては（ステップＭ２のＹ）、プロファイル１９中の各検索条件および各検索語に、それらが生成された時刻を付加し（ステップＭ５）、同様に、レレバンスフィードバック情報解析処理においては（ステップＭ２のＮ）、レレバンスフィードバック手法により新たに追加された検索条件および検索語に、その時刻を付加する（ステップＭ７）。検索条件および検索語に付加されるこれらの時刻を、ここではプロファイル登録時刻と呼ぶことにする。たとえば、あるプロファイル１９において、（検索条件１）の時刻が「６日１２時２５分」であり、この条件に対応する検索語としては（検索語Ａ）だけが指定されていたとする。そして、「９日１１時３０分」に、レレバンスフィードバックにより、新たに（検索語Ｂ）が（検索条件１）のところに追加されたとする。このとき、（検索語Ｂ）には「９日１１時３０分」というプロファイル登録時刻が付加され、同時に（検索条件１）のプロファイル登録時刻も「９日１１時３０分」に更新されて、新しいプロファイルは図２７のようになる。以上のように、プロファイル登録時刻は、その検索条件あるいは検索語がいつプロファイルに登録されたのかを示している。
【００９３】
図３１に、本実施形態におけるユーザ情報管理部２０の処理の流れの一例を示す。
【００９４】
ユーザ情報管理部２０は、まず図２７に示したようなプロファイル登録時刻が付加されたプロファイル１９を、ユーザ情報解析部１２から受け取る（ステップＮ１）。次に、プロファイル１９中の各検索条件および検索語に付加されたプロファイル登録時刻と、現在時刻とを比較する（ステップＮ２）。そして、プロファイル１９に登録されてから一定期間たった検索条件および検索語をプロファイル１９から削除し（ステップＮ３）、このように修正されたプロファイル１９をユーザ情報記憶部１３に格納する（ステップＮ４）。たとえば、現在時刻が「２６日１２時３０分」であるとし、ユーザ情報管理部２０は、登録されてから２０日以上たった検索条件および検索語を削除するようにしているとする。このとき、図２７に示すようなプロファイル１９がユーザ情報管理部２０に渡されると、登録されてから２０日以上たっているのは、プロファイル登録時刻が「６日１２時２５分」である（検索条件１）の（検索語Ａ）および（検索条件２）の（検索語Ｃ）のみである。よってこれらをプロファイルから削除すれば、図２９のような更新されたプロファイルが得られる。また、第３実施形態の場合と同様に、この変形例として、プロファイル１９中の各検索条件や検索語に重みが付与されている場合に、前述の古い検索条件および検索語を削除してしまわずに、その重みだけを少なくすることも考えられる。
【００９５】
第３実施形態と本実施形態との違いは、前者が検索条件および検索語がテキストに適合した最新の時刻を検索時にプロファイルに付加するものであるのに対し、後者が検索条件および検索語がプロファイルに登録された時刻をレレバンスフィードバック時に付加するものである点である。本実施形態においても、第３実施形態と同様に、新しい検索語がレレバンスフィードバックによりプロファイルに追加されていく一方で、検索の役に立たなくなった古い検索条件や検索語はプロファイルから削除されていく。これにより、プロファイルを時代に即したものに保つことが可能であると考えられる。
【００９６】
なお、第１乃至第４実施形態で説明した本発明に係る情報フィルタリング装置は、分散したネットワーク環境のみに構築されるものではなく、単独の環境で動作するパーソナルコンピュータ上などにおいても構築可能である。
【００９７】
【発明の効果】
以上詳述したように、この発明によれば、グループのメンバが個々に行なったレレバンスフィードバック情報が、共通プロファイルに反映されるため、協調的に情報フィルタリングシステムのカスタマイゼイションを行なうことが可能となる。また、複数メンバの適合性判断に基づくため、フィードバック情報の信頼性が高まる。さらに、メンバに共通な情報はなるべく共通プロファイルの更新に利用し、それ以外のメンバに固有な情報をメンバー毎のプロファイルの更新に利用するので、システムの処理量、記憶容量が節約できる。また、古い検索条件や検索語がプロファイルから自動的に削除されるため、常に最新の知識を用いた検索を行なうことができる。
【図面の簡単な説明】
【図１】本発明の第１実施形態に係る情報フィルタリングシステムの利用形態を示す概念図。
【図２】本発明の第１および第２実施形態に係る情報フィルタリングシステムの機器構成を示す図。
【図３】本発明の第１および第２実施形態におけるテキスト情報解析部の処理の流れの一例を示す図。
【図４】本発明の第１および第２実施形態におけるテキスト情報解析部により表現されたテキスト情報の例を示す図。
【図５】本発明の第１実施形態におけるテキスト情報検索部の処理の流れの一例を示す図。
【図６】本発明の第１および第２実施形態における共通プロファイルの例を示す図。
【図７】本発明の第１実施形態におけるテキスト情報出力部の処理の流れの一例を示す図。
【図８】本発明の第１および第２実施形態におけるユーザ情報入力部の処理の流れの一例を示す図。
【図９】本発明の第１および第２実施形態におけるレレバンスフィードバック情報の例を示す図。
【図１０】本発明の第１および第２実施形態におけるレレバンスフィードバック情報の例を示す図。
【図１１】本発明の第１および第２実施形態におけるユーザ情報解析部の処理の流れの一例を示す図。
【図１２】本発明の第１および第２実施形態における初期選択情報解析処理の流れの一例を示す図。
【図１３】本発明の第１および第２実施形態におけるトピック知識の一例を示す図。
【図１４】本発明の第１実施形態におけるレレバンスフィードバック情報解析処理の流れの一例を示す図。
【図１５】本発明の第１実施形態における複数ユーザから得たレレバンスフィードバック情報の一例を示す図。
【図１６】本発明の第１および第２実施形態における複数ユーザから得たレレバンスフィードバック情報の変形例を示す図。
【図１７】本発明の第２実施形態に係る情報フィルタリングシステムの利用形態を示す概念図。
【図１８】本発明の第２実施形態におけるテキスト情報検索部の処理の流れの一例を示す図。
【図１９】本発明の第２実施形態におけるテキスト情報出力部の処理の流れの一例を示す図。
【図２０】本発明の第２実施形態におけるレレバンスフィードバック情報解析処理の流れの一例を示す図。
【図２１】本発明の第２実施形態における共通プロファイルおよび２人のユーザに対するユーザプロファイルの例を示す図。
【図２２】本発明の第２実施形態における記事の提示方法の一例を示す図。
【図２３】本発明の第２実施形態における記事の提示方法の変形例を示す図。
【図２４】本発明の第３および第４実施形態に係る情報フィルタリングシステムの利用形態を示す概念図。
【図２５】本発明の第３および第４実施形態に係る情報フィルタリングシステムの機器構成を示す図。
【図２６】本発明の第３実施形態におけるテキスト情報検索部の処理の流れの一例を示す図。
【図２７】本発明の第３および第４実施形態における最新適合時刻／プロファイル登録時刻が付加された検索条件および検索語の一例を示す図。
【図２８】本発明の第３実施形態におけるユーザ情報管理部の処理の流れの一例を示す図。
【図２９】本発明の第３実施形態における最新適合時刻をもとに更新されたプロファイルの一例を示す図。
【図３０】本発明の第４実施形態におけるユーザ情報解析部の処理の流れの一例を示す図。
【図３１】本発明の第４実施形態におけるユーザー情報管理部の処理の流れの一例を示す図。
【符号の説明】
１…情報フィルタリングシステム、２…テキスト情報源、３…ユーザ、１０…共有プロファイル、１１…ユーザ情報入力部、１２…ユーザ情報解析部、１３…ユーザ情報記憶部、１４…テキスト情報検索部、１５…テキスト情報出力部、１６…テキスト情報解析部、１７…テキスト情報記憶部、１８…ユーザプロファイル、１９…プロファイル、２０…ユーザ情報管理部、１０１…解析用辞書、１０２…トピック知識
[0001]
[Technical field of the invention]
The present invention relates to an information filtering device that selects, from an enormous volume of text information, the items that match a user's requests and interests and presents them to the user.
[0002]
[Prior art]
In recent years, with the spread of word processors and computers, and of electronic mail and electronic news delivered over computer networks such as the Internet, the digitization of documents has been accelerating. As the term "electronic publishing" suggests, it is expected that the content of newspapers, magazines, and books will commonly be provided electronically in the future. As a result, the amount of text information an individual can obtain in real time is expected to become enormous.
[0003]
Along with this, demand is growing for information filtering systems that select, from huge amounts of text information such as newspapers and magazines, the items that match a user's requests and interests and provide them to the user as they arrive.
[0004]
Some information filtering systems implemented so far provide a relevance feedback function: the user evaluates the relevance of each presented text, and the result is fed back into a search condition called a profile, which is used to retrieve texts that match the user's interests. In this way the relevance of the selected texts to each individual user is gradually improved.
[0005]
However, information filtering systems are expected to be used in the future by several people interested in the same field, for example a research group working on a specific theme at a research institute. Relevance feedback in conventional information filtering systems is designed purely to adapt to individual users, so it cannot analyze the requests of several users collectively and feed them back. In addition, the relevance judgments made by individual users can lack consistency and reliability, so feedback does not necessarily improve the relevance of the selected texts. A more reliable relevance feedback function is therefore desired.
[0006]
Furthermore, conventional information filtering systems can add new search terms to a profile through relevance feedback, but they cannot automatically delete search terms that have become outdated because they are no longer used in texts. An information filtering system that can keep up with changing vocabulary and shifting topics is therefore desired.
[0007]
[Problems to be solved by the invention]
As described above, relevance feedback in conventional information filtering systems is designed purely to adapt to individual users, and there has been the problem that the requests of several users cannot be analyzed collectively and fed back.
[0008]
There has also been the problem that conventional information filtering systems can add new search terms to a profile through relevance feedback but cannot automatically delete search terms that have become outdated because they are no longer used in texts.
[0009]
The present invention has been made in view of these circumstances. Its object is to provide information filtering for the case where a group of people interested in the same field shares an information filtering system. By allowing the relevance feedback given individually by the group members to be reflected in a common profile, it achieves reliable relevance feedback and updates both the common profile and the individual users' profiles efficiently. In addition, by automatically deleting search conditions and search terms in a profile that have become old, it realizes searches that use only up-to-date knowledge.
[0012]
[Means for Solving the Problems]
The present invention is an information filtering device that selects desired text information from among a plurality of pieces of text information and presents it to users, the device comprising: first holding means for holding a search condition for each group made up of a plurality of users; second holding means for holding a search condition for each user; means for selecting the text information that matches the search conditions held in the first holding means and the second holding means; means for presenting the selected text information to the users making up the group; means for collecting relevance feedback information, which is the users' evaluation of the presented text information; means for analyzing the collected relevance feedback information of all users to extract feedback information, namely words to be reflected in the search condition held in the first holding means or the second holding means, and for sorting the extracted feedback information into information to be reflected in the search condition held in the first holding means and information to be reflected in the search condition held in the second holding means; and means for modifying the search condition held in the first holding means on the basis of the feedback information sorted to the first holding means, and for modifying the search condition held in the second holding means on the basis of the feedback information sorted to the second holding means.
[0013]
In the present invention, each user's feedback information is sorted into information for updating the common profile and information for updating that user's profile before feedback is applied. This realizes a relevance feedback function that is reliable for the group as a whole. Information common to the members is used as far as possible to update the common profile, and the remaining member-specific information is used to update each member's profile. As a result, each member receives better-matched information filtering, and the processing load for feedback and the storage capacity needed for information filtering are greatly reduced.
[0018]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0019]
(First Embodiment)
First, the usage form of the information filtering system according to the first embodiment will be described with reference to FIG. 1.
[0020]
As shown in FIG. 1, in the present embodiment, five users 3 (users A, B, C, D, and E) with a common interest share the information filtering system 1. Text information arrives at the information filtering system 1 from the text information source 2 as it becomes available. For example, if users A to E are a group conducting semiconductor research and register "we are interested in text information on semiconductors" in the information filtering system 1, this is converted into a search condition called a profile. From then on, the information filtering system 1 automatically extracts only the text information about semiconductors from newly arrived information and presents it to users A to E.
[0021]
In the present embodiment, the information filtering system 1 holds one common profile 10 for the single group consisting of users A to E and narrows down information using this common profile 10 as the search condition, so the text information presented to users A to E in the figure is identical.
[0022]
(System configuration)
FIG. 2 shows a functional configuration of the information filtering system 1 of the present embodiment. In the figure, solid arrows indicate the flow of data.
[0023]
As illustrated, the information filtering system 1 comprises a text information analysis unit 16, a text information storage unit 17, a text information search unit 14, a text information output unit 15, a user information input unit 11, a user information analysis unit 12, and a user information storage unit 13. Among these components, the text information analysis unit 16, the text information search unit 14, and the user information analysis unit 12, which are enclosed by the chain line, can be realized by software executed by the central processing unit of a computer, and the text information storage unit 17 and the user information storage unit 13 can be realized by the computer's main memory, a hard disk device, or the like. The text information output unit 15 consists of a CRT display or the like for presenting text information to the user 3, and the user information input unit 11 consists of a keyboard, mouse, or the like with which the user 3 enters topics of interest and relevance feedback information.
[0024]
FIG. 3 shows an example of the processing flow of the text information analysis unit 16.
[0025]
The text information analysis unit 16 first takes in text information from the text information source 2 (step A1). Here, the text information source 2 refers to an organization that generates text information and supplies it to the information filtering system, such as a newspaper company or publisher; a separate system that handles text information, such as an e-mail system or a document retrieval system; or a site that makes text information publicly available on a computer network.
[0026]
The text information analysis unit 16 performs morphological analysis, syntactic analysis, semantic analysis, format analysis, and the like on the input text information, and extracts frequency and position information for text components such as words, phrases, sentences, and paragraphs, as well as the subject of the text and 5W1H-type information (step A2).
[0027]
Then, individual texts are represented by the extracted information (step A3). Subsequently, the information extracted from the text information is converted into a format that can be searched by the text information search unit 14 (step A4), and these are stored in the text information storage unit 17 (step A5). This corresponds to an indexing process in a normal information search.
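As a rough illustration of steps A2 to A5, the following Python sketch represents each text by its word positions (from which frequencies follow) and stores the result in an inverted index that a search unit could query. It is a sketch under stated assumptions: a regular-expression tokenizer stands in for the morphological, syntactic, and semantic analysis named in the description, and the function names `analyze` and `build_index` and the sample texts are hypothetical.

```python
import re
from collections import defaultdict

def analyze(text):
    """Return {word: [positions]} for one text (stand-in for steps A2 and A3)."""
    positions = defaultdict(list)
    for i, word in enumerate(re.findall(r"\w+", text.lower())):
        positions[word].append(i)
    return dict(positions)

def build_index(texts):
    """Map each word to {text_id: [positions]} (stand-in for steps A4 and A5)."""
    index = defaultdict(dict)
    for text_id, text in texts.items():
        for word, pos in analyze(text).items():
            index[word][text_id] = pos
    return dict(index)

texts = {
    "t1": "Semiconductor makers plan a semiconductor joint venture",
    "t2": "New flash memory announced",
}
index = build_index(texts)
# The frequency of a word in a text is the length of its position list.
frequency = len(index["semiconductor"]["t1"])
```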
[0028]
FIG. 4 shows an example of the text information expressed by the text information analysis unit 16.
[0029]
This figure is an example of the representation of text information obtained when a newspaper article stating that "Company ○○ and Company △△ will establish a semiconductor joint venture in ×× City, □□ Prefecture, on the 17th of this month" is received from the ○× Newspaper.
[0030]
FIG. 5 shows an example of the processing flow of the text information search unit 14.
[0031]
First, the text information search unit 14 retrieves the profile, which is a search condition expressing the users' interests, from the user information storage unit 13 (step B1). If a plurality of users 3 have registered in the information filtering system 1 that they are "interested in information on semiconductors", the profile 10 for these users 3 will contain search conditions such as those shown in FIG. 6.
[0032]
FIG. 6 lists search conditions for retrieving text information related to “semiconductor”. “(Condition 1)” determines whether the word “semiconductor” appears in the text, and “(Condition 2)” checks whether the words “memory” and “semiconductor” co-occur in the text. These correspond to the search conditions of a conventional keyword-based Boolean search. “(Condition 3)” retrieves texts whose headline contains the name of a specific semiconductor company; it uses information on where a word appears together with knowledge of semiconductor company names. “(Condition 4)” is a search condition based on word frequency information. “(Condition 5)” is a search condition that uses terms related to the field of “semiconductor”, such as “DRAM” and “flash memory”.
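One way to picture the five kinds of condition is as simple predicates over a text, as in the Python sketch below. FIG. 6 itself is not reproduced here, so the details are assumptions made for illustration: the company-name list, the frequency threshold, crude substring matching in place of real linguistic analysis, and the function name `matched_conditions`.

```python
COMPANY_NAMES = ["examplechip"]            # hypothetical company-name knowledge
RELATED_TERMS = ["dram", "flash memory"]   # related terms named in the description
MIN_FREQUENCY = 3                          # hypothetical threshold for condition 4

def matched_conditions(doc):
    """Return the numbers of the conditions that a document satisfies."""
    headline = doc["headline"].lower()
    text = (doc["headline"] + " " + doc["body"]).lower()
    checks = {
        1: "semiconductor" in text,                           # keyword appears
        2: "memory" in text and "semiconductor" in text,      # co-occurrence
        3: any(name in headline for name in COMPANY_NAMES),   # position + knowledge
        4: text.count("semiconductor") >= MIN_FREQUENCY,      # frequency
        5: any(term in text for term in RELATED_TERMS),       # related terms
    }
    return [number for number, ok in checks.items() if ok]

doc = {
    "headline": "ExampleChip expands DRAM output",
    "body": "The semiconductor maker will add memory lines.",
}
matched = matched_conditions(doc)
```

For this sample document, conditions 1, 2, 3, and 5 hold, while the frequency condition 4 does not, because "semiconductor" appears only once.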
[0033]
The text information search unit 14 uses the profile 10 as shown in FIG. 6 as the search condition and performs a text search over the texts stored in the text information storage unit 17 (step B2). Here, text search means selecting text information that satisfies search conditions such as those shown in FIG. 6. Specifically, it means dividing texts into those that satisfy the search conditions and those that do not, or ranking texts by the degree to which they satisfy the conditions. In the latter case, for example, several of the top-ranked texts are selected for presentation to the user and passed to the text information output unit 15 (step B3). As a concrete text search method, the techniques disclosed in the literature (“SMART Information Retrieval System”, edited by Gerard Salton, Japanese translation supervised by Kenji Jimbo (神保健二), Kikaku Center) may be adopted, for example.
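The ranking variant of steps B2 and B3 can be sketched in Python as follows. This is not the SMART method cited above. It simply scores each text by the number of conditions it satisfies, which is one assumed reading of "degree of satisfying the search conditions". The function name `rank_texts`, the `top_k` parameter, and the sample conditions and texts are hypothetical.

```python
def rank_texts(texts, conditions, top_k=2):
    """Rank texts by how many conditions they satisfy and keep the top_k."""
    scored = []
    for text_id, text in texts.items():
        score = sum(1 for condition in conditions if condition(text.lower()))
        if score > 0:  # texts that satisfy no condition are not presented
            scored.append((score, text_id))
    # Highest score first; ties are broken by text id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [text_id for _, text_id in scored[:top_k]]

conditions = [
    lambda t: "semiconductor" in t,
    lambda t: "memory" in t and "semiconductor" in t,
    lambda t: "dram" in t or "flash memory" in t,
]
texts = {
    "t1": "Semiconductor memory prices rise as DRAM demand grows",
    "t2": "Semiconductor plant opens",
    "t3": "Local weather report",
}
ranked = rank_texts(texts, conditions)
```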
[0034]
FIG. 7 shows an example of a processing flow of the text information output unit 15.
[0035]
The text information output unit 15 receives text information to be presented to the user 3 from the text information search unit 14 (Step C1), and presents it to the user 3 (Step C2).
[0036]
FIG. 8 shows an example of a processing flow of the user information input unit 11.
[0037]
The user information input unit 11 receives user information from the user 3 (Step D1) and passes the user information to the user information analysis unit 12 (Step D2). There are the following two types of user information.
[0038]
The first is information on what text information the user 3 wants, such as "I want articles related to semiconductors", which the user 3 specifies to the information filtering system 1 in advance. Here, this type of information is referred to as initial selection information.
[0039]
The second is relevance feedback information, in which the user 3 judges the suitability of the text information presented by the system. This is used to modify the profile 10 so that the articles presented to the user 3 better match the request of the user 3. Specifically, information in the forms shown in FIGS. 9 and 10, for example, can be considered.
[0040]
FIG. 9 is an example of information obtained when the user 3 rates each piece of presented text information on a three-level scale of "necessary / slightly necessary / unnecessary".
[0041]
In this example, one conceivable process is to extract useful words from the words contained in "text 1" and "text 2", which were judged "necessary", and add them to the profile 10, so that subsequent filtering results better match the user's needs.
[0042]
Further, as a modification of FIG. 9, the suitability may be determined not in units of text but in units of components of text. For example, it is conceivable to extract a sentence or paragraph of the text presented to the user 3 and feed back information such as “this part was useful” to the system. Further, in FIG. 9, three-stage evaluation is performed. However, the evaluation may be extended to evaluate suitability by numerical values.
[0043]
FIG. 10 is an example of relevance feedback information given by the user 3 in natural language via a keyboard or the like. Suppose that an article related to "semiconductor manufacturing equipment" was presented at the top of the filtering result, but the user 3 is not very interested in "semiconductor manufacturing equipment". If the user issues a request such as the one shown in the figure, it is conceivable to lower, from the next time on, the score of any text that includes the word "semiconductor manufacturing equipment". In the case of a search method in which importance is assigned to the search words in the profile 10, issuing a request such as "put emphasis on DRAM over flash memory", as shown in the figure, makes it possible to change the importance of the search words in the profile.
[0044]
FIG. 11 shows an example of a processing flow of the user information analysis unit 12.
[0045]
The user information analysis unit 12 determines whether or not information has already been stored in the user information storage unit 13 (steps E1 and E2), and performs one of two processes according to the result. If the user information storage unit 13 is empty (Y in step E2), an initial selection information analysis process is performed (step E3); otherwise, a relevance feedback information analysis process is performed (step E4).
[0046]
FIG. 12 shows an example of the flow of the initial selection information analysis processing.
[0047]
In the initial selection information analysis process, the user information analysis unit 12 analyzes the initial selection information with reference to an analysis dictionary 101 and the like prepared in advance for language analysis, and identifies the words and expressions representing the selected topic (step F1). Next, to obtain knowledge about search words related to the selected topic and their synonyms, it determines the search words with reference to topic knowledge prepared in advance (step F2). Then, the profile 10 as shown in FIG. 6 is described using the determined search words. The generation of such a common profile 10 may be performed automatically by the system, or semi-automatically, for example by having the user 3 modify the generated profile 10. The generated common profile 10 is stored in the user information storage unit 13 (Step F3).
[0048]
FIG. 13 shows an example of the topic knowledge described above.
[0049]
FIG. 13A is an example of topic knowledge describing the relationships between search terms. For example, since it states that "ROM" and "RAM" are subordinate concepts of "semiconductor memory", "ROM" and "RAM" can be used as search terms when describing the profile 10 for the topic "semiconductor memory". FIG. 13B is an example of topic knowledge regarding synonym information. By using such knowledge to register not only "ROM" but also its synonym "read only memory" as a search word in the profile, a search with few omissions can be performed.
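The use of topic knowledge in step F2 can be illustrated with a short sketch. The two dictionaries below stand in for the knowledge of FIG. 13A (subordinate concepts) and FIG. 13B (synonyms). Their layout and the function name `expand` are assumptions, and only the entries mentioned in the description are filled in.

```python
# Stand-in for FIG. 13A: subordinate concepts of a topic.
NARROWER = {"semiconductor memory": ["ROM", "RAM"]}

# Stand-in for FIG. 13B: synonyms of a search term.
SYNONYMS = {"ROM": ["read only memory"]}


def expand(topic):
    """Return the topic, its subordinate concepts, and all of their synonyms."""
    words = [topic] + NARROWER.get(topic, [])
    expanded = []
    for w in words:
        expanded.append(w)
        expanded.extend(SYNONYMS.get(w, []))
    return expanded


search_words = expand("semiconductor memory")
```

A text that writes out "read only memory" without ever using the abbreviation "ROM" is then still found, which is the kind of omission the synonym knowledge is meant to prevent.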
[0050]
FIG. 14 shows an example of the flow of the relevance feedback information analysis process.
[0051]
In the relevance feedback information analysis process, the user information analysis unit 12 analyzes the relevance feedback information of each user 3 with reference to an analysis dictionary 101 and the like prepared in advance for language analysis (steps G1 to G4). Next, the feedback information to be reflected in the profile 10 common to the users 3 is selected from among these pieces of information (step G5). Then, the common profile is updated using a relevance feedback method disclosed in, for example, a document ("SMART Information Retrieval System", edited by Gerard Salton, translated by Shinji Kenko, Planning Center) and stored in the user information storage unit 13 (step G6). When a new word is added at the time of updating the profile, the topic knowledge 102 on the topic may be referred to in order to obtain information on words related to that word.
[0052]
The difference between conventional relevance feedback processing and the relevance feedback information analysis processing in this embodiment is that the former only reflects feedback information obtained from a single user in a single profile, whereas the latter selects, from the feedback information obtained from a plurality of users, the information to be reflected in the common profile 10 and then performs feedback.
[0053]
FIG. 15 shows an example of relevance feedback information obtained from a plurality of users.
[0054]
In this example, three users 3 who are interested in the same topic share the information filtering system 1, and their relevance judgments for three pieces of text information presented to all of them are indicated by "O" or "x". Since the users A, B, and C are interested in the same topic, the judgments of the individual users 3 are generally expected to agree, as they do for "text 1" and "text 3". However, slight differences in the interests of the individual users 3, differences in their knowledge of the field, or differences in mood or busyness at the time of judgment can produce a discrepancy such as the one seen for "text 2" in the figure. In this example, the user A and the user B judge "text 2" to be "useful", whereas the user C judges it "unnecessary". In such a case, highly reliable feedback can be expected if, for example, "text 2" is treated as useful by majority rule and feedback processing is performed accordingly. Further, as a modified example, if information about a particular user, such as the user A among the users A, B, and C in the figure, is given to the system in advance, relevance feedback that takes that information into account can also be performed.
[0055]
FIG. 16 shows a modification of the relevance feedback information obtained from a plurality of users.
[0056]
Such feedback information can be obtained, for example, by extracting words that appear frequently in the texts, or parts of texts, that each user 3 judged to be "useful". In this example, the words "64 mega DRAM" and "semiconductor joint venture" appear frequently in the texts or parts of texts specified by the user A. The same applies to the user B and the user C. Terms such as "semiconductor joint venture", "memory patent" and "semiconductor manufacturing equipment" are each included in the feedback information of only one user 3, whereas "64 mega DRAM" is included in the feedback information of all users. In such a case, "64 mega DRAM" can be regarded as the most important information to reflect in the common profile 10, because it is a word contained in texts or parts of texts that all users judged to be useful. Therefore, highly reliable relevance feedback based on the majority decision of the plurality of users 3 can be performed either by reflecting only "64 mega DRAM" in the common profile 10, or by reflecting all the words shown in FIG. 16 in the common profile 10 with the weight of "64 mega DRAM" set higher than that of the other words. As a specific way of reflecting this in the profile 10, for example, "64 mega DRAM" may be added to "(condition 5)" in FIG. 6, or "(condition 1)" through "(condition 5)" may each be modified, for example by adding "64 mega DRAM" to all of them.
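The weighting idea above can be sketched as follows. The per-user word sets are reconstructed from the words the description attributes to the users A, B, and C. The weighting rule (the fraction of users whose feedback contains the word) is an assumption chosen for illustration, since the description only requires that "64 mega DRAM" receive a higher weight than the other words.

```python
from collections import Counter

# Words extracted from each user's feedback, as described for FIG. 16.
feedback = {
    "A": {"64 mega DRAM", "semiconductor joint venture"},
    "B": {"64 mega DRAM", "memory patent", "SRAM"},
    "C": {"64 mega DRAM", "semiconductor manufacturing equipment"},
}

# Number of users whose feedback contains each word.
users_per_word = Counter(word for words in feedback.values() for word in words)

# Assumed weighting rule: share of users who supplied the word.
weights = {word: n / len(feedback) for word, n in users_per_word.items()}

# Words supplied by every user are the strongest candidates for the common profile.
unanimous = [word for word, n in users_per_word.items() if n == len(feedback)]
```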
[0057]
In the above, relevance feedback based on the majority decision of the plurality of users 3 has been described. However, the information to be reflected in the common profile 10 can also be determined from the feedback information of the plurality of users 3 according to other policies. For example, a piece of text information may be used for feedback if at least one user 3 in the group finds it useful. In this case, in the example of FIG. 15, "text 1" and "text 2" are determined to be useful, as in the case of majority decision. Such a policy is considered effective when the group as a whole requires filtering with as few omissions as possible. Conversely, it is conceivable to adopt a policy under which a text is not used for relevance feedback if at least one user 3 judges it unnecessary. In this case, in the example of FIG. 15, only "text 1" is adopted as information suitable for relevance feedback. The user 3 may therefore be allowed to designate one of the above policies as needed, and the method of selecting information to be used for relevance feedback may be switched according to the designated policy.
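The three selection policies described above can be sketched as interchangeable functions. The judgment table follows the description of FIG. 15 (all three users find "text 1" useful, only the user C rejects "text 2", and "text 3" is taken here as judged unnecessary by all, which is an inference from the stated results). The policy names and the function `select_for_feedback` are hypothetical.

```python
# Judgments of users A, B, C in that order; True stands for "O", False for "x".
judgments = {
    "text 1": [True, True, True],
    "text 2": [True, True, False],
    "text 3": [False, False, False],
}

# Each policy maps one text's list of judgments to a use/do-not-use decision.
POLICIES = {
    "majority": lambda votes: sum(votes) > len(votes) / 2,
    "any_useful": lambda votes: any(votes),     # at least one user finds it useful
    "none_unneeded": lambda votes: all(votes),  # no user finds it unnecessary
}


def select_for_feedback(judgments, policy):
    """Return the texts to use for relevance feedback under the given policy."""
    decide = POLICIES[policy]
    return [text for text, votes in judgments.items() if decide(votes)]


by_majority = select_for_feedback(judgments, "majority")
by_any = select_for_feedback(judgments, "any_useful")
by_none_unneeded = select_for_feedback(judgments, "none_unneeded")
```

Looking the policy up by name is one simple way to let the user switch policies at run time, as the paragraph suggests.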
[0058]
(Second embodiment)
Next, a usage form of the information filtering system according to the second embodiment will be described with reference to the drawings.
[0059]
The difference between the first embodiment and the present embodiment is that the former has one profile 10 common to a plurality of users 3 and presents a single filtering result obtained with it to all the users 3, whereas the latter has both a common profile 10 and a profile 18 for each user 3, and ultimately presents each user 3 with information customized for that user. The common profile 10 in the present embodiment reflects the information requests common to all users 3, and each user profile 18 reflects the information requests unique to that user 3. For example, suppose that the users A to E in the figure form a group conducting research on the same theme and initially register with the information filtering system 1 that they are "interested in text information on semiconductors". Even so, as new text information is taken in over time, the requests of the individual users 3 may come to differ in their details. To deal with this, in the present embodiment the relevance feedback information obtained from the plurality of users 3 is divided into information to be reflected in the common profile 10 and information to be reflected in the user profiles 18.
[0060]
The device configuration of the present embodiment is the same as that of the first embodiment. The functions of the text information analysis unit 16 and the user information input unit 11 are the same as those described in the first embodiment. Here, only the differences from the first embodiment will be described.
[0061]
FIG. 18 shows an example of the processing flow of the text information search unit 14 in the present embodiment.
[0062]
The processing flow of the text information search unit 14 in the present embodiment differs from that of the first embodiment in that the present embodiment performs a search using a combination of the common profile 10 and an individual user profile 18 as the search condition, and obtains a search result for each individual user 3. That is, for the user A, a search condition is created from the common profile 10 and the user profile 18 for the user A (step H4), a search is performed according to this search condition (step H5), and the retrieved text information is passed to the text information output unit 15 (Step H6). Similarly, for the user B, a search condition is created from the common profile 10 and the user profile 18 for the user B, a search is performed, and the retrieved text information is passed to the text information output unit 15. This is performed for all users 3.
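The per-user loop of steps H4 to H6 can be sketched as follows. Profiles are simplified here to flat lists of search words and a text is considered retrieved if it contains any of them. Both simplifications, together with the sample texts and function names, are assumptions made for illustration.

```python
common_profile = ["semiconductor", "64 mega DRAM"]
user_profiles = {
    "A": ["semiconductor joint venture"],
    "B": ["memory patent", "SRAM"],
}


def search(texts, search_words):
    """Return the texts containing at least one search word (case-insensitive)."""
    return [t for t in texts if any(w.lower() in t.lower() for w in search_words)]


def search_for_all_users(texts):
    """Build one combined search condition per user and run one search per user."""
    results = {}
    for user, own_words in user_profiles.items():
        condition = common_profile + own_words    # step H4
        results[user] = search(texts, condition)  # step H5
    return results                                # handed to the output unit (step H6)


texts = [
    "64 mega DRAM shipments begin",
    "Memory patent dispute settled",
    "Semiconductor joint venture announced",
    "Weather forecast",
]
results = search_for_all_users(texts)
```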
[0063]
FIG. 19 shows an example of the processing flow of the text information output unit 15 in the present embodiment.
[0064]
The difference between the flow of processing of the text information output unit 15 of the first embodiment and this embodiment is that the latter outputs a filtering result for each user 3.
[0065]
Since only the relevance feedback information analysis processing of the processing flow of the user information analysis unit 12 in the present embodiment is different from that of the first embodiment, this will be described below.
[0066]
FIG. 20 shows an example of the flow of the relevance feedback information analysis processing in the present embodiment.
[0067]
The flow of the relevance feedback information analysis processing in the present embodiment differs from that of the first embodiment in that the relevance feedback information obtained from a plurality of users 3 is first sorted into information to be reflected in the common profile 10 and information to be reflected in the individual user profiles 18, and feedback is then performed. This will be described using an example in which feedback information as shown in FIG. 16 is obtained.
[0068]
FIG. 16 shows terms such as "64 mega DRAM" and "semiconductor joint venture" extracted from the information obtained when three users 3 who are interested in "semiconductor" individually perform relevance feedback. In this example, the word "64 mega DRAM" appears in the feedback information of all three users. It is therefore likely to be not a word expressing the detailed preferences of an individual user but useful information for bringing the original search conditions for the topic "semiconductor" up to date. This would be the case, for example, if earlier texts contained only the word "16 mega DRAM", but a "64 mega DRAM" was then newly developed and the word came into common use in texts. In such a case, the word "64 mega DRAM" is newly registered in the common profile 10, that is, in the original search conditions for obtaining general text information on "semiconductor". More generally, words that appear in the feedback information of many users 3 are used to update the common profile 10. For example, if, of the words shown in FIG. 16, those that appear in two or more pieces of feedback information are used for feedback to the common profile 10 and the remaining words are used for feedback to the individual user profiles 18, then only "64 mega DRAM" is fed back to the common profile 10 (step J5).
[0069]
After the feedback on the common profile 10 is performed, a feedback process on each user profile 18 is performed (step J7). In the example illustrated in FIG. 16, the term “semiconductor joint venture” is obtained as feedback information unique to the user A. Therefore, the “semiconductor joint venture” is registered in the user profile 18 of the user A. Similarly, “memory patent” and “SRAM” are registered in the user profile 18 of the user B, and “semiconductor manufacturing equipment” is registered in the user profile 18 of the user C.
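The sorting of feedback words into the common profile and the user profiles (steps J5 and J7) can be sketched as follows, using the threshold of "two or more pieces of feedback information" given as an example in the description. The function name `split_feedback` and the data layout are assumptions.

```python
from collections import Counter

# Words extracted from each user's feedback, as described for FIG. 16.
feedback = {
    "A": {"64 mega DRAM", "semiconductor joint venture"},
    "B": {"64 mega DRAM", "memory patent", "SRAM"},
    "C": {"64 mega DRAM", "semiconductor manufacturing equipment"},
}


def split_feedback(feedback, min_users=2):
    """Split words into those for the common profile and those for each user profile."""
    counts = Counter(word for words in feedback.values() for word in set(words))
    # Words shared by at least min_users users go to the common profile (step J5).
    common = {word for word, n in counts.items() if n >= min_users}
    # The remaining words go to the profile of the user who supplied them (step J7).
    personal = {user: set(words) - common for user, words in feedback.items()}
    return common, personal


common_words, personal_words = split_feedback(feedback)
```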
[0070]
As described above, if the feedback to the common profile 10 and the feedback to the user profiles 18 are performed separately, both the updating of the general search conditions that becomes necessary over time and the customization for individual users can be handled within a single framework. When filtering results customized for each user 3 are presented to a plurality of users 3, a conventional system has only a profile 18 for each user 3 and updates each of them individually. In the present embodiment, by contrast, the common feedback information is reflected only in the single common profile 10, and only the differences from the common profile 10 need to be described in each user profile 18, which is more efficient in terms of both processing amount and storage capacity.
[0071]
(Display of search results classified for common profile 10 and user profile 18)
In the present embodiment, the common profile 10 and the user profile 18 coexist, but it is also conceivable to present information on this to the user.
[0072]
FIG. 21 shows an example of the common profile 10 and the user profiles 18 for the two users 3.
[0073]
The common profile 10 describes search conditions such as (condition 1), (condition 2), (condition 3), and so on, together with the search terms corresponding to them. The user profile 18 for the user A describes search conditions such as (condition A1), (condition A2), and (condition A3), together with the search terms corresponding to them. The search for the user A is performed using both profiles. Suppose that three articles are obtained for the user A. If "Article 1" satisfies (condition 1) in FIG. 21, "Article 2" satisfies (condition 2) and (condition 3), and "Article 3" satisfies (condition A1) and (condition A3), then an article presentation method as shown in FIG. 22 can be considered.
[0074]
As shown in the figure, the search results are classified into "articles that match the search conditions of the common profile" and "articles that match the search conditions of your personal profile"; "Article 1" and "Article 2" are presented under the former, and "Article 3" under the latter. The user 3 can thereby easily see whether a presented article matches the interests common to the group or the user's personal interests.
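The classified presentation can be sketched as follows, using the three articles and the conditions they satisfy as given in the description. Listing an article under both headings when it satisfies conditions from both profiles is an assumption of this sketch, because the description does not cover that case.

```python
common_conditions = {"condition 1", "condition 2", "condition 3"}
user_conditions = {"condition A1", "condition A2", "condition A3"}

# Conditions each retrieved article satisfies, as in the example for the user A.
matched = {
    "Article 1": {"condition 1"},
    "Article 2": {"condition 2", "condition 3"},
    "Article 3": {"condition A1", "condition A3"},
}


def classify(matched):
    """Group articles by whether they matched the common or the personal profile."""
    groups = {"common profile": [], "personal profile": []}
    for article, conditions in matched.items():
        if conditions & common_conditions:
            groups["common profile"].append(article)
        if conditions & user_conditions:
            groups["personal profile"].append(article)
    return groups


groups = classify(matched)
```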
[0075]
FIG. 23 shows a modification of this presentation method. In this example, information on the contribution of the common profile 10 and the contribution of the user profile 18 is added to each article. For example, the display shows that "Article 1" was given a partial score of 30 points because it satisfies three of the search conditions of the common profile 10, and a partial score of 70 points because it satisfies seven of the search conditions of the user profile 18. In this example, it can be seen that "Article 1" best matches the user's personal interests, and "Article 3" best matches the interests common to the group.
[0076]
Information about the ratio between the size of the common profile 10 and the size of the user profile 18 may also be provided together with the search results. Here, the size of a profile refers to a value such as the number of search words or search conditions in the profile, or the sum of the weights of its search words. For example, a user whose ratio of "number of words in the common profile : number of words in the user profile" is 50:20 has a profile that reflects personal interests far more strongly than that of a user whose ratio is 50:3, which indicates that the former obtains search results that differ more from those of the other members of the group.
[0077]
(Third embodiment)
Next, a usage form of the information filtering system according to the third embodiment will be described with reference to the drawings.
[0078]
This embodiment is the same as the first and second embodiments, except that there is no need to always have a plurality of users 3.
[0079]
FIG. 25 shows a device configuration in the present embodiment.
[0080]
The difference from the first and second embodiments is that a user information management unit 20 is provided between the user information analysis unit 12 and the user information storage unit 13. The functions of the text information analysis unit 16, the text information output unit 15, and the user information input unit 11 are the same as in the first embodiment. The function of the user information analysis unit 12 is the same as that shown in FIGS. 11, 12 and 14, except that the analyzed user information is passed to the user information management unit 20 instead of being stored directly in the user information storage unit 13. Therefore, only the functions of the text information search unit 14 and the user information management unit 20, which differ from those of the first embodiment, will be described here.
[0081]
FIG. 26 shows an example of the flow of processing of the text information search unit 14 in the present embodiment.
[0082]
Here, a description will be given of an example of a search method in which a similarity between the profile 19 and the text information is calculated, and a search is performed by ranking the similarity. The text information search unit 14 performs the following processing on all newly arrived text information.
[0083]
First, the similarity between the profile 19 and the text information is calculated according to a normal search method (step K3). Next, the current time is added to the search conditions in the profile 19 that match the text information in the above-described similarity calculation (step K4). Similarly, the current time is added to the search words in the profile 19 that match the text information in the above-described similarity calculation. In the present embodiment, these times added to the search condition and the search word are referred to as the latest matching time. When the above processing for all text information is completed (Y in step K6), the text information search unit 14 ranks the text information in order of similarity, and passes the result to the text information output unit 15 (step K7). The only difference between the function of the text information search unit 14 in the present embodiment and that in the first embodiment is that the former allows the current time to be entered in the search conditions and search words in the profile 19.
[0084]
FIG. 27 shows an example of a search condition and a search word to which the latest matching time is added.
[0085]
As specific search conditions and search terms, for example, those shown in FIG. 4 can be considered. Here, there are four search conditions, and (search term A) to (search term F) are specified among them. The latest matching time is given to each search condition and search term. For example, if a certain piece of text information includes (search term A) and (search term B), and these search terms satisfy (search condition 1) at "12:25 on the 6th", then (search condition 1), (search term A), and (search term B) are each given the latest matching time "12:25 on the 6th". Next, if another piece of text information includes (search term B), and this satisfies (search condition 1) at "11:30 on the 9th", then the latest matching time of (search condition 1) and (search term B) is updated to "11:30 on the 9th". As a result, the profile 19 becomes as shown in the figure. As described above, the latest matching time indicates when the search condition or search term was most recently matched. In other words, an old time indicates that the search condition or search term has not been used recently, and a recent time indicates that the search condition or search term is still being used effectively in searches.
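The bookkeeping of latest matching times (step K4) can be sketched as follows. The description gives only a day and a time, so the year and month used below are arbitrary assumptions, as are the dictionary layout and the function name `record_match`. The second text is taken to match (search term B), which is the reading consistent with the deletion example later in this embodiment.

```python
from datetime import datetime


def day(d, hour, minute):
    """Build a timestamp; the year and month are arbitrary assumptions."""
    return datetime(1995, 11, d, hour, minute)


# Hypothetical layout: each condition stores its own time plus one per search term.
profile = {
    "search condition 1": {
        "latest_match": None,
        "terms": {"search term A": None, "search term B": None},
    },
}


def record_match(profile, condition, matched_terms, now):
    """Stamp the condition and every matched search term with the current time."""
    entry = profile[condition]
    entry["latest_match"] = now
    for term in matched_terms:
        entry["terms"][term] = now


# First text: terms A and B satisfy condition 1 at 12:25 on the 6th.
record_match(profile, "search condition 1", ["search term A", "search term B"], day(6, 12, 25))
# Second text: term B satisfies condition 1 at 11:30 on the 9th.
record_match(profile, "search condition 1", ["search term B"], day(9, 11, 30))
```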
[0086]
FIG. 28 illustrates an example of a processing flow of the user information management unit 20 in the present embodiment.
[0087]
The user information management unit 20 first receives from the user information analysis unit 12 the profile 19 to which the latest matching times have been added. Next, the latest matching time added to each search condition and search word in the profile 19 is compared with the current time. Then, search conditions and search terms that have not matched any text information for a certain period are deleted from the profile 19, and the profile thus modified is stored in the user information storage unit 13. For example, assume that the current time is "12:30 on the 26th", and that the user information management unit 20 deletes search conditions and search terms that have not matched for 20 days or more. If the profile 19 as shown in FIG. 27 is passed to the user information management unit 20 at this time, the only entries that have gone unmatched for 20 days or more are (search term A) of (search condition 1) and (search term C) of (search condition 2), whose latest matching time is "12:25 on the 6th". Deleting these from the profile 19 yields an updated profile 19 as shown in FIG. 29. Alternatively, as a modified example, when a weight is given to each search condition or search word in the profile 19, it is possible to reduce only the weight of an old search condition or search word without deleting it.
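The deletion rule can be sketched as follows, using the 20-day period and the current time "12:30 on the 26th" from the example. The year and month, the dictionary layout, and the timestamps of (search condition 2) and (search term D) are assumptions added only so that the sketch is complete; the contents of FIG. 27 are not reproduced here.

```python
from datetime import datetime, timedelta


def day(d, hour, minute):
    """Build a timestamp; the year and month are arbitrary assumptions."""
    return datetime(1995, 11, d, hour, minute)


profile = {
    "search condition 1": {
        "latest_match": day(9, 11, 30),
        "terms": {"search term A": day(6, 12, 25), "search term B": day(9, 11, 30)},
    },
    "search condition 2": {
        "latest_match": day(15, 9, 0),  # assumed value
        "terms": {"search term C": day(6, 12, 25), "search term D": day(15, 9, 0)},
    },
}


def prune(profile, now, max_age=timedelta(days=20)):
    """Drop conditions and terms whose latest match is max_age or more in the past."""
    fresh = {}
    for name, entry in profile.items():
        if now - entry["latest_match"] >= max_age:
            continue  # the whole condition is stale
        terms = {t: ts for t, ts in entry["terms"].items() if now - ts < max_age}
        fresh[name] = {"latest_match": entry["latest_match"], "terms": terms}
    return fresh


updated = prune(profile, now=day(26, 12, 30))
```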
[0088]
Through the above processing, new search terms are added to the profile by relevance feedback, while old search conditions and search terms that are no longer useful for search are deleted from the profile. Thus, it is considered that the profile can be kept up to date.
[0089]
(Fourth embodiment)
Next, a fourth embodiment of the present invention will be described.
[0090]
The usage mode and the device configuration of the information filtering system 1 in the present embodiment are the same as those in the third embodiment. The functions of the text information analysis unit 16, the text information search unit 14, the text information output unit 15, and the user information input unit 11 are the same as in the first embodiment. Therefore, only the functions of the user information analysis unit 12 and the user information management unit 20 will be described here.
[0091]
FIG. 30 illustrates an example of a processing flow of the user information analysis unit 12 in the present embodiment.
[0092]
This substantially corresponds to the processing shown in FIGS. 11, 12 and 14 for the first embodiment. However, the present embodiment differs from the first embodiment in that, when the profile 19 is first generated or is updated by relevance feedback, the current time is added to each search condition and each search word. That is, in the initial selection information analysis process (Y in step M2), the time at which each search condition and each search word in the profile 19 was generated is added to it (step M5). In the relevance feedback information analysis process (N in Step M2), the time is added to the search conditions and search words newly added by the relevance feedback method (Step M7). These times added to the search conditions and search words are referred to here as profile registration times. For example, suppose that in a certain profile 19 the profile registration time of (search condition 1) is "12:25 on the 6th", and only (search word A) is specified as a search word corresponding to this condition. Suppose further that (search word B) is newly added to (search condition 1) by relevance feedback at "11:30 on the 9th". At this time, the profile registration time "11:30 on the 9th" is added to (search word B), and the profile registration time of (search condition 1) is simultaneously updated to "11:30 on the 9th". The new profile is as shown in the figure. As described above, the profile registration time indicates when the search condition or search word was registered in the profile.
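The recording of profile registration times (steps M5 and M7) can be sketched as follows. As before, the year and month, the dictionary layout, and the function name `add_search_word` are assumptions, since the description gives only a day and a time.

```python
from datetime import datetime


def day(d, hour, minute):
    """Build a timestamp; the year and month are arbitrary assumptions."""
    return datetime(1995, 11, d, hour, minute)


profile = {}


def add_search_word(profile, condition, word, now):
    """Register a search word and stamp both the word and its condition with now."""
    entry = profile.setdefault(condition, {"registered": now, "words": {}})
    entry["words"][word] = now
    entry["registered"] = now  # the condition's time follows its latest addition


# Initial generation of the profile (step M5).
add_search_word(profile, "search condition 1", "search word A", day(6, 12, 25))
# A word added later by relevance feedback (step M7).
add_search_word(profile, "search condition 1", "search word B", day(9, 11, 30))
```

Unlike the latest matching time of the third embodiment, these timestamps change only when the profile itself is edited, not when a search is run.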
[0093]
FIG. 31 shows an example of the flow of processing of the user information management unit 20 in the present embodiment.
[0094]
First, the user information management unit 20 receives the profile 19 to which the profile registration times have been added, as shown in FIG. 27, from the user information analysis unit 12 (step N1). Next, the current time is compared with the profile registration time added to each search condition and search word in the profile 19 (step N2). Then, the search conditions and search words that have been registered in the profile 19 for a certain period or longer are deleted from the profile 19 (step N3), and the profile 19 thus modified is stored in the user information storage unit 13 (step N4). For example, assume that the current time is "12:30 on the 26th", and that the user information management unit 20 deletes search conditions and search words 20 days or more after their registration. If the profile 19 as shown in FIG. 27 is passed to the user information management unit 20 at this time, the only entries registered 20 days or more ago are (search term A) of (search condition 1) and (search term C) of (search condition 2), whose profile registration time is "12:25 on the 6th". Deleting these from the profile yields an updated profile as shown in FIG. 29. As in the third embodiment, as a modified example, when each search condition or search word in the profile 19 is weighted, it is also conceivable to reduce only the weight of an old search condition or search word instead of deleting it.
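The weight-reduction modification mentioned at the end of the paragraph above can be sketched as follows. The description does not say by how much a weight is reduced, so the factor of 0.5, the initial weights of 1.0, the year and month, and the function name `reduce_old_weights` are all assumptions.

```python
from datetime import datetime, timedelta


def reduce_old_weights(entries, now, max_age=timedelta(days=20), factor=0.5):
    """Scale down the weight of entries registered max_age or more ago.

    entries maps a search word to a (weight, registration time) pair.
    """
    result = {}
    for word, (weight, registered) in entries.items():
        if now - registered >= max_age:
            weight *= factor  # keep the word, but let it count for less
        result[word] = (weight, registered)
    return result


entries = {
    "search term A": (1.0, datetime(1995, 11, 6, 12, 25)),
    "search term B": (1.0, datetime(1995, 11, 9, 11, 30)),
}
aged = reduce_old_weights(entries, now=datetime(1995, 11, 26, 12, 30))
```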
[0095]
The difference between the third embodiment and the present embodiment is that the former adds to the profile, at the time of a search, the latest time at which a search condition or search word matched a text, whereas the latter adds to the profile, at the time of relevance feedback, the time at which a search condition or search word was registered in the profile. In the present embodiment, as in the third embodiment, new search terms are added to the profile by relevance feedback, while old search conditions and search terms that are no longer useful for search are deleted from the profile. Thus, it is considered that the profile can be kept up to date.
[0096]
The information filtering device according to the present invention described in the first to fourth embodiments is not limited to being built in a distributed network environment; it can also be built on a personal computer operating in a standalone environment.
[0097]
[Effects of the invention]
As described above in detail, according to the present invention, the relevance feedback information provided individually by the members of a group is reflected in the common profile, so that the information filtering system can be customized cooperatively. Further, the reliability of the feedback information is enhanced because it is based on the suitability judgments of a plurality of members. Furthermore, information common to the members is used as far as possible to update the common profile, and the remaining information unique to individual members is used to update the profile of each member, so that the processing amount and storage capacity of the system can be reduced. Further, since old search conditions and search words are automatically deleted from the profile, a search using up-to-date knowledge can always be performed.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram showing a use form of an information filtering system according to a first embodiment of the present invention.
FIG. 2 is a diagram showing a device configuration of an information filtering system according to first and second embodiments of the present invention.
FIG. 3 is a diagram showing an example of a processing flow of a text information analysis unit according to the first and second embodiments of the present invention.
FIG. 4 is a diagram showing an example of text information expressed by a text information analysis unit according to the first and second embodiments of the present invention.
FIG. 5 is a diagram showing an example of a processing flow of a text information search unit according to the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of a common profile according to the first and second embodiments of the present invention.
FIG. 7 is a diagram showing an example of a processing flow of a text information output unit according to the first embodiment of the present invention.
FIG. 8 is a diagram showing an example of a processing flow of a user information input unit according to the first and second embodiments of the present invention.
FIG. 9 is a diagram showing an example of relevance feedback information in the first and second embodiments of the present invention.
FIG. 10 is a diagram showing an example of relevance feedback information in the first and second embodiments of the present invention.
FIG. 11 is a diagram showing an example of a processing flow of a user information analysis unit in the first and second embodiments of the present invention.
FIG. 12 is a diagram showing an example of the flow of an initial selection information analysis process in the first and second embodiments of the present invention.
FIG. 13 is a diagram showing an example of topic knowledge in the first and second embodiments of the present invention.
FIG. 14 is a diagram showing an example of the flow of relevance feedback information analysis processing according to the first embodiment of the present invention.
FIG. 15 is a diagram illustrating an example of relevance feedback information obtained from a plurality of users according to the first embodiment of the present invention.
FIG. 16 is a diagram showing a modification of the relevance feedback information obtained from a plurality of users in the first and second embodiments of the present invention.
FIG. 17 is a conceptual diagram showing a use form of the information filtering system according to the second embodiment of the present invention.
FIG. 18 is a diagram illustrating an example of a processing flow of a text information search unit according to the second embodiment of the present invention.
FIG. 19 is a diagram showing an example of a processing flow of a text information output unit according to the second embodiment of the present invention.
FIG. 20 is a diagram showing an example of the flow of relevance feedback information analysis processing according to the second embodiment of the present invention.
FIG. 21 is a diagram illustrating an example of a common profile and a user profile for two users according to the second embodiment of the present invention.
FIG. 22 is a diagram showing an example of an article presentation method according to the second embodiment of the present invention.
FIG. 23 is a diagram showing a modification of the article presentation method according to the second embodiment of the present invention.
FIG. 24 is a conceptual diagram showing a use form of the information filtering system according to the third and fourth embodiments of the present invention.
FIG. 25 is a diagram showing a device configuration of an information filtering system according to the third and fourth embodiments of the present invention.
FIG. 26 is a diagram showing an example of a processing flow of a text information search unit according to the third embodiment of the present invention.
FIG. 27 is a diagram showing an example of a search condition and a search word to which the latest matching time or the profile registration time has been added, according to the third and fourth embodiments of the present invention.
FIG. 28 is a diagram showing an example of a processing flow of a user information management unit according to the third embodiment of the present invention.
FIG. 29 is a diagram showing an example of a profile updated based on the latest matching time according to the third embodiment of the present invention.
FIG. 30 is a diagram showing an example of a processing flow of a user information analysis unit according to the fourth embodiment of the present invention.
FIG. 31 is a diagram showing an example of a processing flow of a user information management unit according to the fourth embodiment of the present invention.
[Explanation of symbols]
1 ... information filtering system, 2 ... text information source, 3 ... user, 10 ... shared profile, 11 ... user information input unit, 12 ... user information analysis unit, 13 ... user information storage unit, 14 ... text information search unit, 15 ... text information output unit, 16 ... text information analysis unit, 17 ... text information storage unit, 18 ... user profile, 19 ... profile, 20 ... user information management unit, 101 ... analysis dictionary, 102 ... topic knowledge

Claims

An information filtering device that selects desired text information from a plurality of pieces of text information and presents it to a user, comprising:
first holding means for holding a search condition for each group constituted by a plurality of users;
second holding means for holding a search condition for each user;
means for selecting text information that matches the search conditions held in the first holding means and the second holding means;
means for presenting the text information selected by the selecting means to the users constituting the group;
means for collecting relevance feedback information, which is each user's evaluation result for the text information presented by the presenting means;
means for analyzing the relevance feedback information collected by the collecting means for all users to extract feedback information, which is words to be reflected in the search condition held in the first holding means or the second holding means, and for sorting the extracted feedback information into information to be reflected in the search condition held in the first holding means and information to be reflected in the search condition held in the second holding means; and
means for modifying the search condition held in the first holding means based on the feedback information sorted as information to be reflected in the search condition held in the first holding means, and for modifying the search condition held in the second holding means based on the feedback information sorted as information to be reflected in the search condition held in the second holding means.