WO2003046765A1 - Automatic related word extraction method - Google Patents
- Publication number
- WO2003046765A1 (PCT/JP2002/012504)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- words
- word
- important
- database
- list
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
Definitions
- the present invention relates to an automatic related word extraction method for automatically extracting words closely related to a word specified by a user based on statistical information of words included in a database.
- the present invention also relates to an automatic related word extraction method and a related word automatic extraction device that enable extraction of technical terms appearing in a specific field designated by a user, as well as new words and buzzwords, that are not described in general existing thesaurus dictionaries.
- Background art: a conventional related word automatic extraction device has an existing thesaurus dictionary as an internal component; it simply looks up the word specified by the user in the thesaurus dictionary and displays the result as the related word extraction result.
- conventional related word automatic extraction devices therefore have the disadvantage that technical terms, new words, and buzzwords not described in existing thesaurus dictionaries cannot be extracted, regardless of their importance.
- conventional methods that extract related words automatically from statistical information on the data, without using an existing thesaurus dictionary, commonly use only the appearance frequency of words that appear alone.
- the present invention has been made to solve the above-mentioned problems of the prior art. Its purpose is to provide an automatic related word extraction method and a related word automatic extraction device that enable automatic extraction of technical terms, new words, and buzzwords that appear in a specific field specified by a user and are not described in general existing thesaurus dictionaries, and that can accurately extract important words closely related to the word specified by the user.
- the first invention is characterized by an automatic related word extraction method that uses a group of documents in a field designated by a user as a database, selects important words, which are words of high importance, from the documents in the database, and calculates the degree of relevance between important words using statistical information on important words or on pairs of important words.
- the importance refers to the characteristic of the content indicated by the document or the degree to which the characteristic is well represented in the genre of the document.
- according to this, the related words of each field can be extracted automatically.
- related words specific to a field, such as words that are related in one field but not in another, can be extracted.
- users can set their own fields regardless of the fields covered by existing thesaurus dictionaries, so related words can be extracted according to the level of the field set.
- in addition to the configuration of the first or second invention, the database can be updated or added to at any time, and the difference data is sequentially reflected when related words are automatically extracted.
- in a fourth aspect of the present invention, in addition to the configuration of any one of claims 1 to 3, whether documents in the document group in the database are the same document is determined using one piece of document header information; when the same document is included more than once, one copy is kept and the other copies are removed.
- the important words are compound words created from the morphemes obtained by dividing the documents in the database by part of speech.
- the important words are words whose parts of speech are expected to represent the characteristics of each document in the database.
- words to be excluded from the important words are retained as an exclusion list, and after the important words are extracted, the words in the exclusion list are excluded from them.
- important words having the same meaning are held as a same word list, and when the important words are extracted, the statistical information for the words in the same word list is stored collectively. According to this, in addition to the effect of any one of claims 1 to 7, the extraction accuracy of important words can be improved.
- the statistical information includes the total number of appearances of an important word in the database and the ratio of the number of documents in the database that contain the important word.
- as the statistical information, in addition to the frequency with which an important word in a document in the database appears alone, the frequency with which a plurality of important words appear within a certain range is used.
- the meaning can be more accurately determined by a plurality of pairs of important words, and as a result, related word extraction accuracy can be improved.
- surface expressions included in the documents in the database are automatically extracted, and the upper/lower hierarchical relationship of important words automatically constructed from the surface expressions is used. According to this, in addition to the effect of claim 9, noise caused by a plurality of unrelated important words appearing together by accident can be removed, and the extraction accuracy can be improved.
- a plurality of different search condition expressions are created, and the plurality of different search condition expressions are generated.
- the related word automatic extraction device comprises a database section that stores a document group in a field designated by a user, an important word analysis unit that extracts and selects important words contained in the database section, a counting unit that obtains statistical information on the important words selected by the important word analysis unit and information about the hierarchical relationship of the important words, and a related word extraction unit that calculates the degree of relevance between important words using the count list generated by the counting unit. The series of processes uses the related word automatic extraction method according to claim 1.
- the user can accurately extract related words desired by the user, such as technical terms, new words, and buzzwords, without being aware of the internal structure of the related word automatic extraction device.
- the fourteenth invention automatically extracts a plurality of important words using not only the number of appearances of each important word in the documents in the database but also the number of occurrences of a plurality of important words within a certain range.
- the documents in the database are read one by one, important words are searched for in each document, and a search is made for another important word within a predetermined range of each important word found. When an important word is found within the range, the important word pair is searched for in the count list created so far; if the same important word pair has already been counted, the count list is updated by adding 1 to its number of occurrences. If the pair is not found in the count list, its count is set to 1 and it is saved in the count list.
- the fifteenth invention is directed to an important word upper/lower hierarchical relationship extraction program that automatically extracts surface expressions included in the documents in a database and uses the upper/lower hierarchical relationship of important words automatically constructed from the surface expressions.
- FIG. 1 is a block diagram of a related word automatic extraction device according to an embodiment of the present invention.
- FIG. 2 is a conceptual diagram of an important word list used in the related word automatic extraction device according to the embodiment.
- FIG. 3 is a conceptual diagram of a count list used in the automatic related word extracting apparatus according to the embodiment.
- FIG. 4 is a conceptual diagram of a relevance judgment list created based on the count list of FIG. 3 and the important word list of FIG. 2.
- FIG. 5 is a flowchart showing a procedure for extracting a plurality of important words within a certain range in the method for automatically extracting related words according to the embodiment.
- FIG. 6 is a flowchart showing a procedure for extracting upper and lower hierarchical relationships of important words in the related word automatic extraction method according to the embodiment.
- FIG. 1 is a block diagram of a related word automatic extraction device according to an embodiment of the present invention.
- the automatic related word extraction device includes a database section 1 that stores documents in a field designated by a user, an important word analysis unit 2 that extracts and selects important words contained in the database section 1, a counting unit 3 that obtains statistical information on the important words selected by the important word analysis unit 2 and information on the hierarchical relationship of the important words, and a related word extraction unit 4 that calculates the degree of relevance between important words using the count list generated by the counting unit 3.
- the device selects important words, which are words of high importance, from the documents in the database section 1 and calculates the degree of relevance between important words using statistical information on important words or on pairs of important words.
- the database section 1 includes a same document determination function unit 11, which identifies identical documents in the input document group and, when the same document is included more than once, keeps one copy and removes the others, and a database 12 that stores the documents remaining after the same document determination function unit 11 has removed the duplicates.
- for example, when the documents in the database 12 are patent documents, the “name of the applicant”, “name of the invention”, and “name of the inventor” are extracted from the header of each patent document, and the following conditions are checked: (1) the names of the applicants are the same; (2) the names of the inventions are the same; (3) the number of inventors is the same and the names of the inventors are all the same (in any order). Documents that satisfy all of conditions (1) to (3) are regarded as the same document.
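The same-document check in conditions (1) to (3) can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the dictionary keys `applicant`, `title`, and `inventors` and both function names are assumptions introduced here.

```python
from typing import Iterable


def header_key(doc: dict) -> tuple:
    """Build a comparison key from the header fields (field names are hypothetical).

    Sorting the inventor names makes the comparison independent of order
    while still requiring the same number of inventors.
    """
    return (doc["applicant"], doc["title"], tuple(sorted(doc["inventors"])))


def remove_same_documents(docs: Iterable[dict]) -> list:
    """Keep the first copy of each document and drop later copies."""
    seen = set()
    kept = []
    for doc in docs:
        key = header_key(doc)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    {"applicant": "X Corp", "title": "Search device", "inventors": ["Sato", "Abe"]},
    {"applicant": "X Corp", "title": "Search device", "inventors": ["Abe", "Sato"]},
    {"applicant": "X Corp", "title": "Search device", "inventors": ["Abe"]},
]
unique_docs = remove_same_documents(docs)
```

Here the second entry is treated as a copy of the first because only the inventor order differs, while the third entry is kept because its inventor list is different.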
- the important word analysis unit 2 includes a morphological analysis unit 21 and an important word extraction unit 22.
- the morphological analysis unit 21 divides the documents in the database 12 into morphemes by morphological analysis and acquires part-of-speech information.
- the important word extraction unit 22 creates compound words by applying compound word processing, such as combining consecutive nouns, to the morphemes produced by the morphological analysis unit 21.
- the compound word is stored as an important word in the important word list together with the part of speech information and the statistical information.
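As an illustration of the compound word processing described above, the sketch below joins runs of consecutive nouns. It assumes the morphological analysis has already produced `(surface, part_of_speech)` pairs; the tag name `"noun"`, the `joiner` parameter, and the function name are assumptions made for this example.

```python
def build_compound_words(morphemes, joiner=""):
    """Join each run of two or more consecutive nouns into one compound word."""
    compounds = []
    run = []
    for surface, pos in morphemes:
        if pos == "noun":
            run.append(surface)
            continue
        if len(run) >= 2:
            compounds.append(joiner.join(run))
        run = []
    # Flush a run that reaches the end of the input.
    if len(run) >= 2:
        compounds.append(joiner.join(run))
    return compounds


morphemes = [
    ("related", "noun"), ("word", "noun"), ("is", "verb"),
    ("extracted", "verb"), ("database", "noun"),
]
compounds = build_compound_words(morphemes, joiner=" ")
```

A single noun such as "database" is not turned into a compound word here; whether single nouns are also kept as important words is a separate decision.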
- important words are not limited to compound words created by the above method; words whose part of speech is considered to characterize the content of each document in the database 12, such as common nouns other than compound words, proper nouns, and undefined words, may also be used.
- this exclusion list may include words that are excluded if they partially match, in addition to words that must match completely, morpheme by morpheme.
- important words with the same meaning are stored in a same word list, and when important words are extracted, the statistical information on the words in this same word list is saved together, which can improve the extraction accuracy of important words.
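One way to picture the exclusion list and the same word list working together is sketched below. The data shapes (a set of exact exclusions, a set of partial-match fragments, and a dictionary that maps each variant to a representative word) are assumptions for illustration only.

```python
def filter_important_words(words, exact_exclusions, partial_exclusions, same_words):
    """Map same-meaning words to one representative, then apply the exclusion list."""
    result = []
    for word in words:
        # Same word list: count variants under one representative word.
        word = same_words.get(word, word)
        # Exclusion on a complete match.
        if word in exact_exclusions:
            continue
        # Exclusion on a partial match.
        if any(fragment in word for fragment in partial_exclusions):
            continue
        result.append(word)
    return result


filtered = filter_important_words(
    ["PC", "personal computer", "present invention", "database"],
    exact_exclusions={"database"},
    partial_exclusions={"invention"},
    same_words={"PC": "personal computer"},
)
```

Because "PC" is mapped to "personal computer" before counting, later statistics for the two spellings are accumulated together.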
- Figure 2 is a conceptual diagram of the important word list.
- the “statistical information” stored in the important word list includes the number of occurrences 25 of the important word 23 in the database and the proportion of documents 24 in the database that contain the important word. This information is the basis of the various statistics used later by the counting unit 3 and the related word extraction unit 41.
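The two statistics named here can be computed as in the following sketch, which assumes each document has already been reduced to a list of its important words. The function name and the dictionary keys `occurrences` and `doc_ratio` are hypothetical.

```python
from collections import Counter


def important_word_stats(docs):
    """Return, per important word, its total occurrences and document ratio."""
    occurrences = Counter()
    doc_freq = Counter()
    for words in docs:
        occurrences.update(words)    # every appearance counts
        doc_freq.update(set(words))  # each document counts once per word
    total_docs = len(docs)
    return {
        word: {
            "occurrences": occurrences[word],
            "doc_ratio": doc_freq[word] / total_docs,
        }
        for word in occurrences
    }


stats = important_word_stats([["a", "b", "a"], ["b"], ["c"]])
```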
- to obtain the number of documents containing each important word, the following results can be used: a plurality of different search condition expressions, one for each important word, are created; they are set separately on the different processors of a massively parallel computer having a plurality of different processors; and a full-text search is performed simultaneously and in parallel with these search condition expressions on the document group stored in the database 12.
- the number of results that match each search condition expression is the number of documents that include each important word in the database 12.
- the accuracy of the statistical information can be maintained by performing the full-text search each time the important word analysis unit 2 performs the processing.
- the massively parallel computer incorporates thousands to tens of thousands of processors (hereinafter collectively referred to as a pipeline) so that a plurality of different search condition expressions can be set in the pipeline at the same time. A full-text search is performed by operating these many processors simultaneously and matching the multiple search conditions against the database. If the matching finds a document that satisfies a search condition, the document is regarded as a hit.
- the massively parallel computer is desirably a device such as a full-text search engine manufactured by Paracel Corporation (for example, FDF (registered trademark) 4 TT ext Finder).
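The idea of evaluating many search condition expressions at once can be imitated in software, on a much smaller scale, with a thread pool. The sketch below is only a stand-in for the dedicated hardware described above: it treats each search condition as a plain substring match, and the function names and the `max_workers` value are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hits(expression, documents):
    """Count the documents whose text contains the expression (substring match)."""
    return sum(1 for text in documents if expression in text)


def parallel_document_counts(expressions, documents, max_workers=4):
    """Evaluate every search expression against all documents concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = pool.map(lambda expr: count_hits(expr, documents), expressions)
        return dict(zip(expressions, counts))


documents = ["full text search engine", "text mining", "search"]
hit_counts = parallel_document_counts(["text", "search"], documents)
```

The number of hits for each expression corresponds to the number of documents that contain the matching important word.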
- the counting unit 3 includes an extracting unit 31 for extracting a plurality of important words within a certain range, and an extracting unit 32 for extracting a hierarchical relationship between important words.
- the user selects in advance either the extraction unit 31 for a plurality of important words within a certain range or the extraction unit 32 for the upper/lower hierarchical relationship of important words, and only the selected processing is performed.
- the extraction unit 31 for a plurality of important words within a certain range takes each important word extracted by the important word analysis unit 2 as a reference; when another important word exists within a predefined range of the reference, the two are treated as a plurality of important words, and the number of occurrences of that plurality of important words is counted and saved in a count list.
- the procedure for extracting multiple important words is shown in the flowchart of FIG. 5, and the details will be described later.
- the extraction unit 32 for the upper/lower hierarchical relationship of important words defines in advance surface expressions in which the relation between an upper word and a lower word is expressed explicitly, and extracts those surface expressions that contain important words extracted by the important word analysis unit 2. The important words in an extracted surface expression are treated as upper and lower important words, and their number of occurrences is counted and stored in a count list.
- the procedure for extracting the hierarchical relationship of key words is shown in the flowchart of Fig. 6, and the details will be described later.
- the related word extracting unit 4 includes a related word extracting unit 41.
- the related word extraction unit 41 performs related word determination based on the count list created by the counting unit 3. For example, to determine the dissimilarity between two words, judgment indices such as Information Radius (Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, The MIT Press (ISBN 0-262-13360-1)) can be used.
- in addition, when the extraction unit 31 for a plurality of important words within a certain range is selected, important words that share a common important word within the certain range can be determined to be related words; when the extraction unit 32 for the upper/lower hierarchical relationship of important words is selected, important words whose lower important words are in common can likewise be determined to be related words.
- FIG. 3 is a conceptual diagram of the count list, in which the ID 33 of keyword 1, the ID 34 of keyword 2, and the number of occurrences 35 of the pair of keyword 1 and keyword 2 make up one list item.
- FIG. 4 is a conceptual diagram of a relevance judgment list created based on the count list of FIG. 3 and the important word list of FIG. 2.
- each column and each row is an important word extracted by the important word analysis unit 2, and one of the important word pairs extracted by the counting unit 3 is arranged in a column and the other is arranged in a row. For example, for key word pairs that exist within a certain range in Fig. 5, key word A is placed in a column, and key word B is placed in a row.
- the upper important words are arranged in columns and the lower important words are arranged in rows.
- the number in each cell indicates an appearance probability. For example, the cell in column c, row A holds the “probability that key word A and key word c appear within a certain range” or the “probability that key word A is an upper word and key word c is a lower word”.
- as an example of related word determination, the following describes the case in which the Information Radius index is used to determine the dissimilarity between two words.
- the statistic is the “dissimilarity between two words” calculated from these appearance probabilities, and it is calculated for all pairs of the important words denoted by uppercase letters (A and B, A and C, A and D, ..., B and C, B and D, ..., C and D, ...).
- for example, the dissimilarity between A and D is calculated from the appearance probabilities of a, b, c, d, ... for A and the appearance probabilities of a, b, c, d, ... for D.
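A minimal sketch of the Information Radius calculation follows, using the definition in Manning and Schütze: IRad(p, q) = D(p || m) + D(q || m) with m = (p + q) / 2, where D is the Kullback-Leibler divergence. How the appearance probabilities in the relevance judgment list are obtained from the count list is not spelled out in the text, so the normalization in `to_distribution` and the use of base-2 logarithms are assumptions.

```python
import math


def to_distribution(counts):
    """Turn a row of pair counts into probabilities (assumed normalization)."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 add nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def information_radius(p, q):
    """Dissimilarity of two distributions: 0 if identical, 2 bits if disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl_divergence(p, m) + kl_divergence(q, m)


row_a = to_distribution([3, 1, 0, 0])  # counts of a, b, c, d for important word A
row_d = to_distribution([2, 2, 0, 0])  # counts of a, b, c, d for important word D
dissimilarity = information_radius(row_a, row_d)
```

A smaller value means the two important words are distributed more similarly over the other important words, so pairs with low values are the candidates for related words.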
- FIG. 5 is a flowchart showing a procedure for counting the number of simultaneous appearances of a plurality of important words existing within a certain range in the related word automatic extraction method according to the embodiment of the present invention.
- the documents in the database are read one by one (step S1), and the key words extracted by the key word analysis unit 2 are searched from the documents (step S2).
- the important words to be searched here are not limited to those extracted by the important word analysis unit 2, but may be words included in a user-defined important word list defined by the user in some cases.
- the user-defined important word list may include words that are searched for on a partial match, in addition to words whose search condition is a perfect match.
- filters based on the total number of occurrences in the database, the ratio of the number of documents in the database that contain the important word, and the number of characters may be applied to the important words to be searched, as necessary. Applying these filters narrows down the important words further, which can improve the accuracy of the related words finally extracted.
- when an important word is found (YES in step S3), a search is made for another important word (referred to as important word B) within a predetermined range of the important word found (referred to as important word A) (step S4).
- “within a certain range” means, for example, within one sentence (the range from the beginning of a sentence to the period “.”), or in the vicinity of two before and after; it is not limited to these, and a range that is expected to represent the relevant features is specified.
- when an important word B within the certain range of the important word A is found (YES in step S5), the pair of the important word A and the important word B is stored in the count list.
- the pair of the important word A and the important word B is searched for in the count list created so far (step S6); when the same pair already exists in the count list (YES in step S7), the count list is updated by adding 1 to the pair's number of appearances (step S8).
- if the pair does not exist in the count list (NO in step S7), the count of the pair of the important word A and the important word B is set to 1 and the pair is newly stored in the count list (step S9).
- the processing from step S1 to step S9 is performed for a plurality of documents designated in advance in the database (step S10).
- the importance of the pair of the important word A and the important word B is then determined (step S11).
- in step S11, for example, the Dice coefficient or mutual information can be used.
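Steps S1 to S11 can be sketched as below, taking one sentence as the "certain range". Several choices here are assumptions made for illustration: important words are found by substring matching, sentences are split on ".", the mutual information is computed as pointwise mutual information over sentences, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def count_pairs(documents, important_words):
    """S1-S10: count important-word pairs that share a sentence."""
    pair_counts = Counter()
    word_counts = Counter()
    n_ranges = 0
    for text in documents:                      # S1: read documents one by one
        for sentence in text.split("."):        # range = one sentence
            if not sentence.strip():
                continue
            n_ranges += 1
            found = sorted(w for w in important_words if w in sentence)  # S2-S5
            word_counts.update(found)
            for pair in combinations(found, 2):
                pair_counts[pair] += 1          # S6-S9: add 1, or start at 1
    return pair_counts, word_counts, n_ranges


def dice(pair, pair_counts, word_counts):
    """S11: Dice coefficient of a pair."""
    a, b = pair
    return 2 * pair_counts[pair] / (word_counts[a] + word_counts[b])


def pmi(pair, pair_counts, word_counts, n_ranges):
    """S11: pointwise mutual information of a pair, in bits."""
    a, b = pair
    return math.log2(pair_counts[pair] * n_ranges / (word_counts[a] * word_counts[b]))


documents = ["thesaurus and database. database only.", "thesaurus with database."]
pair_counts, word_counts, n_ranges = count_pairs(documents, {"thesaurus", "database"})
pair = ("database", "thesaurus")
```

Using a `Counter` folds steps S6 to S9 into one statement: a missing pair starts at zero and becomes 1, and an existing pair is incremented.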
- FIG. 6 is a flowchart showing a procedure for extracting upper and lower hierarchical relationships of important words in the related word automatic extraction method according to the embodiment of the present invention.
- the documents in the database are read one by one (step S21), and surface expressions listed in a surface expression list created in advance are extracted from each document (step S22).
- the surface expressions written in the surface expression list are those in which the relation between the upper (broader) word and the lower (narrower) word is expressed explicitly.
- for example, in the surface expression “D such as A, B, C”, the upper word is D and the lower words are A, B, and C.
- when a surface expression is extracted (YES in step S23), a search is made as to whether the important words extracted by the important word analysis unit 2 are included in the upper word part and the lower word part of the surface expression extracted in step S22 (step S24).
- the important words to be searched are not limited to those extracted by the important word analysis unit 2, and may be words included in a user-defined important word list defined in advance by a user in some cases.
- the user-defined important word list may include words that are to be searched if they partially match, in addition to words for which a perfect match is a search condition.
- the upper and lower important word pairs found are sequentially stored in the count list.
- measures for judging the importance of the upper and lower important words include a comparison of the ratios of the number of documents in the database 12 that contain the upper and lower important words, and a comparison of the morphemes of the upper and lower important words.
- the upper and lower key word pairs that are always excluded are retained as upper and lower key word pair exclusion lists. The function of excluding upper and lower key word pairs in the upper and lower key word pair exclusion list may be applied as necessary.
- the upper and lower important word pair is searched for in the count list created so far (step S26); if the same pair already exists in the count list (YES in step S27), the count list is updated by adding 1 to the pair's number of occurrences (step S28).
- if the pair does not exist in the count list (NO in step S27), the count of the upper and lower important word pair is set to 1 and the pair is newly stored in the count list (step S29).
- the processing from step S21 to step S29 is performed for a plurality of documents specified in advance in the database (step S30).
- the upper/lower hierarchical relationship of the important words is constructed based on the statistical information in the count list and the important word list created in steps S21 to S30 (step S31).
- a threshold may be set for all occurrences of the upper / lower keyword pairs in the database, and the upper / lower keyword pairs below the threshold may be excluded as necessary.
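The hierarchy extraction procedure can be sketched for a single surface expression. The English pattern "D such as A, B and C", the regular expressions, the single-token terms, and the function names are all assumptions for this example; the original surface expression list is not given in the text.

```python
import re
from collections import Counter

# One hypothetical surface expression: "<upper> such as <lower>, <lower> and <lower>".
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:, (?!and\b)\w+)*(?:,? and \w+)?)")
LIST_SEPARATOR = re.compile(r",?\s+and\s+|,\s*")


def extract_hierarchy_pairs(documents, important_words):
    """Count (upper, lower) important-word pairs found in the surface expression."""
    counts = Counter()
    for text in documents:
        for upper, lowers in SUCH_AS.findall(text):
            if upper not in important_words:
                continue
            for lower in LIST_SEPARATOR.split(lowers):
                if lower in important_words:
                    counts[(upper, lower)] += 1  # add 1, or start at 1
    return counts


def apply_threshold(counts, minimum):
    """Drop pairs whose number of occurrences is below the threshold."""
    return {pair: n for pair, n in counts.items() if n >= minimum}


documents = [
    "metals such as iron, copper and zinc are used.",
    "metals such as iron are common.",
]
words = {"metals", "iron", "copper", "zinc"}
hierarchy_counts = extract_hierarchy_pairs(documents, words)
frequent_pairs = apply_threshold(hierarchy_counts, minimum=2)
```

With a threshold of 2, only the pair seen in both documents survives, which mirrors how a threshold removes upper/lower pairs that appear too rarely to be trusted.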
- Industrial applicability: in an automatic related word extraction method that automatically extracts words closely related to a word specified by a user based on statistical information on the words included in a database, the present invention can be used effectively as a related word automatic extraction method that enables extraction of technical terms appearing in a specific field specified by the user, new words, and buzzwords that are not described in general existing thesaurus dictionaries, and as a related word automatic extraction device that implements the method.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
According to the invention, a group of documents in a field specified by a user is stored in a database (1). An important word selection unit (2) selects, from the group of documents in the database (1), a word of high importance. A counting unit (3) creates a count list as statistical information concerning an important word or a pair of important words. Based on this count list, a related word extraction unit (4) estimates a degree of correlation between important words.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001/367472 | 2001-11-30 | ||
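JP2001367472A JP3553543B2 (ja) | 2001-11-30 | 2001-11-30 | Related word automatic extraction device, multiple important word extraction program, and important word upper/lower hierarchical relationship extraction program |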
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003046765A1 (fr) | 2003-06-05 |
Family
ID=19177212
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2002/012504 WO2003046765A1 (fr) | 2001-11-30 | 2002-11-29 | Automatic related word extraction method |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP3553543B2 (fr) |
WO (1) | WO2003046765A1 (fr) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5161891B2 (ja) * | 2007-12-26 | 2013-03-13 | 有限会社ティ辞書企画 | 辞書システム |
KR101071700B1 (ko) | 2009-11-04 | 2011-10-11 | 동국대학교 산학협력단 | 온톨로지를 이용한 문서의 주제어 및 관련어 측정 방법 및 장치 |
JP5208193B2 (ja) * | 2010-12-28 | 2013-06-12 | ヤフー株式会社 | 関連語グラフ作成装置、関連語グラフ作成方法、関連語提供装置、関連語提供方法及びプログラム |
JP5117590B2 (ja) * | 2011-03-23 | 2013-01-16 | 株式会社東芝 | 文書処理装置およびプログラム |
US9183600B2 (en) | 2013-01-10 | 2015-11-10 | International Business Machines Corporation | Technology prediction |
JP6079361B2 (ja) * | 2013-03-27 | 2017-02-15 | 富士通株式会社 | 文書管理装置、文書管理方法および文書管理プログラム |
JP6280859B2 (ja) * | 2014-11-20 | 2018-02-14 | 日本電信電話株式会社 | 行動ネットワーク情報抽出装置、行動ネットワーク情報抽出方法及び行動ネットワーク情報抽出プログラム |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5297039A (en) * | 1991-01-30 | 1994-03-22 | Mitsubishi Denki Kabushiki Kaisha | Text search system for locating on the basis of keyword matching and keyword relationship matching |
JPH11203311A (ja) * | 1998-01-13 | 1999-07-30 | Fujitsu Ltd | 関連語抽出装置および関連語抽出方法および関連語抽出プログラムが記録されたコンピュータ読取可能な記録媒体 |
JPH11328182A (ja) * | 1998-05-20 | 1999-11-30 | Ricoh Co Ltd | 関連語自動抽出装置及び方法並びに情報記憶媒体 |
JP2000222427A (ja) * | 1999-02-02 | 2000-08-11 | Mitsubishi Electric Corp | 関連語抽出装置、関連語抽出方法及び関連語抽出プログラムが記録された記録媒体 |
- 2001-11-30: JP JP2001367472A, patent/JP3553543B2/ja, not active (Expired - Fee Related)
- 2002-11-29: WO PCT/JP2002/012504, patent/WO2003046765A1/fr, active (Application Filing)
Non-Patent Citations (1)
Title |
---|
KAZUNORI SATO ET AL.: "Bunsho no jido bunrui ni okeru bun'ya kanrengo jisho no kosatsu", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS KENKYU HOKOKU, vol. 100, no. 439, 10 November 2000 (2000-11-10), pages 5 - 10, XP002961743 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786991A (zh) * | 2016-02-18 | 2016-07-20 | 中国科学院自动化研究所 | 结合用户情感表达方式的中文情感新词识别方法和系统 |
CN105786991B (zh) * | 2016-02-18 | 2019-03-15 | 中国科学院自动化研究所 | 结合用户情感表达方式的中文情感新词识别方法和系统 |
US12430496B2 (en) | 2021-11-24 | 2025-09-30 | International Business Machines Corporation | Iteratively updating a document structure to resolve disconnected text in element blocks |
Also Published As
Publication number | Publication date |
---|---|
JP2003167894A (ja) | 2003-06-13 |
JP3553543B2 (ja) | 2004-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9275339B2 (en) | System and method for probabilistic name matching | |
JP5010885B2 (ja) | 文書検索装置、文書検索方法および文書検索プログラム | |
US20070179930A1 (en) | Method for ranking and sorting electronic documents in a search result list based on relevance | |
CN106383836B (zh) | 将可操作属性归于描述个人身份的数据 | |
KR20140093762A (ko) | 태그들을 문서에 자동으로 추가하는 방법, 장치 및 컴퓨터 저장 매체 | |
CN102567409A (zh) | 一种提供检索关联词的方法及装置 | |
CN113934910A (zh) | 一种自动优化、更新的主题库构建方法,及热点事件实时更新方法 | |
CN113901173A (zh) | 一种检索方法、装置、电子设备及计算机存储介质 | |
JP2010287020A (ja) | 同義語展開システム及び同義語展開方法 | |
JP5522389B2 (ja) | 類似度算出装置、類似度算出方法、及びプログラム | |
WO2003046765A1 (fr) | | Automatic related word extraction method |
JP4969209B2 (ja) | 検索システム | |
US7072827B1 (en) | Morphological disambiguation | |
KR20020072092A (ko) | 단락 단위의 실시간 응답 색인을 이용한 자연어 질의-응답검색시스템 | |
CN110909532B (zh) | 用户名称匹配方法、装置、计算机设备和存储介质 | |
KR20030006201A (ko) | 홈페이지 자동 검색을 위한 통합형 자연어 질의-응답시스템 | |
JP2000132560A (ja) | 中国語テレテキスト処理方法及び装置 | |
JP2004013726A (ja) | キーワード抽出装置および情報検索装置 | |
JP3249743B2 (ja) | 文書検索システム | |
JP4015661B2 (ja) | 固有表現抽出装置、方法、プログラム及びそれを記録した記録媒体 | |
JP2006227823A (ja) | 情報処理装置及びその制御方法 | |
RU2266560C1 (ru) | Способ поиска информации в политематических массивах неструктурированных текстов | |
JP2002117043A (ja) | 文書検索装置、文書検索方法およびその方法を実施するためのプログラムを記録した記録媒体 | |
JP3953967B2 (ja) | 単語抽出方法、装置、およびプログラム | |
JP2008203997A (ja) | 文書検索装置及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
122 | Ep: pct application non-entry in european phase | |