
CN118673101B - Data retrieval method, device, electronic equipment and storage medium


Info

Publication number
CN118673101B
Authority
CN
China
Prior art keywords
matching
matching field
database
extended
field
Legal status
Active
Application number
CN202411162580.4A
Other languages
Chinese (zh)
Other versions
CN118673101A (en)
Inventor
邵嘉豪
段强
姜凯
Current Assignee
Shandong Inspur Science Research Institute Co Ltd
Original Assignee
Shandong Inspur Science Research Institute Co Ltd
Application filed by Shandong Inspur Science Research Institute Co Ltd
Priority to CN202411162580.4A
Publication of CN118673101A
Application granted
Publication of CN118673101B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The data retrieval method, device, electronic equipment and storage medium provided by the invention relate to the technical field of electric digital data processing. The matching fields of a query statement are obtained from its feature information, so that matching follows the query intent, key information and context information, which improves the accuracy of the matching field groups. The matching field groups are obtained from a field-topic mapping table, so the query statement can be located quickly, improving the efficiency and accuracy of data retrieval. Multidimensional retrieval is performed with the extended matching field groups, so the search also covers the entities associated with each matching field group, further improving retrieval accuracy and efficiency. The retrieval results are fused using the weights of the extended matching field groups, so the overall weight of each retrieval result can be determined accurately and the user can quickly find the retrieval data most relevant to the query statement.

Description

Data retrieval method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of electric digital data processing, and in particular to a data retrieval method, a data retrieval device, an electronic device and a storage medium.
Background
With the continued development of the digital age, the volume of data is growing explosively and data structures are becoming increasingly complex. Traditional data retrieval methods struggle with large-scale, multi-field databases. Although retrieval-augmented generation (RAG) systems have made progress by combining retrieval and generation techniques, they still have significant limitations when processing complex queries and multidimensional data.
Mainstream RAG systems currently vectorize only the main text of the database and treat all other attributes merely as metadata subject to exact whole-word matching. This limits retrieval flexibility, and users often have to specify the content to be matched explicitly. Existing systems also have limited query understanding and find it difficult to capture the user's real intent, especially for complex queries with multiple requirements. Finally, most data retrieval systems lack effective mechanisms for evaluating and optimizing their results.
These problems make existing data retrieval inefficient.
Disclosure of Invention
The invention provides a data retrieval method, a data retrieval device, an electronic device and a storage medium, which address the low efficiency of data retrieval in the prior art and improve data retrieval efficiency.
The invention provides a data retrieval method, comprising: obtaining feature information of a user's query statement based on a language model's semantic analysis of the query statement; obtaining, based on the feature information, a plurality of matching field groups of the query statement in a field-topic mapping table of a database, where the field-topic mapping table contains the mapping relations between preset topics of the database and fields of the database, and the matching field groups correspond one-to-one to the preset topics; expanding the matching fields in each matching field group to obtain a plurality of extended matching field groups of the query statement; performing multidimensional retrieval in the database based on the extended matching field groups to obtain a retrieval result for each extended matching field group; and performing weighted fusion on the retrieval results based on the weights of the extended matching field groups to obtain the retrieval data of the query statement.
In the data retrieval method provided by the invention, obtaining the plurality of matching field groups of the query statement in the field-topic mapping table of the database based on the feature information comprises: obtaining a plurality of matching field sequences of the query statement in the field-topic mapping table based on the feature information; obtaining the relevance between each matching field in each matching field sequence and the query statement, and sorting the matching fields in descending order of relevance to obtain sorted matching field sequences; and taking at least one top-ranked matching field in each sorted matching field sequence as a matching field group.
In the data retrieval method provided by the invention, after the retrieval results are fused based on the weights of the extended matching field groups to obtain the retrieval data of the query statement, the method further comprises: updating the matching field groups based on each sorted matching field sequence when the retrieval score of the retrieval data is lower than a set score; performing iterative retrieval based on the updated matching field groups until the retrieval score is greater than or equal to the set score or the number of iterations reaches a set number; and taking the retrieval data with the highest retrieval score as the final retrieval data.
In the data retrieval method provided by the invention, expanding the matching fields in each matching field group to obtain the plurality of extended matching field groups of the query statement comprises: performing named entity recognition (NER) analysis on each matching field of a matching field group to obtain the associated fields of that matching field; and obtaining each extended matching field group from all the matching fields in the corresponding matching field group together with their associated fields.
In the data retrieval method provided by the invention, the weight of each extended matching field group is determined by: obtaining a base weight of each extended matching field group based on the preset topics; determining an importance level of each extended matching field group based on the feature information; and adjusting the base weight according to the importance level to obtain the weight of the extended matching field group.
In the data retrieval method provided by the invention, the database is constructed by: cleaning the initial texts to be retrieved in an initial database so that they share a unified format; converting the cleaned initial texts into text vectors; building an index structure for a text vector if the number of times it is repeated in the initial database is lower than a set repetition count; and obtaining the database from the index structure and the text vectors.
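The database-construction steps can be sketched in Python as below. This is a minimal illustration under assumptions, not the patent's implementation: the cleaning rule (Unicode NFKC normalization, lowercasing, whitespace collapsing), the toy hashed bag-of-words vectorizer standing in for a real embedding model, the threshold `max_repeats`, and the function names `clean_text`, `to_vector` and `build_database` are all hypothetical. The index structure is modeled as a plain dictionary.

```python
import hashlib
import math
import re
import unicodedata
from collections import Counter

DIM = 64  # hypothetical dimension of the toy text vectors


def clean_text(text: str) -> str:
    """Unify the format of an initial text (assumed rule: NFKC, lowercase, single spaces)."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def to_vector(text: str) -> tuple:
    """Toy hashed bag-of-words vector; a stand-in for a real text-embedding model."""
    vec = [0.0] * DIM
    for token in text.split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return tuple(v / norm for v in vec)


def build_database(initial_texts: dict, max_repeats: int = 2) -> dict:
    """Clean and vectorize texts, then index only vectors repeated fewer than max_repeats times."""
    cleaned = {doc_id: clean_text(t) for doc_id, t in initial_texts.items()}
    vectors = {doc_id: to_vector(t) for doc_id, t in cleaned.items()}
    repeat_counts = Counter(vectors.values())
    # The "index structure" is modeled as a plain dict from document ID to vector.
    index = {doc_id: vec for doc_id, vec in vectors.items() if repeat_counts[vec] < max_repeats}
    return {"index": index, "vectors": vectors}
```

Read literally, the rule skips every copy of a vector once its repeat count reaches the threshold; a real system might instead keep one representative copy and index that.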
In the data retrieval method provided by the invention, performing multidimensional retrieval in the database based on the plurality of extended matching field groups to obtain the retrieval result of each extended matching field group comprises, for each extended matching field group: if an index structure matching an extended matching field exists in the database, determining the retrieval result based on the similarity between the text vectors of that index structure and the extended matching field; and if unstructured text vectors matching the extended matching field exist in the database, performing fuzzy matching on those unstructured text vectors to obtain the retrieval result.
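A possible shape for this per-group retrieval rule is sketched below, assuming that an index structure "matching" a field means the field has a vector representation to compare against indexed text vectors, and that fuzzy matching is applied to the raw text of unstructured entries. Cosine similarity, `difflib.get_close_matches` over word windows, the thresholds, and the names `cosine`, `fuzzy_contains` and `retrieve_group` are illustrative choices, not taken from the source.

```python
import math
from difflib import get_close_matches


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fuzzy_contains(field: str, text: str, cutoff: float = 0.8) -> bool:
    """Fuzzy-match a field against every word window of the same length in the text."""
    words = text.lower().split()
    n = len(field.split())
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return bool(get_close_matches(field.lower(), windows, n=1, cutoff=cutoff))


def retrieve_group(fields, field_vectors, index, unstructured, sim_threshold=0.8) -> set:
    """Retrieval result (document IDs) for one extended matching field group."""
    hits = set()
    for field in fields:
        vec = field_vectors.get(field)
        if vec is not None:
            # Indexed branch: compare the field's vector with each indexed text vector.
            hits |= {d for d, dv in index.items() if cosine(vec, dv) >= sim_threshold}
        # Unstructured branch: fuzzy string matching on the raw text.
        hits |= {d for d, text in unstructured.items() if fuzzy_contains(field, text)}
    return hits
```

Fuzzy matching over word windows tolerates small variations such as plurals ("neural network" versus "neural networks"), which exact whole-word metadata matching would miss.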
The invention further provides a data retrieval device, comprising a feature information determining module, a matching module, an expansion module, a retrieval module and a fusion module. The feature information determining module is configured to obtain feature information of a user's query statement based on a semantic analysis result of the query statement, the feature information including the query intent, key information and context information. The matching module is configured to obtain a plurality of matching field groups of the query statement in a field-topic mapping table of a database based on the feature information, where the field-topic mapping table contains the mapping relations between preset topics of the database and fields of the database, and the matching field groups correspond one-to-one to the preset topics. The expansion module is configured to expand the matching fields in each matching field group to obtain a plurality of extended matching field groups of the query statement. The retrieval module is configured to perform multidimensional retrieval in the database based on the extended matching field groups to obtain a retrieval result for each extended matching field group. The fusion module is configured to perform weighted fusion on the retrieval results based on the weights of the extended matching field groups to obtain the retrieval data of the query statement.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing any one of the data retrieval methods described above when executing the computer program.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the data retrieval methods described above.
With the data retrieval method, device, electronic equipment and storage medium provided by the invention, the matching fields of the query statement are obtained from its feature information, so that matching follows the query intent, key information and context information and the matching field groups are more accurate. Because the matching field groups are obtained from the field-topic mapping table, the query statement is located quickly, improving the efficiency and accuracy of data retrieval. Because multidimensional retrieval uses the extended matching field groups, the search also covers the entities associated with each matching field group, improving accuracy and efficiency. Because the retrieval results are fused using the weights of the extended matching field groups, the overall weight of each retrieval result is determined accurately, and the user can quickly find the retrieval data most relevant to the query statement.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a data retrieval method provided by the invention.
Fig. 2 is a schematic structural diagram of a data retrieval device provided by the present invention.
Fig. 3 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
The data retrieval method, apparatus, electronic device and storage medium of the present invention are described below with reference to fig. 1 to 3.
Fig. 1 is a schematic flow chart of the data retrieval method provided by the invention. As shown in Fig. 1, the data retrieval method includes steps S100 to S500, described below.
S100, obtaining feature information of the query statement based on the language model's semantic analysis of the user's query statement, where the feature information includes the query intent, key information and context information.
A language model is a model that understands and generates language and is used to process natural language. Language models process the semantic information of text using machine learning and deep learning techniques. Examples include ChatGPT (Chat Generative Pre-trained Transformer), Tongyi Qianwen (Qwen) and Doubao.
Basic text cleaning and normalization are first performed on the user's query statement. The language model is then used to analyze the query intent, key information and context information of the query statement. The query intent may be a factual query, a relational query or an exploratory query. The key information includes the keywords, entities and relationships in the query statement. The context information includes the latent semantics of the query statement and its surrounding context.
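The output of this query analysis can be held in a small data structure. The sketch below assumes, hypothetically, that the language model is prompted to answer in JSON with `intent`, `key_information` and `context` keys; the model call itself is omitted, and the names `FeatureInfo`, `normalize_query` and `parse_feature_info` are illustrative.

```python
import json
import re
import unicodedata
from dataclasses import dataclass, field

QUERY_INTENTS = {"factual", "relational", "exploratory"}


@dataclass
class FeatureInfo:
    """Feature information of a query statement."""
    intent: str                                          # query intent
    key_information: list = field(default_factory=list)  # keywords, entities, relationships
    context: list = field(default_factory=list)          # latent semantics / context cues


def normalize_query(query: str) -> str:
    """Basic cleaning: NFKC normalization (e.g. full-width to half-width) and whitespace collapsing."""
    query = unicodedata.normalize("NFKC", query)
    return re.sub(r"\s+", " ", query).strip()


def parse_feature_info(model_output: str) -> FeatureInfo:
    """Parse the language model's (assumed) JSON answer into FeatureInfo."""
    data = json.loads(model_output)
    intent = str(data.get("intent", "")).lower()
    if intent not in QUERY_INTENTS:
        raise ValueError(f"unexpected query intent: {intent!r}")
    return FeatureInfo(
        intent=intent,
        key_information=list(data.get("key_information", [])),
        context=list(data.get("context", [])),
    )
```

Validating the intent against a fixed set keeps a malformed model answer from silently propagating into the matching step.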
S200, obtaining a plurality of matching field groups of the query statement in the field-topic mapping table of the database based on the feature information.
The field-topic mapping table contains the mapping relations between a plurality of preset topics of the database and the fields of the database, and the matching field groups correspond one-to-one to the preset topics.
A field-topic mapping table of the database is constructed in advance. It describes the semantics and applicable scenarios of each field in the database and contains a number of preset topics, such as title, body, author, date and journal quality. The title topic contains title fields used for generalized queries and keyword matching; the body topic contains body fields used for detailed content queries and full-text search; the author topic contains author fields used for queries about specific people; and the date topic contains date fields used for time-related queries.
The language model acts as a router: it matches the user's query statement against the field-topic mapping table and selects the matching preset topics. The fields most relevant to both the query statement and each matched preset topic then form a matching field group.
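The field-topic mapping table and the routing step might look like the following sketch. The table contents, field names and the `route_query` helper are hypothetical; the language-model router is abstracted as a `score(query, scenario)` callable so that any relevance model, or a simple heuristic for testing, can be plugged in.

```python
from typing import Callable, Dict, List

# Hypothetical field-topic mapping table: preset topic -> database fields and the scenario they serve.
FIELD_TOPIC_TABLE: Dict[str, dict] = {
    "title": {"fields": ["title", "keywords"], "scenario": "generalized queries, keyword matching"},
    "body": {"fields": ["abstract", "full_text"], "scenario": "detailed content queries, full-text search"},
    "author": {"fields": ["author_name", "affiliation"], "scenario": "queries about specific people"},
    "date": {"fields": ["publication_year"], "scenario": "time-related queries"},
    "journal_quality": {"fields": ["journal_name", "journal_rank"], "scenario": "queries about publication venue quality"},
}


def route_query(
    query: str,
    table: Dict[str, dict],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Select the preset topics whose scenario the scorer judges relevant to the query."""
    return {
        topic: entry["fields"]
        for topic, entry in table.items()
        if score(query, entry["scenario"]) >= threshold
    }
```

For example, a scorer that rates only time-related scenarios as relevant to a query about recent work selects just the date topic and its `publication_year` field.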
For example, the query statement is: "I want to find review articles published in top journals in the last five years on the application of artificial intelligence in agricultural modernization. I am particularly interested in studies using computer vision and deep learning techniques, preferably by authors from institution A or B. It would be even better if the paper discusses the difficulties small farmers face in applying the technology, or environmental sustainability." Here the query intent is to find matching documents. The key information is artificial intelligence, agricultural modernization, computer vision and deep learning. The context information consists of the cues "want to find", "particularly interested in", "preferably" and "even better if".
Based on the feature information (query intent, key information and context information), a plurality of matching field groups of the query statement are obtained in the field-topic mapping table: a time matching field group (last five years), a type matching field group (review articles), a title matching field group (artificial intelligence, agricultural modernization), a body matching field group (computer vision, deep learning), an author matching field group (authors from institution A or B), a journal quality matching field group (top journals), and an other matching field group (small-farmer application problems, environmental sustainability).
S300, expanding the matching fields in each matching field group to obtain a plurality of extended matching field groups of the query statement.
Specifically, named entity recognition analysis is performed on each matching field of a matching field group to obtain the associated fields of that matching field, and each extended matching field group is obtained from all the matching fields in the corresponding matching field group together with their associated fields.
Named entity recognition (NER) analysis identifies the entities in a matching field, such as person names, organization names, place names, dates and numbers. In the invention, NER analysis is used to accurately extract the entities associated with each matching field, thereby improving retrieval accuracy.
For example, for the matching field "last five years", the extended matching field obtained after NER analysis is "2019 to 2024". Likewise, the extended matching fields for "review articles" are review, survey, commentary and meta-analysis; for "artificial intelligence", artificial intelligence, intelligent technology, machine learning and smart agriculture; for "agricultural modernization", agricultural modernization, Agriculture 4.0, smart agriculture and precision agriculture; for "computer vision", computer vision, image recognition, object detection and remote sensing image analysis; for "deep learning", neural network, convolutional neural network and recurrent neural network; for "authors from institution A or B", institution A, institution B, units affiliated with institution A and units affiliated with institution B; for "top journals", journal C, journal D and journal E; for "small-farmer application problems", smallholder farmers, technology diffusion and cost issues; and for "environmental sustainability", environment, ecology and green agriculture.
All the matching fields in a matching field group, together with their associated fields, are taken as extended matching fields, which form the extended matching field group.
By obtaining the associated fields of each matching field through named entity recognition analysis, the invention mines the entities behind the matching fields, improving the accuracy and relevance of the associated fields and hence retrieval precision.
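The expansion step can be sketched as a lookup of associated fields for each matching field. A real system would derive these with an NER model and a knowledge source; here a small hand-written lexicon (`ASSOCIATED_FIELDS`) stands in for that, and relative dates are resolved against an assumed `reference_year`. With `reference_year=2024`, "last five years" becomes "2019 to 2024", matching the example in the text. All names are hypothetical.

```python
import re

# Hypothetical associated-field lexicon; a stand-in for the entities an NER pass would mine.
ASSOCIATED_FIELDS = {
    "computer vision": ["image recognition", "object detection", "remote sensing image analysis"],
    "deep learning": ["neural network", "convolutional neural network", "recurrent neural network"],
}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "five": 5, "ten": 10}


def expand_time_field(field: str, reference_year: int):
    """Turn a relative date such as 'last five years' into an explicit year range, else None."""
    m = re.fullmatch(r"last (\w+) years?", field.strip().lower())
    if m is None:
        return None
    token = m.group(1)
    n = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
    return None if n is None else f"{reference_year - n} to {reference_year}"


def expand_group(fields, reference_year: int) -> list:
    """Extended matching field group: each matching field plus its associated fields."""
    expanded = []
    for f in fields:
        time_range = expand_time_field(f, reference_year)
        if time_range is not None:
            expanded.append(time_range)  # the relative date is replaced by the explicit range
            continue
        expanded.append(f)
        expanded.extend(ASSOCIATED_FIELDS.get(f, []))
    return list(dict.fromkeys(expanded))  # keep first-seen order, drop duplicates
```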
S400, performing multidimensional retrieval in the database based on the plurality of extended matching field groups to obtain a retrieval result for each extended matching field group.
Retrieval is performed separately for each extended matching field group, each group forming one dimension. For example, the extended matching field groups may be a time, type, title, body, author, journal quality and other extended matching field group. Each dimension is searched with its corresponding extended matching field group to obtain that dimension's retrieval result.
For example, if the time extended matching field group is "2019 to 2024", all documents from 2019 to 2024 are matched to obtain a document list (the retrieval result).
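The per-dimension retrieval can be sketched as running one predicate per extended matching field group over the document collection and collecting the matching document IDs for each dimension. The `Document` record, its attributes and the helpers `year_between`, `contains_any` and `multidimensional_search` are hypothetical; simple substring matching stands in for the vector and fuzzy matching described earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass
class Document:
    doc_id: str
    year: int
    doc_type: str
    title: str


Predicate = Callable[[Document], bool]


def year_between(start: int, end: int) -> Predicate:
    """Time dimension: documents published in [start, end]."""
    return lambda d: start <= d.year <= end


def contains_any(attr: str, terms: Iterable[str]) -> Predicate:
    """Text dimension: attribute contains any extended matching field (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return lambda d: any(t in getattr(d, attr).lower() for t in lowered)


def multidimensional_search(docs, dimensions: Dict[str, Predicate]) -> Dict[str, Set[str]]:
    """Run each dimension independently and return its set of matching document IDs."""
    return {name: {d.doc_id for d in docs if pred(d)} for name, pred in dimensions.items()}
```

Keeping each dimension's result separate, rather than intersecting them, is what allows the later weighted fusion to reward documents that match many dimensions without discarding those that match only some.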
S500, performing weighted fusion on the retrieval results based on the weights of the extended matching field groups to obtain the retrieval data of the query statement.
The weight of each extended matching field group is then obtained. For example, the title extended matching field group has a weight of 1, the body group 0.8, the type group 0.7, the time group 0.6, the journal quality group 0.5, the author group 0.4, and the other group 0.3.
The fused weight of each retrieval result is determined from the weights of the extended matching field groups it matches, and the retrieval results are sorted by fused weight, for example in descending order of the weighted sum, to obtain the retrieval data.
For example, if document A matches the time, type, title, body, author, journal quality and other extended matching field groups, its fused weight is 1 + 0.8 + 0.7 + 0.6 + 0.5 + 0.4 + 0.3 = 4.3 (the maximum possible), so document A is ranked first.
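The weighted fusion can be sketched as summing, for each document, the weights of every extended matching field group whose result contains it, then sorting in descending order. The group names and the `fuse` helper are illustrative; the weights are the example values given in the text.

```python
WEIGHTS = {
    "title": 1.0, "body": 0.8, "type": 0.7, "time": 0.6,
    "journal_quality": 0.5, "author": 0.4, "other": 0.3,
}


def fuse(results: dict, weights: dict) -> list:
    """Sum the weights of every group a document matched; sort by fused weight, descending."""
    scores = {}
    for group, doc_ids in results.items():
        for doc_id in doc_ids:
            scores[doc_id] = scores.get(doc_id, 0.0) + weights[group]
    # Ties are broken by document ID so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A document that appears in all seven results gets 1 + 0.8 + 0.7 + 0.6 + 0.5 + 0.4 + 0.3 = 4.3, the same value as in the worked example; one that appears only in the title and time results gets 1.6.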
In the data retrieval method provided by this embodiment of the invention, the matching fields of the query statement are obtained from its feature information, so matching follows the query intent, key information and context information and the matching field groups are more accurate. Obtaining the matching field groups from the field-topic mapping table lets the query statement be located quickly, improving the efficiency and accuracy of data retrieval. Performing multidimensional retrieval with the extended matching field groups extends the search to the entities associated with each matching field group, further improving accuracy and efficiency. Fusing the retrieval results with the weights of the extended matching field groups determines the overall weight of each retrieval result accurately, so the user can quickly find the retrieval data most relevant to the query statement.
Based on the above embodiment, obtaining the plurality of matching field groups of the query statement in the field-topic mapping table of the database based on the feature information includes steps S210 to S230, described below.
S210, obtaining a plurality of matching field sequences of the query statement in the field-topic mapping table based on the feature information.
S220, obtaining the relevance between each matching field in each matching field sequence and the query statement, and sorting the matching fields in descending order of relevance to obtain a sorted matching field sequence.
S230, taking at least one top-ranked matching field in each sorted matching field sequence as a matching field group.
According to the query intent, key information and context information of the query statement, a plurality of matching field sequences matching the query statement are obtained in the field-topic mapping table. For example, for the query statement "highly cited papers published in the last five years on applications of artificial intelligence in medicine", the matching field sequences include a title and keyword matching field sequence (artificial intelligence and medicine; artificial intelligence; medicine), a citation count matching field sequence (cited more than 100 times; cited 10 to 100 times; cited fewer than 10 times), and a time matching field sequence (last 1 year; last 3 years; last 5 years; last 10 years).
Within each matching field sequence, the matching fields are sorted in descending order of relevance to give the sorted matching field sequence. For example, the sorted citation count matching field sequence is: cited more than 100 times, cited 10 to 100 times, cited fewer than 10 times.
In each sorted matching field sequence, at least one top-ranked matching field is taken as the matching field group. For example, if only the first-ranked matching field is taken, "cited more than 100 times" forms the citation count matching field group.
By obtaining a plurality of matching field sequences, the invention narrows down the scope of the query statement at an early stage. Determining the matching field groups by relevance makes them more closely related to the query statement, which improves retrieval efficiency and accuracy.
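Steps S210 to S230 amount to sorting each matching field sequence by relevance and keeping its top entries. In the sketch below, `relevance` is a hypothetical callable standing in for the language model's judgment, and `k` (the number of top-ranked fields kept) defaults to 1 as in the example; `sort_sequence` and `select_groups` are illustrative names.

```python
def sort_sequence(query, fields, relevance):
    """Sort matching fields by descending relevance to the query (stable for ties)."""
    return sorted(fields, key=lambda f: relevance(query, f), reverse=True)


def select_groups(query, sequences, relevance, k=1):
    """Return the sorted sequences and, per sequence, its top-k fields as the matching field group."""
    sorted_seqs = {name: sort_sequence(query, fields, relevance) for name, fields in sequences.items()}
    groups = {name: seq[:k] for name, seq in sorted_seqs.items()}
    return sorted_seqs, groups
```

The sorted sequences are returned as well as the groups because the iterative refinement in steps S600 to S800 falls back to lower-ranked fields from the same sequences.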
Based on the above embodiment, after the retrieval results are fused based on the weights of the extended matching field groups to obtain the retrieval data of the query statement, the method further includes steps S600 to S800, described below.
S600, updating the matching field groups based on each sorted matching field sequence when the retrieval score of the retrieval data is lower than the set score.
S700, performing iterative retrieval based on the updated matching field groups until the retrieval score is greater than or equal to the set score or the number of iterations reaches the set number.
S800, taking the retrieval data with the highest retrieval score as the final retrieval data.
The language model scores the search data for semantic relevance to obtain the search score of the query statement. The search score is compared with the set score to decide whether optimization is needed. If the search score is lower than the set score, the current search data are unsatisfactory and the search must be performed again.
The matching field group is updated based on each sorted matching field sequence. For example, suppose the fields of a sorted matching field sequence fall into three tiers: referenced more than 100 times, referenced between 10 and 100 times, and referenced fewer than 10 times. If the matching field group in the current search consists of fields referenced more than 100 times, the updated matching field group for the next search consists of fields referenced between 10 and 100 times.
The search is repeated with all updated matching field groups and stops when the search score is greater than or equal to the set score or the number of iterative searches reaches the set number. The search data with the highest search score is taken as the final search data.
By updating the matching field group and searching iteratively, the invention improves the final retrieval precision.
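Steps S600 to S800 amount to a bounded retry loop that walks down the tiers of each sorted matching field sequence. The Python sketch below is one possible reading under assumptions: `search_fn` and `score_fn` are hypothetical stand-ins for the database search and the language-model relevance scorer, the initial search counts as the first iteration, and each topic is assumed to have at least one tier.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def iterative_retrieval(
    tiers: Dict[str, List[List[str]]],
    search_fn: Callable[[Dict[str, List[str]]], T],
    score_fn: Callable[[T], float],
    set_score: float,
    max_iterations: int,
) -> Tuple[Optional[T], float]:
    """Sketch of S600-S800. tiers maps each topic to its sorted matching field
    sequence split into non-empty tiers (e.g. >100, 10-100, <10 references);
    iteration i uses tier i as that topic's matching field group."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    best_data: Optional[T] = None
    best_score = float("-inf")
    for iteration in range(max_iterations):
        # Update the matching field groups: move each topic to its next tier,
        # staying on the last tier once a topic runs out of tiers.
        groups = {
            topic: topic_tiers[min(iteration, len(topic_tiers) - 1)]
            for topic, topic_tiers in tiers.items()
        }
        data = search_fn(groups)
        score = score_fn(data)  # e.g. a language-model relevance score
        if score > best_score:
            best_data, best_score = data, score
        if score >= set_score:  # good enough: stop iterating
            break
    # S800: return the search data with the highest score seen.
    return best_data, best_score


if __name__ == "__main__":
    scores = {"a": 0.3, "b": 0.8, "c": 0.95}
    result = iterative_retrieval(
        {"title": [["a"], ["b"], ["c"]]},
        search_fn=lambda groups: groups["title"][0],
        score_fn=scores.__getitem__,
        set_score=0.9,
        max_iterations=5,
    )
    print(result)  # ('c', 0.95)
```

Tracking the best result separately matters when the loop ends because of the iteration limit: the last search is not necessarily the best one.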
Based on the above embodiment, the weight of each extended matching field group is determined through steps S510 to S530, described as follows.
S510: Obtain the basic weight of each extended matching field group based on the preset topic.
S520: Determine the importance level of each extended matching field group based on the feature information.
S530: Adjust the basic weight based on the importance level to obtain the weight of the extended matching field group.
The basic weight of each extended matching field group is preset. For example, the basic weight of the title extended matching field group is 1, that of the body text group is 0.8, the type group 0.7, the time group 0.6, and the author group 0.4.
The importance level of each extended matching field group is determined based on the feature information, which includes the query intent, key information, and context information. For example, if the feature information shows that the query statement is time-sensitive, the time extended matching field group is assigned the highest importance level, and its basic weight is accordingly raised from 0.6 to 1, so its final weight is 1.
By adjusting the basic weights according to the feature information, the invention tailors the weights to the actual needs of each query statement, so that the retrieval results meet the user's needs as closely as possible and the accuracy of data retrieval improves.
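The weighting example above, together with the weighted fusion it feeds, can be sketched in Python. The basic weights come from the text; everything else is an assumption: the text only says that the highest importance level yields a weight of 1, so the rule "other levels keep the basic weight" is hypothetical, and the weighted-sum fusion is one common choice rather than the patent's stated formula.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Basic weights taken from the example in the text (preset topic -> weight).
BASE_WEIGHTS: Dict[str, float] = {
    "title": 1.0, "body": 0.8, "type": 0.7, "time": 0.6, "author": 0.4,
}


def adjust_weights(base: Dict[str, float], importance: Dict[str, str]) -> Dict[str, float]:
    """S530 sketch. Assumed rule: a group at the 'highest' importance level gets
    weight 1.0 (as in the time-sensitive example); other groups keep their basic weight."""
    return {
        topic: 1.0 if importance.get(topic) == "highest" else weight
        for topic, weight in base.items()
    }


def weighted_fusion(
    results: Dict[str, List[Tuple[str, float]]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Assumed fusion rule: each document's fused score is the weighted sum of its
    per-group scores; documents are returned best first."""
    fused: Dict[str, float] = defaultdict(float)
    for topic, hits in results.items():
        weight = weights.get(topic, 0.0)
        for doc_id, score in hits:
            fused[doc_id] += weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    weights = adjust_weights(BASE_WEIGHTS, {"time": "highest"})
    print(weights["time"])  # 1.0
    results = {"title": [("d1", 0.9), ("d2", 0.5)], "time": [("d2", 0.8)]}
    print(weighted_fusion(results, weights))
```

In this example the document `d2`, which is matched by both the title and the (now fully weighted) time group, ranks above `d1`.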
Based on the above embodiment, the database is built through steps S410 to S440, described as follows.
S410: Perform data cleaning on the initial texts involved in retrieval in the initial database to unify their format.
S420: Convert the cleaned initial texts into text vectors.
S430: If the number of repetitions of a text vector in the initial database is lower than the set number of repetitions, construct an index structure for that text vector.
S440: Obtain the database based on the index structure and the text vectors.
All initial texts in the initial database that may be involved in retrieval are identified, including but not limited to the title, body, abstract, keywords, author, and date. The initial texts are cleaned to remove special characters and unify their format. The cleaned texts are then converted into text vectors using a pre-trained model such as BGE-M3. A relatively long initial text is first split into segments, and each segment is converted into its own text vector.
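The cleaning and segmentation steps can be sketched with the Python standard library. The patent does not specify the cleaning rules or the segment length, so the set of kept characters, the NFKC normalization (which folds full-width forms into half-width ones), and the 512-character fixed-length segments are all assumptions. The embedding step with BGE-M3 is deliberately left out.

```python
import re
import unicodedata
from typing import List

# Characters kept during cleaning: word characters (including CJK), whitespace
# and basic punctuation. Everything else is treated as a "special character".
_SPECIAL_CHARS = re.compile(r"[^\w\s.,;:!?()\-]")


def clean_text(text: str) -> str:
    """S410 sketch: unify the format of an initial text."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width -> half-width forms
    text = _SPECIAL_CHARS.sub(" ", text)         # drop special characters
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace


def segment(text: str, max_chars: int = 512) -> List[str]:
    """Split a long text into fixed-length segments before embedding."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


if __name__ == "__main__":
    print(clean_text("  ＨＮＳＷ\t★index!! "))  # HNSW index!!
    print(segment("abcdefg", max_chars=3))      # ['abc', 'def', 'g']
```

A production system would more likely split on sentence or paragraph boundaries with some overlap; fixed-length slicing is used here only to keep the sketch short.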
If the number of repetitions of a text vector in the initial database is lower than the set number of repetitions, the vector is not heavily repeated (as is typical of body and title text), and an index structure is constructed for such vectors, for example using the Hierarchical Navigable Small World (HNSW) algorithm. HNSW completes a search in approximately logarithmic time, which speeds up retrieval over large-scale vector data. It supports dynamic insertion of new vectors, making it suitable for continuously growing data sets, and the resulting index structure is relatively compact, so memory is used efficiently.
If the number of repetitions of a text vector in the initial database is greater than the set number of repetitions, the vector is heavily repeated, as is typical of fields such as author and tag. No index structure is built for these heavily repeated vectors; they are matched directly at query time.
By constructing index structures only for text vectors whose repetition count is below the set number, the invention both speeds up database search and improves the utilization of the database's memory resources.
By vectorizing all initial texts that participate in matching or retrieval, the invention greatly extends the semantic understanding capability of the database and lays the foundation for subsequent accurate retrieval. To support fast retrieval, the database also builds an efficient index structure (e.g., HNSW) for the vectorized fields that are not heavily repeated. Category-like fields with many repeated values are not indexed, which improves efficiency and allows categories to be added or removed quickly without frequently rebuilding the index.
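The index-or-match-directly decision of S430 can be sketched as a partition by repetition count. Assumptions: identical strings are counted as a proxy for identical text vectors, and a count exactly equal to the set number (which the text leaves unspecified) is treated as heavily repeated. Building the HNSW index itself would be done with an HNSW implementation such as the `hnswlib` library and is not shown.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def partition_by_repetition(
    values: Iterable[str], set_repetitions: int
) -> Tuple[List[str], Set[str]]:
    """S430 sketch. Returns (to_index, direct_match):
    - to_index: distinct values repeated fewer than set_repetitions times,
      in first-seen order; these would be embedded and inserted into an HNSW index.
    - direct_match: heavily repeated values (>= set_repetitions), matched
      directly at query time without an index."""
    counts = Counter(values)  # identical strings used as a proxy for identical vectors
    to_index = [value for value, count in counts.items() if count < set_repetitions]
    direct_match = {value for value, count in counts.items() if count >= set_repetitions}
    return to_index, direct_match


if __name__ == "__main__":
    values = ["Zhang San", "Zhang San", "Zhang San", "Body text A", "Title B"]
    print(partition_by_repetition(values, set_repetitions=3))
    # (['Body text A', 'Title B'], {'Zhang San'})
```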
Based on the above embodiment, performing multi-dimensional retrieval in the database based on multiple extended matching field groups to obtain the search result of each extended matching field group includes steps S540 to S550, described as follows.
S540: Perform a single-dimension search in the database based on one extended matching field group.
S550: Perform a multi-dimensional search in the database based on multiple extended matching field groups. If an index structure matching an extended matching field exists in the database, the search result is determined based on the similarity between the text vectors of the index structure and the extended matching field; if an unstructured text vector matching the extended matching field exists in the database, fuzzy matching is performed on the unstructured text vector to obtain the search result.
A single-dimension search is performed in the database based on one extended matching field group, and a multi-dimensional search based on multiple extended matching field groups. If an index structure is retrieved for an extended matching field in an extended matching field group, a nearest-neighbor algorithm (for example, HNSW) computes the similarity between the extended matching field and the text vectors in the index structure, and the text vector with the highest similarity is taken as the search result for that field. If an unstructured text vector is retrieved for the extended matching field, fuzzy matching is performed on it. Fuzzy matching tolerates a certain degree of error in string comparison, so text containing misspellings, homonyms, hyponyms and similar variations is still treated as a match, which improves retrieval flexibility and recall. If the extended matching field matches a category field, exact or fuzzy matching is applied directly.
In the invention, multi-dimensional search with multiple extended matching field groups improves both the speed and the comprehensiveness of retrieval, and combining similarity matching with fuzzy matching improves retrieval precision.
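The two matching paths described above can be sketched in Python. This is an illustration under assumptions, not the patent's implementation: `nearest_indexed` is an exact brute-force cosine search standing in for an approximate HNSW query, and the fuzzy path uses the standard library's `difflib.get_close_matches` on the raw strings; the 0.8 cutoff is an arbitrary choice.

```python
import difflib
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_indexed(
    query_vec: Sequence[float], index: Dict[str, Sequence[float]]
) -> Optional[Tuple[str, float]]:
    """Exact brute-force stand-in for an HNSW query: return the indexed text
    whose vector is most similar to the extended matching field's vector."""
    if not index:
        return None
    best = max(index, key=lambda key: cosine(query_vec, index[key]))
    return best, cosine(query_vec, index[best])


def fuzzy_match(field: str, candidates: List[str], cutoff: float = 0.8) -> List[str]:
    """Error-tolerant string matching for non-indexed values (e.g. author names),
    using difflib's similarity ratio."""
    return difflib.get_close_matches(field, candidates, n=3, cutoff=cutoff)


if __name__ == "__main__":
    index = {"doc-1": [0.9, 0.1], "doc-2": [0.0, 1.0]}
    print(nearest_indexed([1.0, 0.0], index)[0])              # doc-1
    print(fuzzy_match("Zhnag San", ["Zhang San", "Li Si"]))  # ['Zhang San']
```

The misspelled query "Zhnag San" still finds "Zhang San" because their difflib similarity ratio (about 0.89) exceeds the cutoff.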
The data retrieval apparatus provided by the invention is described below; the apparatus described below and the data retrieval method described above may be cross-referenced.
As shown in Fig. 2, the data retrieval apparatus includes a feature information determination module 201, configured to obtain feature information of a user's query statement based on a semantic analysis result of the query statement by a language model, the feature information including query intent, key information, and context information.
A matching module 202, configured to obtain multiple matching field groups of the query statement from the field-topic mapping table of the database based on the feature information, where the field-topic mapping table includes mapping relationships between multiple preset topics of the database and fields of the database, and the matching field groups correspond one-to-one to the preset topics.
An expansion module 203, configured to expand the matching fields in each matching field group to obtain multiple extended matching field groups of the query statement.
A retrieval module 204, configured to perform multi-dimensional retrieval in the database based on the multiple extended matching field groups to obtain a search result for each extended matching field group.
A fusion module 205, configured to perform weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement.
With the data retrieval apparatus provided by the embodiment of the invention, the matching fields of the query statement are obtained from the feature information, enabling accurate matching according to the query intent, key information, and context information and improving the accuracy of the matching field groups. Obtaining the matching field groups from the field-topic mapping table locates the scope of the query statement quickly, improving the efficiency and accuracy of data retrieval. Multi-dimensional search with the extended matching field groups searches according to the entities of the matching field groups, further improving accuracy and efficiency. Weighted fusion of the search results according to the weights of the extended matching field groups accounts for the relative importance of each result, so the user can quickly find the search data most relevant to the query statement.
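The data flow through modules 201 to 205 can be summarized as a pipeline of five stages. The Python sketch below is a hypothetical wiring, not the patent's design: each module is injected as a plain callable, and passing the feature information to the fusion stage is an assumption made so that weights can reflect importance levels.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DataRetrievalDevice:
    """Hypothetical wiring of modules 201-205 as injected callables."""
    determine_features: Callable[[str], Dict[str, Any]]                # module 201
    match: Callable[[Dict[str, Any]], Dict[str, List[str]]]           # module 202
    expand: Callable[[Dict[str, List[str]]], Dict[str, List[str]]]    # module 203
    retrieve: Callable[[Dict[str, List[str]]], Dict[str, Any]]        # module 204
    fuse: Callable[[Dict[str, Any], Dict[str, Any]], Any]             # module 205

    def run(self, query: str) -> Any:
        features = self.determine_features(query)  # intent, key info, context
        groups = self.match(features)              # one group per preset topic
        extended = self.expand(groups)             # extended matching field groups
        results = self.retrieve(extended)          # one result per extended group
        return self.fuse(results, features)        # features allow weight adjustment


if __name__ == "__main__":
    device = DataRetrievalDevice(
        determine_features=lambda q: {"key": q},
        match=lambda f: {"title": [f["key"]]},
        expand=lambda g: {t: fs + [fs[0].upper()] for t, fs in g.items()},
        retrieve=lambda e: {t: len(fs) for t, fs in e.items()},
        fuse=lambda r, f: sum(r.values()),
    )
    print(device.run("hnsw"))  # 2
```

Injecting the stages keeps each module replaceable, for example swapping the fusion rule without touching retrieval.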
In one embodiment, the matching module 202 is configured to obtain multiple matching field sequences of the query statement in the field-topic mapping table based on the feature information, obtain the correlation between each matching field in each matching field sequence and the query statement, sort the matching fields in descending order of correlation to obtain sorted matching field sequences, and take at least one matching field at the front of each sorted matching field sequence as a matching field group.
In one embodiment, the fusion module 205 is further configured to: when the search score of the search data is lower than the set score, update the matching field group based on each sorted matching field sequence; perform iterative search based on the updated matching field group until the search score is greater than or equal to the set score or the number of iterative searches reaches the set number; and take the search data with the highest search score as the final search data.
In one embodiment, the expansion module 203 is configured to perform named entity recognition on each matching field of the matching field groups to obtain the associated fields of the matching fields, and to obtain each extended matching field group from all matching fields in each matching field group together with their associated fields.
In one embodiment, the fusion module 205 is configured to obtain the basic weight of each extended matching field group based on the preset topic, determine the importance level of each extended matching field group based on the feature information, and adjust the basic weight based on the importance level to obtain the weight of the extended matching field group.
In one embodiment, the retrieval module 204 is further configured to perform data cleaning on the initial texts involved in retrieval in the initial database to unify their format, convert the cleaned initial texts into text vectors, construct an index structure for a text vector if its number of repetitions in the initial database is lower than the set number of repetitions, and obtain the database based on the index structure and the text vectors.
In one embodiment, the retrieval module 204 is configured to perform a single-dimension search in the database based on one extended matching field group and a multi-dimensional search based on multiple extended matching field groups; if an index structure matching an extended matching field exists in the database, to determine the search result based on the similarity between the text vectors of the index structure and the extended matching field; and if an unstructured text vector matching the extended matching field exists in the database, to perform fuzzy matching on the unstructured text vector to obtain the search result.
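The expansion performed by module 203 (find associated fields for each matching field, then merge them into the group) can be sketched in Python. The named entity recognition step is not implemented here: `find_associated` is a hypothetical callable standing in for NER analysis plus an association lookup, and the small association table in the example is invented for illustration.

```python
from typing import Callable, Dict, List


def expand_groups(
    groups: Dict[str, List[str]],
    find_associated: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Each extended matching field group = the group's matching fields plus
    their associated fields, de-duplicated with order preserved.
    find_associated stands in for NER analysis plus association lookup."""
    extended: Dict[str, List[str]] = {}
    for topic, fields in groups.items():
        merged = list(fields)
        for field in fields:
            for associated in find_associated(field):
                if associated not in merged:
                    merged.append(associated)
        extended[topic] = merged
    return extended


if __name__ == "__main__":
    # Hypothetical association table; a real system would derive it from NER.
    associations = {"HNSW": ["Hierarchical Navigable Small World", "ANN index"]}
    groups = {"title": ["HNSW", "ANN index"]}
    print(expand_groups(groups, lambda f: associations.get(f, [])))
    # {'title': ['HNSW', 'ANN index', 'Hierarchical Navigable Small World']}
```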
Fig. 3 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, where the processor 310, the communication interface 320, and the memory 330 communicate with one another via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to execute a data retrieval method comprising: obtaining feature information of a user's query statement based on a semantic analysis result of the query statement by a language model, the feature information including query intent, key information, and context information; obtaining multiple matching field groups of the query statement in a field-topic mapping table of a database based on the feature information, where the field-topic mapping table includes mapping relationships between preset topics of the database and fields of the database, and the matching field groups correspond one-to-one to the preset topics; expanding the matching fields in each matching field group to obtain multiple extended matching field groups of the query statement; performing multi-dimensional retrieval in the database based on the multiple extended matching field groups to obtain a search result for each extended matching field group; and performing weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement.
Furthermore, the logic instructions in the memory 330 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods in the embodiments of the invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In yet another aspect, the invention further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the data retrieval method provided above, the method comprising: obtaining feature information of a query statement based on a semantic analysis result of the query statement by a language model, the feature information including query intent, key information, and context information; obtaining multiple matching field groups of the query statement in a field-topic mapping table of a database based on the feature information, where the field-topic mapping table includes mapping relationships between multiple preset topics of the database and fields of the database, and the matching field groups correspond one-to-one to the preset topics; expanding the matching fields in each matching field group to obtain multiple extended matching field groups of the query statement; performing multi-dimensional retrieval in the database based on the multiple extended matching field groups to obtain a search result for each extended matching field group; and performing weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. On this understanding, the above technical solution in essence, or the part contributing to the prior art, may be embodied as a software product stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (9)

1. A data retrieval method, comprising:
obtaining feature information of a user's query statement based on a semantic analysis result of the query statement by a language model, wherein the feature information includes query intent, key information and context information;
obtaining, based on the feature information, multiple matching field groups of the query statement in a field-topic mapping table of a database, wherein the field-topic mapping table includes mapping relationships between multiple preset topics of the database and fields of the database, the matching field groups correspond one-to-one to the preset topics, and the field-topic mapping table is used to describe the semantics and applicable scenarios of each field in the database; the database is determined by the following steps:
performing data cleaning on the initial texts involved in retrieval in an initial database to unify the format of the initial texts;
converting the cleaned initial texts into text vectors;
if the number of repetitions of a text vector in the initial database is lower than a set number of repetitions, constructing an index structure of the text vector, wherein the index structure is constructed based on the Hierarchical Navigable Small World (HNSW) algorithm;
obtaining the database based on the index structure and the text vectors;
expanding the matching fields in each of the matching field groups to obtain multiple extended matching field groups of the query statement;
performing a multi-dimensional search in the database based on the multiple extended matching field groups to obtain a search result for each of the extended matching field groups; and
performing weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement.

2. The data retrieval method according to claim 1, wherein obtaining the multiple matching field groups of the query statement in the field-topic mapping table of the database based on the feature information comprises:
obtaining multiple matching field sequences of the query statement in the field-topic mapping table based on the feature information;
obtaining the correlation between each matching field in each matching field sequence and the query statement, and sorting the matching fields in descending order of correlation to obtain a sorted matching field sequence; and
in each sorted matching field sequence, taking at least one matching field at the front as the matching field group.

3. The data retrieval method according to claim 2, wherein, after performing weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement, the method further comprises:
when the search score of the search data is lower than a set score, updating the matching field group based on each sorted matching field sequence;
performing iterative search based on the updated matching field group until the search score is greater than or equal to the set score or the number of iterative searches reaches a set number; and
taking the search data with the highest search score as the final search data.

4. The data retrieval method according to claim 1, wherein expanding the matching fields in each matching field group to obtain the multiple extended matching field groups of the query statement comprises:
performing named entity recognition analysis on each matching field in the matching field group to obtain associated fields of the matching field; and
obtaining each extended matching field group based on all matching fields in each matching field group and the associated fields of those matching fields.

5. The data retrieval method according to claim 1, wherein the weight of the extended matching field group is determined by the following steps:
obtaining a basic weight of each extended matching field group based on the preset topic;
determining an importance level of each extended matching field group based on the feature information; and
adjusting the basic weight based on the importance level to obtain the weight of the extended matching field group.

6. The data retrieval method according to claim 1, wherein performing the multi-dimensional search in the database based on the multiple extended matching field groups to obtain a search result for each extended matching field group comprises:
performing a single-dimension search in the database based on one extended matching field group; and
performing a multi-dimensional search in the database based on multiple extended matching field groups, wherein, if an index structure matching an extended matching field exists in the database, the search result is determined based on the similarity between a text vector of the index structure and the extended matching field; and if an unstructured text vector matching the extended matching field exists in the database, fuzzy matching is performed on the unstructured text vector to obtain the search result.

7. A data retrieval device, comprising:
a feature information determination module, configured to obtain feature information of a user's query statement based on a semantic analysis result of the query statement by a language model, wherein the feature information includes query intent, key information and context information;
a matching module, configured to obtain, based on the feature information, multiple matching field groups of the query statement in a field-topic mapping table of a database, wherein the field-topic mapping table includes mapping relationships between multiple preset topics of the database and fields of the database, the matching field groups correspond one-to-one to the preset topics, and the field-topic mapping table is used to describe the semantics and applicable scenarios of each field in the database;
an expansion module, configured to expand the matching fields in each matching field group to obtain multiple extended matching field groups of the query statement;
a retrieval module, configured to perform a multi-dimensional search in the database based on the multiple extended matching field groups to obtain a search result for each extended matching field group; and
a fusion module, configured to perform weighted fusion on the search results based on the weights of the extended matching field groups to obtain the search data of the query statement;
wherein the matching module is further configured to: perform data cleaning on the initial texts involved in retrieval in an initial database to unify the format of the initial texts; convert the cleaned initial texts into text vectors; if the number of repetitions of a text vector in the initial database is lower than a set number of repetitions, construct an index structure of the text vector, wherein the index structure is constructed based on the Hierarchical Navigable Small World (HNSW) algorithm; and obtain the database based on the index structure and the text vectors.

8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data retrieval method according to any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data retrieval method according to any one of claims 1 to 6.
CN202411162580.4A 2024-08-23 2024-08-23 Data retrieval method, device, electronic equipment and storage medium Active CN118673101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411162580.4A CN118673101B (en) 2024-08-23 2024-08-23 Data retrieval method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411162580.4A CN118673101B (en) 2024-08-23 2024-08-23 Data retrieval method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118673101A CN118673101A (en) 2024-09-20
CN118673101B true CN118673101B (en) 2025-01-07

Family

ID=92724814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411162580.4A Active CN118673101B (en) 2024-08-23 2024-08-23 Data retrieval method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118673101B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015762A (en) * 2019-05-30 2020-12-01 广州慧睿思通信息科技有限公司 Case retrieval method and device, computer equipment and storage medium
CN112035598A (en) * 2020-11-03 2020-12-04 北京淇瑀信息科技有限公司 Intelligent semantic retrieval method and system and electronic equipment
CN118193714A (en) * 2024-05-17 2024-06-14 山东浪潮科学研究院有限公司 Dynamic adaptation question-answering system and method based on hierarchical structure and retrieval enhancement

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR102090237B1 (en) * 2018-07-31 2020-03-17 주식회사 포티투마루 Method, system and computer program for knowledge extension based on triple-semantic
CN116881436B (en) * 2023-08-09 2025-08-19 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Knowledge graph-based document retrieval method, system, terminal and storage medium
CN117633202A (en) * 2023-11-23 2024-03-01 中国船舶集团有限公司系统工程研究院 An unstructured data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN118673101A (en) 2024-09-20

Similar Documents

Publication Publication Date Title
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
US8341159B2 (en) Creating taxonomies and training data for document categorization
CN108959461B (en) An Entity Linking Method Based on Graph Model
CN104239513B (en) A Semantic Retrieval Method for Domain Data
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN108132927B (en) Keyword extraction method for combining graph structure and node association
CN109829104A (en) Pseudo-linear filter model information search method and system based on semantic similarity
CN112559684A (en) Keyword extraction and information retrieval method
CN108509521B (en) An Image Retrieval Method for Automatically Generated Text Index
CN112328800A (en) System and method for automatically generating programming specification question answers
CN111581368A (en) Intelligent expert recommendation-oriented user image drawing method based on convolutional neural network
CN115563313A (en) Semantic retrieval system for literature and books based on knowledge graph
CN107291895A (en) A kind of quick stratification document searching method
CN117891838B (en) Large model retrieval enhancement generation method and device
CN112199461A (en) Document retrieval method, apparatus, medium and device based on block index structure
CN115757726A (en) A cold start method and device for an intelligent question answering system oriented to a specific field
CN112860898A (en) Short text box clustering method, system, equipment and storage medium
CN112989813A (en) Scientific and technological resource relation extraction method and device based on pre-training language model
CN110728135A (en) Text theme indexing method and device, electronic equipment and computer storage medium
CN112711944A (en) Word segmentation method and system and word segmentation device generation method and system
CN116401344A (en) Method and device for searching table according to question
CN112417170A (en) Relation linking method for incomplete knowledge graph
CN112507097B (en) Method for improving generalization capability of question-answering system
Afuan et al. A new approach in query expansion methods for improving information retrieval
CN113516202A (en) Webpage accurate classification method for CBL feature extraction and denoising

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant