
CN119829730A - Data query method and device, storage medium and electronic equipment - Google Patents

Data query method and device, storage medium and electronic equipment

Info

Publication number
CN119829730A
Authority
CN
China
Prior art keywords
vector
data
target
query
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411763921.3A
Other languages
Chinese (zh)
Inventor
刘烜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp Qingdao Branch
Original Assignee
China Construction Bank Corp Qingdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp Qingdao Branch filed Critical China Construction Bank Corp Qingdao Branch
Priority to CN202411763921.3A priority Critical patent/CN119829730A/en
Publication of CN119829730A publication Critical patent/CN119829730A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a data query method and device, a storage medium and electronic equipment. The method comprises: obtaining a query request; in response to the query request, converting the query request into a vector representation to obtain a query vector; searching a target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set; performing a sorting operation on the vectors included in the first vector set to obtain a target vector set; and searching a target database for target data matching the query request based on the target vector set. This addresses the problem of low data query accuracy in the related art and improves data query accuracy.

Description

Data query method and device, storage medium and electronic equipment
Technical Field
The embodiment of the application relates to the field of computers, in particular to a data query method and device, a storage medium and electronic equipment.
Background
In the age of information explosion, knowledge search technology is an important tool for acquiring, understanding and using information. Traditional knowledge search methods, such as full-text search based on keyword matching, can quickly retrieve documents containing specified keywords, but they have clear limitations in accuracy and efficiency: the returned results are often inaccurate and only weakly relevant to the query.
Disclosure of Invention
The embodiment of the application provides a data query method and device, a storage medium and electronic equipment, which are used for at least solving the problem of low data query accuracy in the related technology.
According to one embodiment of the application, a data query method is provided, comprising: obtaining a query request; in response to the query request, converting the query request into a vector representation to obtain a query vector, wherein the query vector represents semantic information of the query request in N semantic dimensions, N being a natural number greater than or equal to 1; searching a target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set, wherein the target vector library includes vector representations, in M semantic dimensions, of the data in a target database, M being a natural number greater than or equal to 1; performing a sorting operation on the vectors included in the first vector set to obtain a target vector set; and searching the target database for target data matching the query request based on the target vector set.
In an exemplary embodiment, before searching the target vector library for vectors whose matching degree with the query vector is greater than the preset threshold to obtain the first vector set, the method further comprises: when the target database includes a plurality of data items, determining a vector representation of each data item in the target database as follows to obtain the target vector library: extracting keywords and key phrases of the data item; converting the keywords and key phrases into vector representations to obtain a plurality of key vectors; and constructing the vector representation of the data item based on the plurality of key vectors.
In an exemplary embodiment, constructing the vector representation of the data based on the plurality of key vectors includes: determining the M semantic dimensions according to the data type and data source of the data in the target database; dividing the plurality of key vectors of the data into M groups of sub-vectors according to the M semantic dimensions; and determining the M groups of sub-vectors as the vector representation of the data.
In an exemplary embodiment, searching the target vector library for vectors whose matching degree with the query vector is greater than the preset threshold to obtain the first vector set includes: searching the target vector library for vectors that match the N semantic dimensions to obtain an initial vector set; and determining, as the first vector set, the vectors in the initial vector set whose matching degree with the query vector is greater than the preset threshold.
In an exemplary embodiment, performing a sorting operation on the vectors included in the first vector set to obtain a target vector set includes: sorting the vectors included in the first vector set according to their matching degree with the query vector to obtain a second vector set, wherein the sorting operation includes selection sort or insertion sort; and performing a filtering operation on the vectors in the second vector set to obtain the target vector set, wherein the filtering condition of the filtering operation includes at least one of vector dimension, vector size, vector direction, vector attribute, and association between vectors.
In an exemplary embodiment, searching for target data matched with the query request from the target database based on the target vector set includes converting a plurality of target vectors included in the target vector set into data in a target format to obtain a plurality of first data, wherein the target format is the same as a format of data to be queried in the query request, searching for data matched with the plurality of first data from the target database to obtain a data list, and performing the sorting operation on the data list based on a degree of correlation between the data included in the data list to obtain target data.
In an exemplary embodiment, after converting the plurality of target vectors included in the set of target vectors into the data in the target format to obtain the plurality of first data, the method further includes generating structural data according to an association relationship between the plurality of first data, where the structural data includes keywords of the plurality of first data and associations between the keywords of the plurality of first data, and displaying the structural data through a target client.
According to another embodiment of the application, a data query device is provided, comprising: an acquisition module configured to acquire a query request, the query request being used to request data from a target database; a conversion module configured to convert the query request, in response to the query request, into a vector representation to obtain a query vector, wherein the query vector represents semantic information of the query request in N semantic dimensions, N being a natural number greater than or equal to 1; a first search module configured to search a target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set, wherein the target vector library includes vector representations, in M semantic dimensions, of the data in the target database, M being a natural number greater than or equal to 1; a sorting module configured to perform a sorting operation on the vectors included in the first vector set to obtain a target vector set; and a second search module configured to search the target database for target data matching the query request based on the target vector set.
According to a further embodiment of the application, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the application, there is also provided an electronic device comprising a memory having stored therein a computer program, and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the application, the acquired query request is converted into a vector representation, namely a query vector. Vectors matching the N semantic dimensions of the query vector are then searched from the target vector library, and those whose matching degree with the query vector is higher than the preset threshold are screened out to obtain a first vector set. The first vector set is sorted by matching degree to obtain a second vector set, and the vectors in the second vector set are filtered to determine the target vector set. Finally, the vectors in the target vector set are converted into data in a target format, which is the same as the format of the data requested by the query, to form a plurality of first data, and the content corresponding to the first data is searched in the target database to obtain the target data matching the query request. By introducing vector similarity calculation, the problem of low data query accuracy in the related art can be solved, and data query accuracy is improved.
Drawings
FIG. 1 is a schematic diagram of a hardware environment of a data query method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data query method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a process for constructing a target vector library according to an embodiment of the present application;
FIG. 4 is a flow chart of a data query method according to an embodiment of the application;
FIG. 5 is a block diagram of a data query device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed in a server device or a similar computing device. Taking execution on a server device as an example, FIG. 1 is a schematic diagram of a hardware environment of a data query method according to an embodiment of the present application. As shown in FIG. 1, the server device may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)) and a memory 104 for storing data. The server device may further include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those of ordinary skill in the art that the architecture shown in FIG. 1 is merely illustrative and does not limit the architecture of the server device described above. For example, the server device may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the data query method in an embodiment of the present application. The processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the server device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the server device. In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a data query method is provided. FIG. 2 is a flowchart of a data query method according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:
Step S202, acquiring a query request, wherein the query request is used for requesting to query data from a target database;
Optionally, the query request in this embodiment refers to a search command sent by the user to the system in natural language or in a specific query language. It expresses the specific information or question that the user wants answered from the target database, and may take forms including but not limited to keywords, phrases, questions, or complex query logic.
Step S204, responding to the query request, converting the query request into vector representation to obtain a query vector, wherein the query vector is used for representing semantic information of the query request from N semantic dimensions, and N is a natural number greater than or equal to 1;
Optionally, the conversion of the query request into a vector representation in the present embodiment involves natural language processing (Natural Language Processing, abbreviated as NLP) techniques, including word embedding, sentence vector, or semantic vector generation.
Optionally, the semantic dimension in the embodiment includes, but is not limited to, technical field, research direction and application scene.
Optionally, the query vector in this embodiment represents the semantic information contained in the query request, including but not limited to the meaning of keywords, the context, and the user intent.
Step S206, searching the target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set, wherein the target vector library comprises vector representations, in M semantic dimensions, of the data in the target database, and M is a natural number greater than or equal to 1;
Optionally, the matching degree in this embodiment is a similarity measure between the query vector and the vectors in the target vector library, and may be obtained by cosine similarity, euclidean distance, or other vector similarity calculation.
Optionally, the preset threshold in this embodiment is a set similarity criterion, and the matching is considered successful only if the similarity between the query vector and the vectors in the library is higher than this threshold.
Step S208, executing a sorting operation on vectors included in the first vector set to obtain a target vector set;
Optionally, the sorting operation in this embodiment is a process of ordering the vectors in the first vector set by their matching degree with the query vector, with higher-matching vectors ranked first; the sorting algorithm includes but is not limited to selection sort and insertion sort.
Step S210, searching target data matched with the query request from the target database based on the target vector set.
Optionally, the target data in this embodiment refers to a document or an information item that is found from the target database according to the target vector set and that matches the query request, and involves reversely converting the vector representation into text content, or directly using metadata of the vector to locate and retrieve specific data in the target database.
Through the steps, the obtained query request is converted into vector representation to obtain a query vector, then a vector with the matching degree larger than a threshold value is searched from a target vector library to obtain a first vector set, the first vector set is subjected to sorting operation to obtain a target vector set, and finally target data is obtained based on the target vector set. The semantic information of the query request is captured through vector representation for retrieval, so that the problem of low data query accuracy in the related technology is solved, and the effect of improving the data query accuracy is achieved.
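Steps S202 to S210 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the bag-of-words `embed` function stands in for a real NLP embedding model, cosine similarity is used as the matching degree, and the names `embed`, `cosine`, `query`, `DATABASE` and the threshold value 0.5 are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): sparse word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def query(request, database, threshold=0.5):
    query_vec = embed(request)  # S204: query request -> query vector
    # Hypothetical target vector library: one vector per database record.
    library = {doc_id: embed(text) for doc_id, text in database.items()}
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in library.items()]
    first_set = [(d, s) for d, s in scored if s > threshold]  # S206: threshold
    target_set = sorted(first_set, key=lambda p: p[1], reverse=True)  # S208: sort
    return [(d, database[d], s) for d, s in target_set]  # S210: fetch target data


DATABASE = {
    "doc1": "deep learning for natural language processing",
    "doc2": "bank branch opening hours",
    "doc3": "natural language processing with deep neural networks",
}
results = query("deep learning natural language processing", DATABASE)
for doc_id, text, score in results:
    print(doc_id, round(score, 3), text)
```

In this toy data set, the unrelated record shares no terms with the request and falls below the threshold, while the two related records are returned in descending order of matching degree.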
In an exemplary embodiment, before searching the target vector library for vectors whose matching degree with the query vector is greater than the preset threshold to obtain the first vector set, the method further comprises: when the target database includes a plurality of data items, determining a vector representation of each data item in the target database as follows to obtain the target vector library: extracting keywords and key phrases of the data item; converting the keywords and key phrases into vector representations to obtain a plurality of key vectors; and constructing the vector representation of the data item based on the plurality of key vectors.
Optionally, the target database in this embodiment is used to store data including, but not limited to, academic papers, patent documents, technical reports, news articles, social media posts, and the like.
Optionally, the keywords and key phrases in this embodiment are identified and extracted from the text content of each data item by natural language processing techniques. They are words or phrases of significant importance, including but not limited to those that reflect the core topics, concepts or entities of the data. For example, for a research paper on machine learning, keywords may include "algorithm", "dataset", "model", "deep learning", etc., while key phrases may include "neural network optimization", "overfitting problem", etc.
Optionally, in this embodiment, the keywords and key phrases may be converted into vector representations by encoding them with a pre-trained deep learning model (e.g., BERT, RoBERTa, etc.), which converts them into low-dimensional dense vectors.
Optionally, the target vector library in this embodiment may be stored in, but is not limited to, a tree structure, a graph structure, or a table.
Through the steps, for each data, the keywords and key phrases thereof are extracted, the keywords and key phrases are converted into key vectors, and then vector representations of the data are constructed based on the key vectors. The construction process of the key vector ensures that the vector representation can reflect key information of data, and a target vector library is constructed based on the key vector representation, so that a solid foundation is provided for subsequent high-precision knowledge search, and the accuracy of subsequent retrieval is improved.
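The construction of key vectors described above can be sketched as follows. The sketch assumes a frequency-based keyword extractor and a letter-frequency encoder purely for illustration; in practice a pre-trained model such as BERT would supply the vectors. `STOPWORDS`, `extract_keys`, `to_key_vector` and `build_library` are hypothetical names.

```python
from collections import Counter

# Hypothetical stopword list used only by this sketch.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "with"}


def extract_keys(text, top_k=3):
    """Return (keywords, key_phrases) using simple frequency counts."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    content = [t for t in tokens if t not in STOPWORDS]
    keywords = [w for w, _ in Counter(content).most_common(top_k)]
    # Key phrases: adjacent token pairs in which neither token is a stopword.
    pairs = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    phrases = [p for p, _ in Counter(pairs).most_common(top_k)]
    return keywords, phrases


def to_key_vector(term):
    """Toy encoder (assumption): 26-dimensional letter-frequency vector."""
    vec = [0.0] * 26
    for ch in term:
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def build_library(database):
    """Map each record to its key terms and the key vector of each term."""
    library = {}
    for doc_id, text in database.items():
        keywords, phrases = extract_keys(text)
        library[doc_id] = {term: to_key_vector(term) for term in keywords + phrases}
    return library


DATABASE = {
    "paper1": "Neural network optimization reduces overfitting. "
    "Neural network models overfit small datasets."
}
library = build_library(DATABASE)
print(sorted(library["paper1"]))
```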
In an exemplary embodiment, constructing the vector representation of the data based on the plurality of key vectors includes: determining the M semantic dimensions according to the data type and data source of the data in the target database; dividing the plurality of key vectors of the data into M groups of sub-vectors according to the M semantic dimensions; and determining the M groups of sub-vectors as the vector representation of the data.
Alternatively, the data type in this embodiment represents a specific form or category of data in the target database, including but not limited to academic papers, patent documents, news articles, book chapters, and the like.
Optionally, the data sources in this embodiment are the different publishers, institutions, or platforms from which the data entries come, including but not limited to academic papers from specific universities, patent literature from authoritative patent offices, stories from well-known news media, and the like.
Optionally, the M semantic dimensions in this embodiment may be identified by, but are not limited to, cluster analysis, topic models, and the like.
For example, FIG. 3 is a schematic diagram of a process for constructing a target vector library. As shown in FIG. 3, the process includes the following steps:
Step S302, extracting keywords and key phrases of the knowledge information in a target database, wherein the target database comprises a plurality of pieces of knowledge information, such as papers and articles, and the keywords and key phrases can be extracted by natural language processing, keyword extraction and other techniques;
Step S304, converting the keywords and key phrases into item vectors (corresponding to the key vectors) through a preset language model, wherein the preset language model can be BERT or RoBERTa;
Step S306, determining semantic dimensions from the data types and data sources of the data in the target database, wherein the semantic dimensions can be determined using cluster analysis or a topic model;
Step S308, dividing the item vectors into M groups of sub-vectors according to the semantic dimensions;
Step S310, storing the sub-vectors in index nodes to obtain a target vector library stored in the form of a tree structure.
Through the steps, M semantic dimensions are determined, key vectors are divided into sub-vectors according to the dimensions, and then the sub-vectors are determined to be vector representations of data. Through dimension division, multiple semantic features of the data can be identified and processed, and subsequent data query is facilitated.
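The division into sub-vector groups and the tree-structured storage can be sketched as follows. The assumption here is that each semantic dimension is represented by a centroid (for example, one produced by cluster analysis) and that each key vector is assigned to the nearest centroid by cosine similarity; `CENTROIDS`, `split_by_dimension` and `build_tree` are hypothetical names, and the nested dictionary is only one possible tree layout.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_by_dimension(key_vectors, centroids):
    """Divide key vectors into M groups, one per semantic dimension."""
    groups = {name: [] for name in centroids}
    for vec in key_vectors:
        best = max(centroids, key=lambda name: cosine(vec, centroids[name]))
        groups[best].append(vec)
    return groups


def build_tree(doc_vectors, centroids):
    """Tree layout (assumption): root -> semantic dimension -> record -> sub-vectors."""
    tree = {name: {} for name in centroids}
    for doc_id, key_vectors in doc_vectors.items():
        for name, subvectors in split_by_dimension(key_vectors, centroids).items():
            tree[name][doc_id] = subvectors
    return tree


# Hypothetical centroids for M = 3 semantic dimensions.
CENTROIDS = {
    "technical_field": [1.0, 0.0, 0.0],
    "research_method": [0.0, 1.0, 0.0],
    "application_scenario": [0.0, 0.0, 1.0],
}
DOC_VECTORS = {
    "paper1": [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9], [0.7, 0.2, 0.1]]
}
tree = build_tree(DOC_VECTORS, CENTROIDS)
print({name: len(docs["paper1"]) for name, docs in tree.items()})
```

Grouping by dimension first means a later query only needs to visit the branches for the dimensions it actually uses.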
In an exemplary embodiment, searching the target vector library for vectors whose matching degree with the query vector is greater than the preset threshold to obtain the first vector set includes: searching the target vector library for vectors that match the N semantic dimensions to obtain an initial vector set; and determining, as the first vector set, the vectors in the initial vector set whose matching degree with the query vector is greater than the preset threshold.
Optionally, the matching degree in this embodiment is a similarity measure between the query vector and the vectors in the target vector library, and may be obtained by cosine similarity, euclidean distance, or other vector similarity calculation.
Optionally, the preset threshold in this embodiment is a set similarity criterion, and the matching is considered successful only if the similarity between the query vector and the vectors in the library is higher than this threshold.
Through the steps, the vectors matched with the N semantic dimensions are searched to obtain an initial vector set, and then the vectors with the matching degree larger than the threshold value are further screened to be the first vector set, so that irrelevant results can be effectively filtered, and the retrieval efficiency and accuracy are improved.
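The two-stage search can be sketched as follows. The sketch assumes that "matching the N semantic dimensions" means a library entry has a sub-vector for every dimension present in the query, and that the overall matching degree is the mean per-dimension cosine similarity; both are illustrative choices, and `two_stage_search`, `QUERY` and `LIBRARY` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(query_dims, library, threshold):
    # Stage 1: keep entries that cover every semantic dimension of the query.
    initial = {
        doc_id: dims
        for doc_id, dims in library.items()
        if all(name in dims for name in query_dims)
    }
    # Stage 2: keep entries whose mean per-dimension similarity exceeds the threshold.
    first_set = {}
    for doc_id, dims in initial.items():
        score = sum(cosine(query_dims[n], dims[n]) for n in query_dims) / len(query_dims)
        if score > threshold:
            first_set[doc_id] = score
    return initial, first_set


QUERY = {"field": [1.0, 0.0], "method": [0.0, 1.0]}  # N = 2
LIBRARY = {
    "paper1": {"field": [0.9, 0.1], "method": [0.2, 0.8]},
    "paper2": {"field": [0.1, 0.9], "method": [0.9, 0.1]},
    "paper3": {"field": [1.0, 0.0]},  # no "method" sub-vector
}
initial, first_set = two_stage_search(QUERY, LIBRARY, threshold=0.8)
print(sorted(initial), sorted(first_set))
```

The first stage is a cheap structural check, so the similarity computation in the second stage runs only on entries that can be compared on every query dimension.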
In an exemplary embodiment, performing a sorting operation on the vectors included in the first vector set to obtain a target vector set includes: sorting the vectors included in the first vector set according to their matching degree with the query vector to obtain a second vector set, wherein the sorting operation includes selection sort or insertion sort; and performing a filtering operation on the vectors in the second vector set to obtain the target vector set, wherein the filtering condition of the filtering operation includes at least one of vector dimension, vector size, vector direction, vector attribute, and association between vectors.
Optionally, the sorting operation in this embodiment refers to a process of rearranging the vectors in the first vector set according to their matching degree with the query vector. Different algorithms may be used for sorting, including but not limited to selection sort, insertion sort, quicksort, and merge sort.
Optionally, the filtering operation in this embodiment is to further screen a vector set that better meets the user requirement according to a certain filtering condition on the basis of the second vector set.
Optionally, the filtering conditions in this embodiment may be set according to specific application scenarios and user requirements, including but not limited to vector dimensions, vector sizes, vector directions, vector attributes, associations between vectors, and the like.
Optionally, the vector attributes in this embodiment include, but are not limited to, source, type, time, etc. For example, data vectors from unreliable sources or that do not match the user's desired time frame are filtered out.
Through the steps, the sorting operation is executed on the first vector set based on the matching degree, then the filtering operation is executed to obtain the target vector set, the sorting and filtering operation further optimizes the retrieval result, and the fact that the finally output data set has higher matching degree with the query request is ensured.
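The sort-then-filter stage can be sketched as follows, using insertion sort (one of the algorithms named above) in descending order of matching degree, followed by a filter on vector attributes. The candidate records, the `source` and `year` attributes, and the function names are hypothetical.

```python
def insertion_sort_desc(items, key):
    """Insertion sort into descending order of key(item); returns a new list."""
    ordered = list(items)
    for i in range(1, len(ordered)):
        current = ordered[i]
        j = i - 1
        # Shift smaller-keyed elements one slot right to make room for current.
        while j >= 0 and key(ordered[j]) < key(current):
            ordered[j + 1] = ordered[j]
            j -= 1
        ordered[j + 1] = current
    return ordered


def filter_vectors(candidates, allowed_sources, year_range):
    """Keep candidates whose attributes satisfy the filtering conditions."""
    first_year, last_year = year_range
    return [
        c
        for c in candidates
        if c["source"] in allowed_sources and first_year <= c["year"] <= last_year
    ]


# Hypothetical first vector set with precomputed matching degrees.
FIRST_SET = [
    {"id": "v1", "score": 0.72, "source": "journal", "year": 2015},
    {"id": "v2", "score": 0.91, "source": "blog", "year": 2018},
    {"id": "v3", "score": 0.85, "source": "journal", "year": 2008},
    {"id": "v4", "score": 0.88, "source": "journal", "year": 2019},
]
second_set = insertion_sort_desc(FIRST_SET, key=lambda c: c["score"])
target_set = filter_vectors(second_set, {"journal"}, (2010, 2020))
print([c["id"] for c in second_set], [c["id"] for c in target_set])
```

Because filtering preserves order, the target vector set stays sorted by matching degree without a second pass.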
In an exemplary embodiment, searching for target data matched with the query request from the target database based on the target vector set includes converting a plurality of target vectors included in the target vector set into data in a target format to obtain a plurality of first data, wherein the target format is the same as a format of data to be queried in the query request, searching for data matched with the plurality of first data from the target database to obtain a data list, and performing the sorting operation on the data list based on a degree of correlation between the data included in the data list to obtain target data.
Optionally, the target format in this embodiment refers to the original format of the data to be queried in the query request, including but not limited to text, HTML, PDF, word documents, etc.
Optionally, in this embodiment, converting the vectors in the target vector set into data in the target format refers to preliminarily recovering or reconstructing the original data information or content corresponding to the vectors.
Optionally, in this embodiment, searching the target database for data matching the plurality of first data means finding, in the database, the data whose original-format content corresponds to the first data.
Optionally, the association degree in this embodiment is calculated by combining keyword matching between the vectors with their semantic association.
Through the above steps, the target vector set is converted into data in the target format, and data matching the target-format data is then searched from the database to obtain the target data. In this way, the original data corresponding to the target vector set can be recovered by working backward from the vectors, and the target data corresponding to the query request can be obtained.
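The conversion and lookup described above can be sketched as follows. The sketch assumes that each target vector carries its key terms as metadata, so that "converting to the target format" reduces to rebuilding a text string, and that the association degree of a record is its mean Jaccard keyword overlap with the other records in the data list. These are illustrative choices; all names are hypothetical.

```python
def to_first_data(target_vectors):
    """Recover text for each target vector from its stored metadata (assumption)."""
    return [" ".join(v["keywords"]) for v in target_vectors]


def find_records(first_data, database):
    """Return ids of records that contain every term of at least one first-data item."""
    hits = []
    for doc_id, text in database.items():
        lowered = text.lower()
        if any(all(term in lowered for term in item.split()) for item in first_data):
            hits.append(doc_id)
    return hits


def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_by_association(doc_ids, database):
    """Order records by mean token overlap with the other records in the list."""
    tokens = {d: database[d].lower().split() for d in doc_ids}

    def score(d):
        others = [o for o in doc_ids if o != d]
        if not others:
            return 0.0
        return sum(jaccard(tokens[d], tokens[o]) for o in others) / len(others)

    return sorted(doc_ids, key=score, reverse=True)


TARGET_VECTORS = [{"keywords": ["deep", "learning"]}, {"keywords": ["language", "model"]}]
DATABASE = {
    "doc1": "Deep learning language model survey",
    "doc2": "Deep learning for vision",
    "doc3": "Language model evaluation",
    "doc4": "Bank branch opening hours",
}
first_data = to_first_data(TARGET_VECTORS)
data_list = find_records(first_data, DATABASE)
target_data = rank_by_association(data_list, DATABASE)
print(first_data, data_list, target_data)
```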
In an exemplary embodiment, after converting the plurality of target vectors included in the set of target vectors into the data in the target format to obtain the plurality of first data, the method further includes generating structural data according to an association relationship between the plurality of first data, where the structural data includes keywords of the plurality of first data and associations between the keywords of the plurality of first data, and displaying the structural data through a target client.
Optionally, the association relationship in this embodiment refers to a relationship between the plurality of first data in terms of semantics, theme, time, geographic location, and the like. For example, when a plurality of first data all relate to a topic of "artificial intelligence," there is a topic association between them, and when they refer to the same research effort or data source, there is a reference relationship association.
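One way to picture the structural data is as a keyword graph. The sketch below assumes that two keywords are associated when they occur in the same first-data item and that the structure is serialized as JSON for the target client; both are assumptions for illustration, and `build_structure` and `FIRST_DATA` are hypothetical names.

```python
import json
from collections import Counter
from itertools import combinations


def build_structure(first_data):
    """Build a keyword graph from a mapping of item id -> keyword list."""
    nodes = sorted({kw for kws in first_data.values() for kw in kws})
    edge_counts = Counter()
    for kws in first_data.values():
        # Count each unordered keyword pair once per item.
        for a, b in combinations(sorted(set(kws)), 2):
            edge_counts[(a, b)] += 1
    edges = [
        {"source": a, "target": b, "weight": w}
        for (a, b), w in sorted(edge_counts.items())
    ]
    return {"nodes": nodes, "edges": edges}


FIRST_DATA = {
    "paper1": ["artificial intelligence", "deep learning"],
    "paper2": ["artificial intelligence", "knowledge graph"],
    "paper3": ["deep learning", "artificial intelligence"],
}
structure = build_structure(FIRST_DATA)
payload = json.dumps(structure)  # what a client might receive for display
print(payload)
```

Edge weights count how many items share a keyword pair, so a client could draw heavier links between topics that co-occur more often.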
By way of a specific example, the above method is described in the context of high-precision knowledge search of academic paper databases, FIG. 4 is a schematic flow chart of a data query method, as shown in FIG. 4, the flow chart comprising the steps of:
Step S402, converting the query request input by the user into a vector representation through natural language processing to obtain a query vector. For example, if the query request input by the user is "applications of deep learning in natural language processing, 2010-2020", the query request is converted into a query vector that reflects the semantic information of the query request in N semantic dimensions, where N may be 5 (for example, technical field, research method, application scenario, time range, and author information);
Step S404, using a preset model to search a target vector library for item vectors whose matching degree with the query vector is greater than a preset threshold, to obtain the first vector set, wherein the preset model may be a model based on an approximate nearest neighbor (ANN) search algorithm or a BERT model; if the BERT model is used, the query vector may be a 768-dimensional or 1024-dimensional dense vector;
Step S406, calculating the matching degree between each vector in the first vector set and the query vector, for example by cosine similarity or Euclidean distance, and performing a sorting operation on the vectors included in the first vector set according to the matching degree to obtain a second vector set;
Step S408, performing a filtering operation on the second vector set to obtain a target vector set, and ranking the vectors in the target vector set again based on the matching degree. For example, vectors whose release time is not within 2010-2020 are filtered out to obtain the target vector set, and the remaining vectors are then ranked again according to their matching degree with the query vector;
Step S410, converting vectors in the target vector set into text data corresponding to the vectors, and searching data corresponding to the text data from a target database to obtain a data list;
Step S412, sorting the data list again according to the degree to which the data in the data list matches the keywords "deep learning" and "natural language processing", to obtain and display the target data.
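Steps S404 to S408 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of the embodiment: it uses an exhaustive scan with cosine similarity in place of an ANN index or a BERT model, and the item fields `id`, `vector` and `year`, the function names, the three-dimensional toy vectors, and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query(query_vec, library, threshold, year_range):
    # S404: keep item vectors whose matching degree exceeds the preset threshold
    # (brute-force scan here; an ANN index would replace this loop at scale).
    scored = [(cosine_similarity(query_vec, item["vector"]), item) for item in library]
    first_set = [(s, item) for s, item in scored if s > threshold]
    # S406: sort by matching degree, highest first, to get the second vector set.
    second_set = sorted(first_set, key=lambda pair: pair[0], reverse=True)
    # S408: filter by release year; filtering preserves order, so the
    # target vector set remains ranked by matching degree.
    low, high = year_range
    target_set = [(s, item) for s, item in second_set if low <= item["year"] <= high]
    return [item["id"] for _, item in target_set]


library = [
    {"id": "A", "vector": [1.0, 0.0, 0.0], "year": 2015},
    {"id": "B", "vector": [0.9, 0.1, 0.0], "year": 2008},
    {"id": "C", "vector": [0.6, 0.8, 0.0], "year": 2018},
    {"id": "D", "vector": [0.0, 0.0, 1.0], "year": 2012},
]
result = query([1.0, 0.0, 0.0], library, threshold=0.5, year_range=(2010, 2020))
```

In this toy data, item D falls below the threshold and item B is removed by the 2010-2020 filter, so `result` lists A before C. If Euclidean distance were used instead, smaller values would indicate a better match, so the threshold comparison and the sort direction would both be reversed.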
It should be noted that, from the description of the above embodiments, those skilled in the art will clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general hardware platform, and of course may also be implemented by hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
This embodiment also provides a data query device, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of a data query device according to an embodiment of the present application, as shown in fig. 5, the device includes:
An obtaining module 502, configured to obtain a query request, where the query request is used to request to query data from a target database;
A conversion module 504, configured to respond to the query request, convert the query request into a vector representation, and obtain a query vector, where the query vector is used to represent semantic information of the query request from N semantic dimensions, where N is a natural number greater than or equal to 1;
A first searching module 506, configured to search a target vector library for a vector having a matching degree with the query vector greater than a preset threshold value, to obtain a first vector set, where the target vector library includes vector representations of data in the target database in M semantic dimensions, where M is a natural number greater than or equal to 1;
A sorting module 508, configured to perform a sorting operation on vectors included in the first vector set to obtain a target vector set;
A second searching module 510, configured to search, based on the set of target vectors, target data matching the query request from the target database.
In an exemplary embodiment, the first lookup module 506 is further configured to determine, for each of the plurality of data included in the target database, a vector representation of the data by means of: a first extraction unit configured to extract keywords and key phrases of the data; a first conversion unit configured to convert each of the keywords and key phrases into a vector representation to obtain a plurality of key vectors; and a first construction unit configured to construct the vector representation of the data based on the plurality of key vectors.
In an exemplary embodiment, the first lookup module 506 further includes a first determining unit configured to determine M semantic dimensions from a data type and a data source of the data in the target database, a first dividing unit configured to divide the plurality of key vectors of the data into M sets of sub-vectors according to the M semantic dimensions, and a second determining unit configured to determine the M sets of sub-vectors as vector representations of the data.
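The division of key vectors into M groups of sub-vectors can be sketched as follows. This is an illustrative sketch only: it assumes that each key vector has already been labelled with the semantic dimension it belongs to, which the embodiment does not specify, and the function name `build_representation` and the sample dimension names are hypothetical.

```python
def build_representation(key_vectors, dimensions):
    """Group labelled key vectors into one sub-vector group per semantic dimension.

    `key_vectors` is a list of (dimension_name, vector) pairs; the labelling
    step itself is assumed to have happened upstream. `dimensions` lists the
    M semantic dimensions determined for the target database.
    """
    groups = {dim: [] for dim in dimensions}
    for dim, vec in key_vectors:
        if dim in groups:  # ignore key vectors outside the M dimensions
            groups[dim].append(vec)
    return groups


key_vectors = [
    ("technical field", [0.1, 0.9]),
    ("research method", [0.7, 0.3]),
    ("technical field", [0.2, 0.8]),
]
representation = build_representation(
    key_vectors, ["technical field", "research method"]
)
```

Here M is 2, and the M groups together serve as the vector representation of one piece of data in the target vector library.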
In an exemplary embodiment, the first searching module 506 further includes a first searching unit configured to search vectors matching N semantic dimensions from the target vector library to obtain an initial vector set, and a third determining unit configured to determine a vector set, which has a matching degree with the query vector greater than a preset threshold, in the initial vector set as the first vector set.
In an exemplary embodiment, the first lookup module 506 further includes a first sorting unit configured to perform a sorting operation on the vectors included in the first vector set according to a matching degree between the vectors included in the first vector set and the query vector to obtain a second vector set, where the sorting operation includes a selection sorting operation or an interpolation sorting operation, and a first filtering unit configured to perform a filtering operation on the vectors in the second vector set to obtain the target vector set, where a filtering condition of the filtering operation includes at least one of a vector dimension, a vector size, a vector direction, a vector attribute, and an association between vectors.
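The selection sorting operation and the subsequent filtering operation can be sketched as follows. This is a minimal sketch under stated assumptions: each element is a hypothetical `(matching_degree, vector)` pair, and the filter uses only two of the listed conditions, vector dimension and vector size (taken here to mean the Euclidean norm); the function names and sample values are illustrative.

```python
import math


def selection_sort_by_score(items):
    """Return a copy of `items` sorted by descending matching degree (selection sort)."""
    result = list(items)
    n = len(result)
    for i in range(n - 1):
        best = i
        # Find the highest remaining matching degree and move it to position i.
        for j in range(i + 1, n):
            if result[j][0] > result[best][0]:
                best = j
        if best != i:
            result[i], result[best] = result[best], result[i]
    return result


def filter_by_dimension_and_size(items, dimension, min_norm):
    """Keep vectors with the expected number of components and a norm of at least `min_norm`."""
    kept = []
    for score, vec in items:
        norm = math.sqrt(sum(x * x for x in vec))
        if len(vec) == dimension and norm >= min_norm:
            kept.append((score, vec))
    return kept


first_set = [
    (0.62, [0.6, 0.8]),
    (0.97, [1.0, 0.2]),
    (0.75, [0.1, 0.1]),
    (0.88, [0.9, 0.4, 0.1]),
]
second_set = selection_sort_by_score(first_set)
target_set = filter_by_dimension_and_size(second_set, dimension=2, min_norm=0.5)
```

Selection sort is quadratic in the number of vectors, so it is practical only when the first vector set is small; the filter preserves the sorted order of the vectors it keeps.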
In an exemplary embodiment, the second lookup module 510 further includes a second conversion unit configured to convert a plurality of target vectors included in the set of target vectors into data in a target format to obtain a plurality of first data, where the target format is the same as a format of data to be queried in the query request, a second lookup unit configured to lookup data matching the plurality of first data from the target database to obtain a data list, and perform the sorting operation on the data list based on a degree of association between the data included in the data list to obtain target data.
In an exemplary embodiment, the second search module 510 further includes a first generating unit configured to generate structural data according to an association relationship between a plurality of the first data, where the structural data includes a plurality of keywords of the first data and an association between a plurality of keywords of the first data, and a first displaying unit configured to display the structural data through a target client.
It should be noted that each of the above modules may be implemented by software or hardware, and the latter may be implemented by, but not limited to, the above modules all being located in the same processor, or each of the above modules being located in different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In an exemplary embodiment, the computer readable storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
Embodiments of the application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
Embodiments of the present application also provide another computer program product comprising a non-volatile computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
Embodiments of the present application also provide a computer program comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of any of the method embodiments described above.
For specific examples of this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations, which are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. They may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, the modules or steps may each be fabricated as an individual integrated circuit module, or several of them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description covers only the preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art can make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A data query method, characterized in that the method comprises:
obtaining a query request, wherein the query request is used to request to query data from a target database;
in response to the query request, converting the query request into a vector representation to obtain a query vector, wherein the query vector is used to represent semantic information of the query request from N semantic dimensions, where N is a natural number greater than or equal to 1;
searching a target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set, wherein the target vector library includes vector representations of the data in the target database in M semantic dimensions, where M is a natural number greater than or equal to 1;
performing a sorting operation on the vectors included in the first vector set to obtain a target vector set;
searching the target database for target data matching the query request based on the target vector set.

2. The method according to claim 1, characterized in that before searching the target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain the first vector set, the method further comprises:
in the case where the target database includes a plurality of the data, for each of the data, determining the vector representation of each of the data by the following steps to obtain the target vector library:
extracting keywords and key phrases from the data;
converting the keywords and the key phrases into vector representations to obtain a plurality of key vectors;
constructing a vector representation of the data based on the plurality of key vectors.

3. The method according to claim 2, characterized in that constructing a vector representation of the data based on the plurality of key vectors comprises:
determining M semantic dimensions from the data types and data sources of the data in the target database;
dividing the plurality of key vectors of the data into M groups of sub-vectors according to the M semantic dimensions;
determining the M groups of sub-vectors as the vector representation of the data.

4. The method according to claim 1, characterized in that searching the target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain the first vector set comprises:
searching the target vector library for vectors matching the N semantic dimensions to obtain an initial vector set;
determining, as the first vector set, a vector set in the initial vector set whose matching degree with the query vector is greater than a preset threshold.

5. The method according to claim 4, characterized in that performing a sorting operation on the vectors included in the first vector set to obtain a target vector set comprises:
performing a sorting operation on the vectors included in the first vector set according to the matching degree between the vectors included in the first vector set and the query vector to obtain a second vector set, wherein the sorting operation includes a selection sorting operation or an interpolation sorting operation;
performing a filtering operation on the vectors in the second vector set to obtain the target vector set, wherein the filtering condition of the filtering operation includes at least one of the following: vector dimension, vector size, vector direction, vector attribute, and association between vectors.

6. The method according to claim 1, characterized in that searching the target database for target data matching the query request based on the target vector set comprises:
converting a plurality of target vectors included in the target vector set into data in a target format to obtain a plurality of first data, wherein the target format is the same as the format of the data to be queried in the query request;
searching the target database for data matching the plurality of first data to obtain a data list;
performing the sorting operation on the data list based on the association degree between the data included in the data list to obtain target data.

7. The method according to claim 6, characterized in that after converting the plurality of target vectors included in the target vector set into data in a target format to obtain the plurality of first data, the method further comprises:
generating structural data according to associations between the plurality of first data, wherein the structural data includes keywords of the plurality of first data and associations between the keywords of the plurality of first data;
displaying the structural data through a target client.

8. A data query device, characterized by comprising:
an acquisition module, configured to acquire a query request, wherein the query request is used to request to query data from a target database;
a conversion module, configured to respond to the query request, convert the query request into a vector representation, and obtain a query vector, wherein the query vector is used to represent the semantic information of the query request from N semantic dimensions, where N is a natural number greater than or equal to 1;
a first search module, configured to search a target vector library for vectors whose matching degree with the query vector is greater than a preset threshold to obtain a first vector set, wherein the target vector library includes vector representations of the data in the target database in M semantic dimensions, where M is a natural number greater than or equal to 1;
a sorting module, configured to perform a sorting operation on the vectors included in the first vector set to obtain a target vector set;
a second search module, configured to search the target database for target data matching the query request based on the target vector set.

9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, wherein the computer program implements the steps of the method described in any one of claims 1 to 7 when executed by a processor.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method described in any one of claims 1 to 7 when executing the computer program.
CN202411763921.3A 2024-12-03 2024-12-03 Data query method and device, storage medium and electronic equipment Pending CN119829730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411763921.3A CN119829730A (en) 2024-12-03 2024-12-03 Data query method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411763921.3A CN119829730A (en) 2024-12-03 2024-12-03 Data query method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN119829730A true CN119829730A (en) 2025-04-15

Family

ID=95307114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411763921.3A Pending CN119829730A (en) 2024-12-03 2024-12-03 Data query method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN119829730A (en)

Similar Documents

Publication Publication Date Title
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
US20220261427A1 (en) Methods and system for semantic search in large databases
US10289717B2 (en) Semantic search apparatus and method using mobile terminal
US10482146B2 (en) Systems and methods for automatic customization of content filtering
CN109829104A (en) Pseudo-linear filter model information search method and system based on semantic similarity
CN111190997A (en) A Question Answering System Implementation Method Using Neural Networks and Machine Learning Sorting Algorithms
CN113806588B (en) Method and device for searching videos
CN113515589B (en) Data recommendation method, device, equipment and medium
US20250086215A1 (en) Large language model-based information retrieval for large datasets
CN113505196A (en) Part-of-speech-based text retrieval method and device, electronic equipment and storage medium
CN110727769A (en) Corpus generation method and device, and man-machine interaction processing method and device
CN111859079B (en) Information search method, device, computer equipment and storage medium
CN110674087A (en) File query method, device and computer-readable storage medium
CN118132791A (en) Image retrieval method, device, equipment, readable storage medium and product
CN111752922A (en) Method and device for establishing knowledge database and realizing knowledge query
CN112988952B (en) Multi-level-length text vector retrieval method and device and electronic equipment
CN117390169A (en) Form data question-answering method, device, equipment and storage medium
CN118245568A (en) Question and answer method and device based on large model, electronic equipment and storage medium
CN119357366B (en) Large model retrieval method, device, equipment and storage medium based on priori atlas
CN112347289B (en) Image management method and terminal
CN115293127A (en) Contract document information comparison method, device and system
CN115270777A (en) A method, device and system for extracting contract document information
CN114385777A (en) Text data processing method and device, computer equipment and storage medium
CN118503381A (en) Method and system for searching and generating combined strong language dialogue
CN119829730A (en) Data query method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination