[go: up one dir, main page]

CN113076447A - Data retrieval method and device - Google Patents

Data retrieval method and device Download PDF

Info

Publication number
CN113076447A
CN113076447A CN202110291851.6A CN202110291851A CN113076447A CN 113076447 A CN113076447 A CN 113076447A CN 202110291851 A CN202110291851 A CN 202110291851A CN 113076447 A CN113076447 A CN 113076447A
Authority
CN
China
Prior art keywords
node
retrieval
nodes
feature vector
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110291851.6A
Other languages
Chinese (zh)
Other versions
CN113076447B (en
Inventor
高彬
施文祥
肖智鹏
李洁
王兴达
倪振云
占超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110291851.6A priority Critical patent/CN113076447B/en
Publication of CN113076447A publication Critical patent/CN113076447A/en
Application granted granted Critical
Publication of CN113076447B publication Critical patent/CN113076447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9038Presentation of query results
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data retrieval method and device, and relates to the technical field of computers. One embodiment of the method comprises: receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result. The embodiment reduces the calculated amount, reduces the occupied space of the memory, improves the retrieval accuracy, improves the retrieval efficiency and expands the retrieval application scene.

Description

Data retrieval method and device
Technical Field
The invention relates to the technical field of computers, in particular to a data retrieval method and device.
Background
Vectorized data retrieval can be divided into accurate retrieval and approximate retrieval, the essence of the accurate retrieval is linear search, and the linear search finds vector information which is most matched with a target vector by traversing all vectors in the whole vector space; the approximate retrieval is to convert the original search in the whole high-dimensional vector space into the search in a small-range space or a relatively low-dimensional space by means of clustering, dimension reduction or coding and the like.
The prior art has at least the following problems:
the existing vectorization data retrieval method has the technical problems of large calculated amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data retrieval method and apparatus, which can reduce the amount of computation, reduce the occupied space of a memory, improve the retrieval accuracy, improve the retrieval efficiency, and expand the retrieval application scenarios.
To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a data retrieval method, including:
receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data;
determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes;
and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.
Further, the step of constructing the topological graph index according to the included angle threshold, the second quantity threshold and the similarity among the plurality of feature vector nodes includes:
the method comprises the steps of obtaining a plurality of storage data, carrying out feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data, and determining the similarity among the plurality of feature vector nodes;
determining a candidate node corresponding to each eigenvector node according to the included angle threshold, the second quantity threshold and the similarity among the plurality of eigenvector nodes; determining a connection graph between each feature vector node and the corresponding candidate node; the included angle threshold value indicates a threshold value corresponding to an included angle formed by each feature vector node and a connecting line between any two feature vector nodes;
and constructing a topological graph index according to the connection graph corresponding to each feature vector node.
Further, determining a candidate node corresponding to each feature vector node according to the included angle threshold, the second quantity threshold and the similarity between the plurality of feature vector nodes, and further comprising:
determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes;
and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.
Further, determining similarity between a plurality of feature vector nodes further comprises:
and clustering the plurality of feature vector nodes, and determining the similarity among the plurality of feature vector nodes according to the clustering result.
Further, receiving node operation information, and determining a node operation type corresponding to the node operation information;
if the node operation type is an added node, determining a connection graph between the added node and the corresponding candidate node, and updating the topological graph index;
if the node operation type is a deleted node, updating the deleted node in the topological graph index into a pseudo node; wherein the dummy node is not placed in the set of target nodes;
and if the node operation type is a modified node, setting the node before modification as a deleted node, setting the modified node as a newly added node, and updating the topological graph index.
Further, querying the topological graph index from the navigation node, and determining the target node set according to the first quantity threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector further includes:
constructing a target node set;
taking the navigation node as a starting point, inquiring the topological graph index according to a greedy search algorithm, and placing the inquired inquiry node in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.
Further, the number of the navigation nodes is at least one, and at least one navigation node is communicated with all the feature vector nodes in the topological graph index.
According to a second aspect of the embodiments of the present invention, there is provided a data retrieval apparatus including:
the receiving module is used for receiving the retrieval data and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data;
the target node set determining module is used for determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to the first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes;
and the sequencing module is used for sequencing the similarity between each target node in the target node set and the retrieval characteristic vector and determining a retrieval result according to the sequencing result.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
one or more processors;
a storage device for storing one or more programs,
when executed by one or more processors, cause the one or more processors to implement any of the data retrieval methods described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements any one of the data retrieval methods described above.
One embodiment of the above invention has the following advantages or benefits: because the retrieval data is received, the retrieval data is subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram of a main flow of a data retrieval method according to a first embodiment of the present invention;
FIG. 2a is a schematic diagram of a main flow of a data retrieval method according to a second embodiment of the present invention;
FIG. 2b is a schematic diagram of a main flow of updating the topological graph index in the method of FIG. 2 a;
FIG. 2c is a schematic diagram of a data retrieval process performed in the method of FIG. 2 a;
FIG. 3 is a schematic diagram of the main modules of a data retrieval device provided in accordance with an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a data retrieval method according to a first embodiment of the present invention; as shown in fig. 1, the data retrieval method provided by the embodiment of the present invention mainly includes:
and step S101, receiving the search data, and performing characteristic processing on the search data to obtain a search characteristic vector corresponding to the search data.
Specifically, the search data is user input data, and the search device determines search result data that is the same as or similar to the input data according to the input data.
Any one of the existing methods can be adopted in the process of performing feature processing on the retrieval data to obtain the corresponding feature vector. Through the arrangement, the search result can be determined according to the similarity between the search characteristic vector corresponding to the search data and the characteristic vector corresponding to the stored data, and the search efficiency and the search accuracy are improved.
Step S102, navigation nodes are determined from the topological graph index, the topological graph index is inquired from the navigation nodes, and a target node set is determined according to a first quantity threshold value and the similarity between the inquired nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold value, a second quantity threshold value and the similarity among the plurality of characteristic vector nodes.
Specifically, according to the embodiment of the present invention, the similarity between any two feature vectors may be measured by using the existing methods such as cosine angle, inner product, hamming distance, euclidean distance, and the like.
Through the arrangement, the topological graph index is inquired according to the retrieval characteristic vector corresponding to the retrieval data, and then the target node is determined according to the number (the first number threshold) of the target nodes and the similarity, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the conventional vectorized data retrieval method are solved, and further the purposes of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the retrieval applicable scenes are achieved.
Specifically, according to the embodiment of the present invention, the step of constructing the topological graph index according to the included angle threshold, the second quantity threshold, and the similarity between the plurality of feature vector nodes includes:
the method comprises the steps of obtaining a plurality of storage data, carrying out feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data, and determining the similarity among the plurality of feature vector nodes;
determining a candidate node corresponding to each eigenvector node according to the included angle threshold, the second quantity threshold and the similarity among the plurality of eigenvector nodes; determining a connection graph between each feature vector node and the corresponding candidate node; the included angle threshold value indicates a threshold value corresponding to an included angle formed by each feature vector node and a connecting line between any two feature vector nodes;
and constructing a topological graph index according to the connection graph corresponding to each feature vector node.
By the arrangement, a topological graph index is constructed according to the stored data in the database; firstly, according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the candidate nodes indicate nodes with higher similarity to the feature vector nodes, and in the process of determining the candidate nodes of each feature vector node, the nodes with higher similarity corresponding to each feature vector node are screened according to the second quantity threshold and the included angle threshold, so that the number of the nodes in the subsequent topological graph index is reduced, and the subsequent retrieval efficiency is improved.
Further, according to the embodiment of the present invention, the determining a candidate node corresponding to each feature vector node according to the included angle threshold, the second quantity threshold, and the similarity between the plurality of feature vector nodes further includes:
determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes;
and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.
According to the embodiment of the invention, the candidate node is determined based on the multi-level adjacent nodes, and the selection idea of the multi-level adjacent nodes is as follows: firstly, determining a plurality of first adjacent nodes (namely nodes with higher similarity with the feature vector nodes) of a certain feature vector node; respectively determining second adjacent nodes corresponding to the plurality of first adjacent nodes; until a multi-level neighbor node is determined. Through the arrangement, the coverage range of the candidate nodes is ensured, and the retrieval efficiency is improved.
Exemplarily, according to the embodiment of the present invention, the determining the similarity between the plurality of feature vector nodes further includes:
and clustering the plurality of feature vector nodes, and determining the similarity among the plurality of feature vector nodes according to the clustering result.
According to a specific implementation manner of the embodiment of the present invention, a k-nearest neighbor method may be used to perform clustering processing on a plurality of feature vector nodes, and a KNN graph is determined, where the KNN graph indicates that the distance of each feature vector node may be used to represent the similarity between the plurality of feature vector nodes.
Preferably, according to an embodiment of the present invention, the method further includes:
receiving node operation information, and determining a node operation type corresponding to the node operation information;
if the node operation type is an added node, determining a connection graph between the added node and the corresponding candidate node, and updating the topological graph index;
if the node operation type is a deleted node, updating the deleted node in the topological graph index into a pseudo node; wherein the dummy node is not placed in the set of target nodes;
and if the node operation type is a modified node, setting the node before modification as a deleted node, setting the modified node as a newly added node, and updating the topological graph index.
Through the setting, scenes of adding, modifying and deleting the stored data are added; the original topological index map is only required to be updated, the index map does not need to be reconstructed, a large amount of expenses caused by reconstruction of the index due to change of stored data are reduced, and the efficiency of obtaining the updated topological map index is higher.
Further, according to an embodiment of the present invention, the querying the topological graph index from the navigation node, and determining the target node set according to the first number threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector further includes:
constructing a target node set;
taking the navigation node as a starting point, inquiring the topological graph index according to a greedy search algorithm, and placing the inquired inquiry node in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.
Specifically, the number of target nodes that can be placed in the set of target nodes is defined by a first number threshold. The replacement of existing target nodes in the target node set (that is, when target nodes placed in the target node set reach a first number threshold and then target nodes are newly added later, the original target nodes in the corresponding number need to be deleted), and the judgment needs to be performed according to the sum of the similarity between each target node in the target node set and the retrieval feature vector. Through the arrangement, the calculation amount is reduced, the retrieval accuracy is improved, and the retrieval efficiency is improved.
Specifically, according to the embodiment of the present invention, the number of the navigation nodes is at least one, and at least one navigation node connects all the feature vector nodes in the topological graph index.
According to a specific implementation manner of the embodiment of the invention, the navigation nodes can be randomly selected from the topological graph index, and can also be fixedly selected, the number of the navigation nodes can be one or more, and the condition that at least one selected navigation node can be communicated with all the feature vector nodes in the topological graph index is only required to be met, so that the navigation nodes can be ensured to inquire all the feature vector nodes in the topological graph index, and the retrieval efficiency is improved.
And S103, sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.
Specifically, after the similarity between each target node in the target node set and the retrieval feature vector is sequenced, the stored data corresponding to one or more target nodes with higher similarity is determined as the retrieval result according to the retrieval number threshold.
According to the technical scheme of the embodiment of the invention, the retrieval data is received and subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.
FIG. 2a is a schematic diagram of a main flow of a data retrieval method according to a second embodiment of the present invention; as shown in fig. 2a, the data retrieval method provided by the embodiment of the present invention mainly includes:
step S201, obtaining a plurality of storage data, and performing feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data.
Specifically, any one of the existing methods may be adopted to perform feature processing on the mass storage data, and determine a plurality of feature vector nodes, so as to facilitate subsequent vectorization retrieval.
Step S202, clustering processing is carried out on the plurality of feature vector nodes, and the similarity among the plurality of feature vector nodes is determined according to the clustering processing result.
Specifically, according to a specific implementation manner of the embodiment of the present invention, a k-nearest neighbor method may be used to perform clustering processing on a plurality of feature vector nodes, so as to obtain a KNN (k-nearest neighbor, classification algorithm, also called proximity algorithm) graph corresponding to a plurality of stored data, where the KNN graph indicates that a distance of each feature vector node may be used to represent a similarity between the plurality of feature vector nodes.
Step S203, determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes; and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.
Through the above arrangement, according to the spatial distribution of the satellites indicated by the SSG (Satellite System Graph), taking the eigenvector node a as an example, the eigenvector node a is analogous to the ground station, and the nodes adjacent to the eigenvector node a are analogous to the satellites, wherein the second quantity threshold is analogous to the constraint condition of the number of the satellites, and the included angle threshold is analogous to the constraint condition of the included angles formed by any two satellites and the ground station; the determined candidate nodes are analogous to the observed satellites.
According to the embodiment of the invention, the candidate node is determined based on the multi-level adjacent nodes, and the selection idea of the multi-level adjacent nodes is as follows: firstly, determining a plurality of first adjacent nodes (namely nodes with higher similarity with the feature vector nodes) of a certain feature vector node; respectively determining second adjacent nodes corresponding to the plurality of first adjacent nodes; until a multi-level neighbor node is determined. Through the arrangement, the coverage range of the candidate nodes is ensured, and the retrieval efficiency is improved.
By the arrangement, a topological graph index is constructed according to the stored data in the database; firstly, according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the candidate nodes indicate nodes with higher similarity to the feature vector nodes, and in the process of determining the candidate nodes of each feature vector node, the nodes with higher similarity corresponding to each feature vector node are screened according to the second quantity threshold and the included angle threshold, so that the number of the nodes in the subsequent topological graph index is reduced, and the subsequent retrieval efficiency is improved.
According to a specific implementation manner of the embodiment of the present invention, the eigenvector adjacent to the eigenvector node a is 500, the second quantity threshold is 200, and the included angle threshold is 30 °, it is determined that, among the 500 eigenvector nodes adjacent to the eigenvector node a, the included angle between any two eigenvector nodes and a is greater than or equal to 30 °, and the eigenvector nodes whose total number is not greater than 200 are candidate nodes corresponding to the eigenvector node a. The above values are merely examples, and the specific settings may be adjusted according to actual situations.
Step S204, determining a connection graph between each feature vector node and the corresponding candidate node; and constructing a topological graph index according to the connection graph corresponding to each feature vector node.
Specifically, the candidate nodes are respectively connected with the feature vector nodes A to obtain a connection graph, and the topological graph index is determined according to the connection graph corresponding to each feature vector node.
Step S205, receiving the node operation information, determining a node operation type corresponding to the node operation information, and updating the topology map index according to the node operation type.
As shown in fig. 2b, the receiving the node operation information, determining the node operation type corresponding to the node operation information, and updating the topology index according to the node operation type includes the following steps:
step S2051 is to receive the node operation information and determine the node operation type corresponding to the node operation information. If the node operation type is to add a node, step S2052 is executed; if the node operation type is a delete node, step S2053 is executed; if the node operation type is a modified node, step S2054 is executed.
Step S2052 is to determine a connection map between the added node and the corresponding candidate node, and update the topology map index.
Specifically, a candidate node corresponding to the newly added node is determined, a connection graph corresponding to the newly added node and the candidate node corresponding to the newly added node is further determined, and then the connection graph corresponding to the newly added node is mounted in the original topological graph index, so that the updated topological graph index is obtained.
According to a specific implementation manner of the embodiment of the present invention, the new node may be retrieved from the original topological graph index, and N (second quantity threshold) nodes closest to the new node (with the highest similarity) are retrieved as candidate nodes; meanwhile, if the newly added node can be used as a candidate node corresponding to other nodes, the connection graph corresponding to other nodes needs to be updated according to the newly added node, and then the topological graph index is updated by the updated connection graph.
Step S2053, the deleted node in the topological graph index is updated to be a pseudo node; wherein the dummy node is not placed in the set of target nodes.
For the node operation type of the deleted node, only the deleted node needs to be set as a pseudo node; during the process of searching, the navigation node can still inquire the pseudo node, only whether the pseudo node meets the requirement or not, and the pseudo node is not placed in the target node set.
Step S2054 is to set the node before modification as a deleted node, set the modified node as a newly added node, and update the topological graph index.
Through the setting, scenes of adding, modifying and deleting the stored data are added; the original topological index map is only required to be updated, the index map does not need to be reconstructed, a large amount of expenses caused by reconstruction of the index due to change of stored data are reduced, and the efficiency of obtaining the updated topological map index is higher.
And step S206, receiving the retrieval data, and performing feature processing on the retrieval data to obtain a retrieval feature vector corresponding to the retrieval data.
Step S207, determining navigation nodes from the topological graph index, and constructing a target node set; and with the navigation node as a starting point, querying the topological graph index according to a greedy search algorithm, and placing the queried query node in the target node set.
Specifically, according to the embodiment of the present invention, the similarity between any two feature vectors may be measured by using the existing methods such as cosine angle, inner product, hamming distance, euclidean distance, and the like.
The number of target nodes that can be placed in the set of target nodes is defined by a first number threshold. The replacement of existing target nodes in the target node set (that is, when target nodes placed in the target node set reach a first number threshold and then target nodes are newly added later, the original target nodes in the corresponding number need to be deleted), and the judgment needs to be performed according to the sum of the similarity between each target node in the target node set and the retrieval feature vector. Through the arrangement, the calculation amount is reduced, the retrieval accuracy is improved, and the retrieval efficiency is improved.
As shown in fig. 2c, black dots are navigation nodes that are fixedly set, dark gray dots are navigation nodes that are randomly determined, light gray dots are candidate nodes placed in a target node set in the query process, white dots are target nodes that are finally determined, and five-pointed stars are nodes corresponding to the retrieval data.
Taking navigation nodes (black dots and dark gray dots) as starting points, inquiring the topological graph index according to a greedy search algorithm, and placing inquired inquiry nodes (light gray dots) in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.
And step S208, updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.
Through the arrangement, the topological graph index is inquired according to the retrieval characteristic vector corresponding to the retrieval data, and then the target node is determined according to the number (the first number threshold) of the target nodes and the similarity, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the conventional vectorized data retrieval method are solved, and further the purposes of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the retrieval applicable scenes are achieved.
Step S209, the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result.
Specifically, after the similarity between each target node in the target node set and the retrieval feature vector is sequenced, the stored data corresponding to one or more target nodes with higher similarity is determined as the retrieval result according to the retrieval number threshold.
According to the technical scheme of the embodiment of the invention, the retrieval data is received and subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.
FIG. 3 is a schematic diagram of the main modules of a data retrieval device provided in accordance with an embodiment of the present invention; as shown in fig. 3, the data retrieving apparatus 300 according to the embodiment of the present invention mainly includes:
the receiving module 301 is configured to receive search data, perform feature processing on the search data, and obtain a search feature vector corresponding to the search data.
Specifically, the search data is user input data, and the search device determines search result data that is the same as or similar to the input data according to the input data.
Any one of the existing methods can be adopted in the process of performing feature processing on the retrieval data to obtain the corresponding feature vector. Through the arrangement, the search result can be determined according to the similarity between the search characteristic vector corresponding to the search data and the characteristic vector corresponding to the stored data, and the search efficiency and the search accuracy are improved.
A target node set determining module 302, configured to determine a navigation node from the topological graph index, query the topological graph index from the navigation node, and determine a target node set according to a first quantity threshold and a similarity between a query node queried by the navigation node and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold value, a second quantity threshold value and the similarity among the plurality of characteristic vector nodes.
Specifically, according to the embodiment of the present invention, the similarity between any two feature vectors may be measured by using the existing methods such as cosine angle, inner product, hamming distance, euclidean distance, and the like.
Through the arrangement, the topological graph index is inquired according to the retrieval characteristic vector corresponding to the retrieval data, and then the target node is determined according to the number (the first number threshold) of the target nodes and the similarity, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the conventional vectorized data retrieval method are solved, and further the purposes of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the retrieval applicable scenes are achieved.
Specifically, according to the embodiment of the present invention, the data retrieving apparatus 300 further includes a topology index building module, configured to:
the method comprises the steps of obtaining a plurality of storage data, carrying out feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data, and determining the similarity among the plurality of feature vector nodes;
determining a candidate node corresponding to each eigenvector node according to the included angle threshold, the second quantity threshold and the similarity among the plurality of eigenvector nodes; determining a connection graph between each feature vector node and the corresponding candidate node; the included angle threshold value indicates a threshold value corresponding to an included angle formed by each feature vector node and a connecting line between any two feature vector nodes;
and constructing a topological graph index according to the connection graph corresponding to each feature vector node.
By the arrangement, a topological graph index is constructed according to the stored data in the database; firstly, according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the candidate nodes indicate nodes with higher similarity to the feature vector nodes, and in the process of determining the candidate nodes of each feature vector node, the nodes with higher similarity corresponding to each feature vector node are screened according to the second quantity threshold and the included angle threshold, so that the number of the nodes in the subsequent topological graph index is reduced, and the subsequent retrieval efficiency is improved.
Further, according to an embodiment of the present invention, the topology index building module is further configured to:
determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes;
and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.
According to the embodiment of the invention, the candidate node is determined based on the multi-level adjacent nodes, and the selection idea of the multi-level adjacent nodes is as follows: firstly, determining a plurality of first adjacent nodes (namely nodes with higher similarity with the feature vector nodes) of a certain feature vector node; respectively determining second adjacent nodes corresponding to the plurality of first adjacent nodes; until a multi-level neighbor node is determined. Through the arrangement, the coverage range of the candidate nodes is ensured, and the retrieval efficiency is improved.
Illustratively, according to an embodiment of the present invention, the topology index building module is further configured to:
and clustering the plurality of feature vector nodes, and determining the similarity among the plurality of feature vector nodes according to the clustering result.
According to a specific implementation manner of the embodiment of the present invention, a k-nearest neighbor method may be used to perform clustering processing on a plurality of feature vector nodes, and a KNN graph is determined, where the KNN graph indicates that the distance of each feature vector node may be used to represent the similarity between the plurality of feature vector nodes.
Preferably, according to an embodiment of the present invention, the data retrieving apparatus 300 further includes an updating module, configured to:
receiving node operation information, and determining a node operation type corresponding to the node operation information;
if the node operation type is an added node, determining a connection graph between the added node and the corresponding candidate node, and updating the topological graph index;
if the node operation type is a deleted node, updating the deleted node in the topological graph index into a pseudo node; wherein the dummy node is not placed in the set of target nodes;
and if the node operation type is a modified node, setting the node before modification as a deleted node, setting the modified node as a newly added node, and updating the topological graph index.
Through the setting, scenes of adding, modifying and deleting the stored data are added; the original topological index map is only required to be updated, the index map does not need to be reconstructed, a large amount of expenses caused by reconstruction of the index due to change of stored data are reduced, and the efficiency of obtaining the updated topological map index is higher.
Further, according to an embodiment of the present invention, the target node set determining module 302 is further configured to:
constructing a target node set;
taking the navigation node as a starting point, inquiring the topological graph index according to a greedy search algorithm, and placing the inquired inquiry node in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.
Specifically, the number of target nodes that can be placed in the set of target nodes is defined by a first number threshold. The replacement of existing target nodes in the target node set (that is, when target nodes placed in the target node set reach a first number threshold and then target nodes are newly added later, the original target nodes in the corresponding number need to be deleted), and the judgment needs to be performed according to the sum of the similarity between each target node in the target node set and the retrieval feature vector. Through the arrangement, the calculation amount is reduced, the retrieval accuracy is improved, and the retrieval efficiency is improved.
Specifically, according to the embodiment of the present invention, the number of the navigation nodes is at least one, and at least one navigation node connects all the feature vector nodes in the topological graph index.
According to a specific implementation manner of the embodiment of the invention, the navigation nodes can be randomly selected from the topological graph index, and can also be fixedly selected, the number of the navigation nodes can be one or more, and the condition that at least one selected navigation node can be communicated with all the feature vector nodes in the topological graph index is only required to be met, so that the navigation nodes can be ensured to inquire all the feature vector nodes in the topological graph index, and the retrieval efficiency is improved.
And the sorting module 303 is configured to sort the similarity between each target node in the target node set and the search feature vector, and determine a search result according to the sorting result.
Specifically, after the similarity between each target node in the target node set and the retrieval feature vector is sequenced, the stored data corresponding to one or more target nodes with higher similarity is determined as the retrieval result according to the retrieval number threshold.
According to the technical scheme of the embodiment of the invention, the retrieval data is received and subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.
Fig. 4 shows an exemplary system architecture 400 to which the data retrieval method or data retrieval device of an embodiment of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted according to application specific circumstances). The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, data processing-like tools, data retrieval clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server that provides various services, such as a server (for example only) for (data retrieval/data processing) users using the terminal devices 401, 402, 403. The server may perform processing such as analysis on the received data search, and feed back the processing results (e.g., target node set, search results-just an example) to the terminal device.
It should be noted that the data retrieval method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the data retrieval device is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use with a terminal device or server implementing an embodiment of the invention is shown. The terminal device or the server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a receiving module, a target node set determination module, and an ordering module. The names of the modules do not limit the modules themselves in some cases, for example, the receiving module may also be described as a "module for receiving the search data, performing feature processing on the search data, and obtaining a search feature vector corresponding to the search data".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.
According to the technical scheme of the embodiment of the invention, the retrieval data is received and subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1.一种数据检索方法,其特征在于,包括:1. a data retrieval method, is characterized in that, comprises: 接收检索数据,对所述检索数据进行特征处理,得到所述检索数据对应的检索特征向量;Receiving retrieval data, performing feature processing on the retrieval data, and obtaining retrieval feature vectors corresponding to the retrieval data; 从拓扑图索引中确定导航节点,从所述导航节点起查询所述拓扑图索引,根据第一数量阈值以及所述导航节点查询的查询节点与所述检索特征向量之间的相似度,确定目标节点集合;其中,所述拓扑图索引是根据夹角阈值、第二数量阈值以及多个特征向量节点之间的相似度进行构建的;Determine the navigation node from the topology map index, query the topology map index from the navigation node, and determine the target according to the first number threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector A node set; wherein, the topological map index is constructed according to an included angle threshold, a second quantity threshold and the similarity between multiple feature vector nodes; 对所述目标节点集合中各目标节点与所述检索特征向量之间的相似度进行排序,根据排序结果确定检索结果。Sort the similarity between each target node in the target node set and the retrieval feature vector, and determine the retrieval result according to the ranking result. 2.根据权利要求1所述的数据检索方法,其特征在于,根据夹角阈值、第二数量阈值以及多个特征向量节点之间的相似度构建所述拓扑图索引的步骤包括:2. The data retrieval method according to claim 1, wherein the step of constructing the topological map index according to an included angle threshold, a second quantity threshold and the similarity between multiple eigenvector nodes comprises: 获取多个存储数据,对所述多个存储数据进行特征提取,得到所述多个存储数据对应的多个特征向量节点,并确定所述多个特征向量节点之间的相似度;Acquire a plurality of stored data, perform feature extraction on the plurality of stored data, obtain a plurality of feature vector nodes corresponding to the plurality of stored data, and determine the similarity between the plurality of feature vector nodes; 根据所述夹角阈值、所述第二数量阈值以及所述多个特征向量节点之间的相似度,确定每个特征向量节点对应的候选节点;并确定每个特征向量节点与所对应的候选节点之间的连接图;其中,所述夹角阈值指示了每个特征向量节点分别与任意两个特征向量节点之间的连线所形成的夹角对应的阈值;According to the included angle threshold, the second quantity threshold and the similarity between the multiple feature vector nodes, determine the candidate node corresponding to each feature vector node; and determine each feature vector node and the corresponding candidate node A connection graph between nodes; wherein, the included angle threshold indicates the threshold corresponding to the included angle formed by each feature vector node and the connection line between any two feature vector nodes respectively; 根据每个特征向量节点对应的连接图构建所述拓扑图索引。The topology graph index is constructed according to the connection graph corresponding to each feature vector node. 3.根据权利要求2所述的数据检索方法,其特征在于,所述根据所述夹角阈值、所述第二数量阈值以及所述多个特征向量节点之间的相似度确定每个特征向量节点对应的候选节点,还包括:3 . The data retrieval method according to claim 2 , wherein each feature vector is determined according to the included angle threshold, the second quantity threshold and the similarity between the plurality of feature vector nodes. 4 . The candidate node corresponding to the node also includes: 根据所述多个特征向量节点之间的相似度确定每个特征向量节点对应的多级相邻节点;Determine the multi-level adjacent node corresponding to each feature vector node according to the similarity between the multiple feature vector nodes; 根据所述夹角阈值、所述第二数量阈值从所述多级相邻节点中确定每个特征向量节点对应的候选节点。A candidate node corresponding to each feature vector node is determined from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold. 4.根据权利要求2所述的数据检索方法,其特征在于,所述确定所述多个特征向量节点之间的相似度,还包括:4. The data retrieval method according to claim 2, wherein the determining the similarity between the plurality of feature vector nodes further comprises: 对所述多个特征向量节点进行聚类处理,根据聚类处理结果确定所述多个特征向量节点之间的相似度。Perform clustering processing on the plurality of feature vector nodes, and determine the similarity between the plurality of feature vector nodes according to the clustering processing result. 5.根据权利要求2所述的数据检索方法,其特征在于,5. data retrieval method according to claim 2, is characterized in that, 接收节点操作信息,确定所述节点操作信息对应的节点操作类型;receiving node operation information, and determining a node operation type corresponding to the node operation information; 若所述节点操作类型为增加节点,确定所述增加节点与所对应的候选节点之间的连接图,对所述拓扑图索引进行更新;If the node operation type is adding a node, determining a connection graph between the adding node and the corresponding candidate node, and updating the topology map index; 若所述节点操作类型为删除节点,将所述拓扑图索引中的所述删除节点更新为伪节点;其中,所述伪节点不被放置于所述目标节点集合中;If the node operation type is delete node, update the deleted node in the topology map index to a pseudo node; wherein, the pseudo node is not placed in the target node set; 若所述节点操作类型为修改节点,将修改前的节点设置为删除节点,将修改后的节点设置为新增节点,对所述拓扑图索引进行更新。If the node operation type is a modification node, the node before modification is set as a deleted node, the modified node is set as a new node, and the topology map index is updated. 6.根据权利要求1所述的数据检索方法,其特征在于,所述从所述导航节点起查询所述拓扑图索引,根据第一数量阈值以及所述导航节点查询的查询节点与所述检索特征向量之间的相似度,确定目标节点集合还包括:6 . The data retrieval method according to claim 1 , wherein the querying the topology map index from the navigation node is based on a first number threshold and the query node queried by the navigation node and the retrieval method. 7 . The similarity between the feature vectors to determine the target node set also includes: 构建目标节点集合;Build a set of target nodes; 以所述导航节点为起始点,根据贪婪搜索算法查询所述拓扑图索引,将所查询到的查询节点放置于所述目标节点集合中;根据所述第一数量阈值和所查询的查询节点与所述检索特征向量之间的相似度对所述目标节点集合中的目标节点进行更新,直至所述目标节点集合中的各目标节点与所述检索特征向量之间的相似度之和不再增加。Taking the navigation node as the starting point, query the topology map index according to the greedy search algorithm, and place the queried query node in the target node set; according to the first number threshold and the queried query node and The similarity between the retrieval feature vectors updates the target nodes in the target node set, until the sum of the similarities between each target node in the target node set and the retrieval feature vector no longer increases . 7.根据权利要求1所述的数据检索方法,其特征在于,7. The data retrieval method according to claim 1, wherein, 所述导航节点的数量为至少一个,所述至少一个导航节点连通所述拓扑图索引中的全部特征向量节点。The number of the navigation nodes is at least one, and the at least one navigation node connects all the feature vector nodes in the topology map index. 8.一种数据检索装置,其特征在于,包括:8. A data retrieval device, characterized in that, comprising: 接收模块,用于接收检索数据,对所述检索数据进行特征处理,得到所述检索数据对应的检索特征向量;a receiving module, configured to receive retrieval data, perform feature processing on the retrieval data, and obtain retrieval feature vectors corresponding to the retrieval data; 目标节点集合确定模块,用于从拓扑图索引中确定导航节点,从所述导航节点起查询所述拓扑图索引,根据第一数量阈值以及所述导航节点查询的查询节点与所述检索特征向量之间的相似度,确定目标节点集合;其中,所述拓扑图索引是根据夹角阈值、第二数量阈值以及多个特征向量节点之间的相似度进行构建的;The target node set determination module is used to determine the navigation node from the topology map index, query the topology map index from the navigation node, and query the query node and the retrieval feature vector according to the first number threshold and the navigation node query. The similarity between the two, to determine the target node set; wherein, the topological map index is constructed according to the angle threshold, the second quantity threshold and the similarity between multiple feature vector nodes; 排序模块,用于对所述目标节点集合中各目标节点与所述检索特征向量之间的相似度进行排序,根据排序结果确定检索结果。The sorting module is used for sorting the similarity between each target node in the target node set and the retrieval feature vector, and determining the retrieval result according to the sorting result. 9.一种电子设备,其特征在于,包括:9. An electronic device, characterized in that, comprising: 一个或多个处理器;one or more processors; 存储装置,用于存储一个或多个程序,storage means for storing one or more programs, 当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-7中任一所述的方法。The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7. 10.一种计算机可读介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时实现如权利要求1-7中任一所述的方法。10. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-7 is implemented.
CN202110291851.6A 2021-03-18 2021-03-18 Data retrieval method and device Active CN113076447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110291851.6A CN113076447B (en) 2021-03-18 2021-03-18 Data retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110291851.6A CN113076447B (en) 2021-03-18 2021-03-18 Data retrieval method and device

Publications (2)

Publication Number Publication Date
CN113076447A true CN113076447A (en) 2021-07-06
CN113076447B CN113076447B (en) 2025-07-18

Family

ID=76613918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110291851.6A Active CN113076447B (en) 2021-03-18 2021-03-18 Data retrieval method and device

Country Status (1)

Country Link
CN (1) CN113076447B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817657A (en) * 2022-04-29 2022-07-29 上海徐毓智能科技有限公司 Data processing method to be retrieved, data retrieval method, electronic device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063973A1 (en) * 2008-08-27 2010-03-11 International Business Machines Corporation Method and apparatus for identifying similar sub-graphs in a network
CN102693266A (en) * 2011-02-23 2012-09-26 哈曼贝克自动系统股份有限公司 Method of searching a data base, navigation device and method of generating an index structure
CN103712617A (en) * 2013-12-18 2014-04-09 北京工业大学 Visual-content-based method for establishing multi-level semantic map
US9436760B1 (en) * 2016-02-05 2016-09-06 Quid, Inc. Measuring accuracy of semantic graphs with exogenous datasets
CN110413848A (en) * 2019-07-19 2019-11-05 上海赜睿信息科技有限公司 A kind of data retrieval method, electronic equipment and computer readable storage medium
US20200004835A1 (en) * 2018-06-28 2020-01-02 Microsoft Technology Licensing, Llc Generating candidates for search using scoring/retrieval architecture
CN111008620A (en) * 2020-03-05 2020-04-14 支付宝(杭州)信息技术有限公司 Target user identification method and device, storage medium and electronic equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063973A1 (en) * 2008-08-27 2010-03-11 International Business Machines Corporation Method and apparatus for identifying similar sub-graphs in a network
CN102693266A (en) * 2011-02-23 2012-09-26 哈曼贝克自动系统股份有限公司 Method of searching a data base, navigation device and method of generating an index structure
CN103712617A (en) * 2013-12-18 2014-04-09 北京工业大学 Visual-content-based method for establishing multi-level semantic map
US9436760B1 (en) * 2016-02-05 2016-09-06 Quid, Inc. Measuring accuracy of semantic graphs with exogenous datasets
US20200004835A1 (en) * 2018-06-28 2020-01-02 Microsoft Technology Licensing, Llc Generating candidates for search using scoring/retrieval architecture
CN110413848A (en) * 2019-07-19 2019-11-05 上海赜睿信息科技有限公司 A kind of data retrieval method, electronic equipment and computer readable storage medium
CN111008620A (en) * 2020-03-05 2020-04-14 支付宝(杭州)信息技术有限公司 Target user identification method and device, storage medium and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAISAL N. ABU-KHZAM: "Highly Scalable Parallel Search-Tree Algorithms: The Virtual Topology Approach", IEEE, 29 October 2015 (2015-10-29) *
HUNG-CHANG HSIAO: "On Optimizing Overlay Topologies for Search in Unstructured Peer-to-Peer Networks", IEEE, vol. 5, no. 23, 6 October 2011 (2011-10-06) *
宋宝燕;贾春杰;单晓欢;丁琳琳;丁兴艳;: "大规模标签图中的动态Top-K兴趣子图查询", 计算机应用, no. 02, 10 February 2018 (2018-02-10) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817657A (en) * 2022-04-29 2022-07-29 上海徐毓智能科技有限公司 Data processing method to be retrieved, data retrieval method, electronic device and medium
CN114817657B (en) * 2022-04-29 2025-10-03 上海徐毓智能科技有限公司 Method for processing data to be retrieved, data retrieval method, electronic device and medium

Also Published As

Publication number Publication date
CN113076447B (en) 2025-07-18

Similar Documents

Publication Publication Date Title
WO2020114022A1 (en) Knowledge base alignment method and apparatus, computer device and storage medium
US9719790B2 (en) Mapping uncertain geometries to graticules
CN111209347B (en) Method and device for clustering mixed attribute data
CN110222775B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN109471838B (en) Directory document operation method and device, electronic equipment and readable storage medium
CN113190730B (en) Block chain address classification method and device
CN110263209B (en) Method and apparatus for generating information
CN111044062A (en) Path planning and recommending method and device
CN111460237B (en) Data query method and device, readable storage medium and electronic equipment
CN113076447B (en) Data retrieval method and device
CN113486068A (en) Method and device for determining point of interest data, electronic equipment and computer readable medium
CN113761289B (en) Graph learning method, framework, computer system and readable storage medium
CN111626044A (en) Text generation method and device, electronic equipment and computer readable storage medium
CN113779370A (en) Address retrieval method and device
CN110647623B (en) Method and device for updating information
CN111582456A (en) Method, apparatus, device and medium for generating network model information
CN113780827B (en) Article screening method and device, electronic equipment and computer readable medium
CN118101749A (en) Information pushing method, device, equipment and medium
JP7121706B2 (en) Information processing device, information processing method, and information processing program
US9292555B2 (en) Information processing device
CN112580087B (en) Encryption data searching method and device, storage medium and electronic equipment
CN112101390B (en) Attribute information determining method, attribute information determining device and electronic equipment
CN109308299B (en) Method and apparatus for searching information
CN109508227B (en) Application analysis method and device, computing equipment and storage medium
CN113111132A (en) Method and device for identifying target user

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant