CN113076447A

CN113076447A - Data retrieval method and device

Info

Publication number: CN113076447A
Application number: CN202110291851.6A
Authority: CN
Inventors: 高彬; 施文祥; 肖智鹏; 李洁; 王兴达; 倪振云; 占超
Original assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2021-07-06
Anticipated expiration: 2041-03-18
Also published as: CN113076447B

Abstract

The invention discloses a data retrieval method and device, and relates to the technical field of computers. One embodiment of the method comprises: receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result. The embodiment reduces the calculated amount, reduces the occupied space of the memory, improves the retrieval accuracy, improves the retrieval efficiency and expands the retrieval application scene.

Description

Data retrieval method and device

Technical Field

The invention relates to the technical field of computers, in particular to a data retrieval method and device.

Background

Vectorized data retrieval can be divided into accurate retrieval and approximate retrieval, the essence of the accurate retrieval is linear search, and the linear search finds vector information which is most matched with a target vector by traversing all vectors in the whole vector space; the approximate retrieval is to convert the original search in the whole high-dimensional vector space into the search in a small-range space or a relatively low-dimensional space by means of clustering, dimension reduction or coding and the like.

The prior art has at least the following problems:

the existing vectorization data retrieval method has the technical problems of large calculated amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data retrieval method and apparatus, which can reduce the amount of computation, reduce the occupied space of a memory, improve the retrieval accuracy, improve the retrieval efficiency, and expand the retrieval application scenarios.

To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a data retrieval method, including:

receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data;

determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes;

and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.

Further, the step of constructing the topological graph index according to the included angle threshold, the second quantity threshold and the similarity among the plurality of feature vector nodes includes:

the method comprises the steps of obtaining a plurality of storage data, carrying out feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data, and determining the similarity among the plurality of feature vector nodes;

determining a candidate node corresponding to each eigenvector node according to the included angle threshold, the second quantity threshold and the similarity among the plurality of eigenvector nodes; determining a connection graph between each feature vector node and the corresponding candidate node; the included angle threshold value indicates a threshold value corresponding to an included angle formed by each feature vector node and a connecting line between any two feature vector nodes;

and constructing a topological graph index according to the connection graph corresponding to each feature vector node.

Further, determining a candidate node corresponding to each feature vector node according to the included angle threshold, the second quantity threshold and the similarity between the plurality of feature vector nodes, and further comprising:

determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes;

and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.

Further, determining similarity between a plurality of feature vector nodes further comprises:

and clustering the plurality of feature vector nodes, and determining the similarity among the plurality of feature vector nodes according to the clustering result.

Further, receiving node operation information, and determining a node operation type corresponding to the node operation information;

if the node operation type is an added node, determining a connection graph between the added node and the corresponding candidate node, and updating the topological graph index;

if the node operation type is a deleted node, updating the deleted node in the topological graph index into a pseudo node; wherein the dummy node is not placed in the set of target nodes;

and if the node operation type is a modified node, setting the node before modification as a deleted node, setting the modified node as a newly added node, and updating the topological graph index.

Further, querying the topological graph index from the navigation node, and determining the target node set according to the first quantity threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector further includes:

constructing a target node set;

taking the navigation node as a starting point, inquiring the topological graph index according to a greedy search algorithm, and placing the inquired inquiry node in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.

Further, the number of the navigation nodes is at least one, and at least one navigation node is communicated with all the feature vector nodes in the topological graph index.

According to a second aspect of the embodiments of the present invention, there is provided a data retrieval apparatus including:

the receiving module is used for receiving the retrieval data and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data;

the target node set determining module is used for determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to the first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes;

and the sequencing module is used for sequencing the similarity between each target node in the target node set and the retrieval characteristic vector and determining a retrieval result according to the sequencing result.

According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:

one or more processors;

a storage device for storing one or more programs,

when executed by one or more processors, cause the one or more processors to implement any of the data retrieval methods described above.

According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements any one of the data retrieval methods described above.

One embodiment of the above invention has the following advantages or benefits: because the retrieval data is received, the retrieval data is subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

fig. 1 is a schematic diagram of a main flow of a data retrieval method according to a first embodiment of the present invention;

FIG. 2a is a schematic diagram of a main flow of a data retrieval method according to a second embodiment of the present invention;

FIG. 2b is a schematic diagram of a main flow of updating the topological graph index in the method of FIG. 2 a;

FIG. 2c is a schematic diagram of a data retrieval process performed in the method of FIG. 2 a;

FIG. 3 is a schematic diagram of the main modules of a data retrieval device provided in accordance with an embodiment of the present invention;

FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

fig. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of a main flow of a data retrieval method according to a first embodiment of the present invention; as shown in fig. 1, the data retrieval method provided by the embodiment of the present invention mainly includes:

and step S101, receiving the search data, and performing characteristic processing on the search data to obtain a search characteristic vector corresponding to the search data.

Specifically, the search data is user input data, and the search device determines search result data that is the same as or similar to the input data according to the input data.

Any one of the existing methods can be adopted in the process of performing feature processing on the retrieval data to obtain the corresponding feature vector. Through the arrangement, the search result can be determined according to the similarity between the search characteristic vector corresponding to the search data and the characteristic vector corresponding to the stored data, and the search efficiency and the search accuracy are improved.

Step S102, navigation nodes are determined from the topological graph index, the topological graph index is inquired from the navigation nodes, and a target node set is determined according to a first quantity threshold value and the similarity between the inquired nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold value, a second quantity threshold value and the similarity among the plurality of characteristic vector nodes.

Specifically, according to the embodiment of the present invention, the similarity between any two feature vectors may be measured by using the existing methods such as cosine angle, inner product, hamming distance, euclidean distance, and the like.

Through the arrangement, the topological graph index is inquired according to the retrieval characteristic vector corresponding to the retrieval data, and then the target node is determined according to the number (the first number threshold) of the target nodes and the similarity, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the conventional vectorized data retrieval method are solved, and further the purposes of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the retrieval applicable scenes are achieved.

Specifically, according to the embodiment of the present invention, the step of constructing the topological graph index according to the included angle threshold, the second quantity threshold, and the similarity between the plurality of feature vector nodes includes:

By the arrangement, a topological graph index is constructed according to the stored data in the database; firstly, according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the candidate nodes indicate nodes with higher similarity to the feature vector nodes, and in the process of determining the candidate nodes of each feature vector node, the nodes with higher similarity corresponding to each feature vector node are screened according to the second quantity threshold and the included angle threshold, so that the number of the nodes in the subsequent topological graph index is reduced, and the subsequent retrieval efficiency is improved.

Further, according to the embodiment of the present invention, the determining a candidate node corresponding to each feature vector node according to the included angle threshold, the second quantity threshold, and the similarity between the plurality of feature vector nodes further includes:

According to the embodiment of the invention, the candidate node is determined based on the multi-level adjacent nodes, and the selection idea of the multi-level adjacent nodes is as follows: firstly, determining a plurality of first adjacent nodes (namely nodes with higher similarity with the feature vector nodes) of a certain feature vector node; respectively determining second adjacent nodes corresponding to the plurality of first adjacent nodes; until a multi-level neighbor node is determined. Through the arrangement, the coverage range of the candidate nodes is ensured, and the retrieval efficiency is improved.

Exemplarily, according to the embodiment of the present invention, the determining the similarity between the plurality of feature vector nodes further includes:

According to a specific implementation manner of the embodiment of the present invention, a k-nearest neighbor method may be used to perform clustering processing on a plurality of feature vector nodes, and a KNN graph is determined, where the KNN graph indicates that the distance of each feature vector node may be used to represent the similarity between the plurality of feature vector nodes.

Preferably, according to an embodiment of the present invention, the method further includes:

receiving node operation information, and determining a node operation type corresponding to the node operation information;

Through the setting, scenes of adding, modifying and deleting the stored data are added; the original topological index map is only required to be updated, the index map does not need to be reconstructed, a large amount of expenses caused by reconstruction of the index due to change of stored data are reduced, and the efficiency of obtaining the updated topological map index is higher.

Further, according to an embodiment of the present invention, the querying the topological graph index from the navigation node, and determining the target node set according to the first number threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector further includes:

constructing a target node set;

Specifically, the number of target nodes that can be placed in the set of target nodes is defined by a first number threshold. The replacement of existing target nodes in the target node set (that is, when target nodes placed in the target node set reach a first number threshold and then target nodes are newly added later, the original target nodes in the corresponding number need to be deleted), and the judgment needs to be performed according to the sum of the similarity between each target node in the target node set and the retrieval feature vector. Through the arrangement, the calculation amount is reduced, the retrieval accuracy is improved, and the retrieval efficiency is improved.

Specifically, according to the embodiment of the present invention, the number of the navigation nodes is at least one, and at least one navigation node connects all the feature vector nodes in the topological graph index.

According to a specific implementation manner of the embodiment of the invention, the navigation nodes can be randomly selected from the topological graph index, and can also be fixedly selected, the number of the navigation nodes can be one or more, and the condition that at least one selected navigation node can be communicated with all the feature vector nodes in the topological graph index is only required to be met, so that the navigation nodes can be ensured to inquire all the feature vector nodes in the topological graph index, and the retrieval efficiency is improved.

And S103, sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.

Specifically, after the similarity between each target node in the target node set and the retrieval feature vector is sequenced, the stored data corresponding to one or more target nodes with higher similarity is determined as the retrieval result according to the retrieval number threshold.

According to the technical scheme of the embodiment of the invention, the retrieval data is received and subjected to feature processing to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result, so that the technical problems of large calculation amount, large occupied memory, low retrieval accuracy, low retrieval efficiency and few applicable scenes in the existing vectorized data retrieval method are solved, and the technical effects of reducing the calculation amount, reducing the occupied space of the memory, improving the retrieval accuracy, improving the retrieval efficiency and expanding the applicable scenes for retrieval are achieved.

FIG. 2a is a schematic diagram of a main flow of a data retrieval method according to a second embodiment of the present invention; as shown in fig. 2a, the data retrieval method provided by the embodiment of the present invention mainly includes:

step S201, obtaining a plurality of storage data, and performing feature extraction on the plurality of storage data to obtain a plurality of feature vector nodes corresponding to the plurality of storage data.

Specifically, any one of the existing methods may be adopted to perform feature processing on the mass storage data, and determine a plurality of feature vector nodes, so as to facilitate subsequent vectorization retrieval.

Step S202, clustering processing is carried out on the plurality of feature vector nodes, and the similarity among the plurality of feature vector nodes is determined according to the clustering processing result.

Specifically, according to a specific implementation manner of the embodiment of the present invention, a k-nearest neighbor method may be used to perform clustering processing on a plurality of feature vector nodes, so as to obtain a KNN (k-nearest neighbor, classification algorithm, also called proximity algorithm) graph corresponding to a plurality of stored data, where the KNN graph indicates that a distance of each feature vector node may be used to represent a similarity between the plurality of feature vector nodes.

Step S203, determining a plurality of levels of adjacent nodes corresponding to each feature vector node according to the similarity among the plurality of feature vector nodes; and determining candidate nodes corresponding to each feature vector node from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.

Through the above arrangement, according to the spatial distribution of the satellites indicated by the SSG (Satellite System Graph), taking the eigenvector node a as an example, the eigenvector node a is analogous to the ground station, and the nodes adjacent to the eigenvector node a are analogous to the satellites, wherein the second quantity threshold is analogous to the constraint condition of the number of the satellites, and the included angle threshold is analogous to the constraint condition of the included angles formed by any two satellites and the ground station; the determined candidate nodes are analogous to the observed satellites.

According to a specific implementation manner of the embodiment of the present invention, the eigenvector adjacent to the eigenvector node a is 500, the second quantity threshold is 200, and the included angle threshold is 30 °, it is determined that, among the 500 eigenvector nodes adjacent to the eigenvector node a, the included angle between any two eigenvector nodes and a is greater than or equal to 30 °, and the eigenvector nodes whose total number is not greater than 200 are candidate nodes corresponding to the eigenvector node a. The above values are merely examples, and the specific settings may be adjusted according to actual situations.

Step S204, determining a connection graph between each feature vector node and the corresponding candidate node; and constructing a topological graph index according to the connection graph corresponding to each feature vector node.

Specifically, the candidate nodes are respectively connected with the feature vector nodes A to obtain a connection graph, and the topological graph index is determined according to the connection graph corresponding to each feature vector node.

Step S205, receiving the node operation information, determining a node operation type corresponding to the node operation information, and updating the topology map index according to the node operation type.

As shown in fig. 2b, the receiving the node operation information, determining the node operation type corresponding to the node operation information, and updating the topology index according to the node operation type includes the following steps:

step S2051 is to receive the node operation information and determine the node operation type corresponding to the node operation information. If the node operation type is to add a node, step S2052 is executed; if the node operation type is a delete node, step S2053 is executed; if the node operation type is a modified node, step S2054 is executed.

Step S2052 is to determine a connection map between the added node and the corresponding candidate node, and update the topology map index.

Specifically, a candidate node corresponding to the newly added node is determined, a connection graph corresponding to the newly added node and the candidate node corresponding to the newly added node is further determined, and then the connection graph corresponding to the newly added node is mounted in the original topological graph index, so that the updated topological graph index is obtained.

According to a specific implementation manner of the embodiment of the present invention, the new node may be retrieved from the original topological graph index, and N (second quantity threshold) nodes closest to the new node (with the highest similarity) are retrieved as candidate nodes; meanwhile, if the newly added node can be used as a candidate node corresponding to other nodes, the connection graph corresponding to other nodes needs to be updated according to the newly added node, and then the topological graph index is updated by the updated connection graph.

Step S2053, the deleted node in the topological graph index is updated to be a pseudo node; wherein the dummy node is not placed in the set of target nodes.

For the node operation type of the deleted node, only the deleted node needs to be set as a pseudo node; during the process of searching, the navigation node can still inquire the pseudo node, only whether the pseudo node meets the requirement or not, and the pseudo node is not placed in the target node set.

Step S2054 is to set the node before modification as a deleted node, set the modified node as a newly added node, and update the topological graph index.

And step S206, receiving the retrieval data, and performing feature processing on the retrieval data to obtain a retrieval feature vector corresponding to the retrieval data.

Step S207, determining navigation nodes from the topological graph index, and constructing a target node set; and with the navigation node as a starting point, querying the topological graph index according to a greedy search algorithm, and placing the queried query node in the target node set.

The number of target nodes that can be placed in the set of target nodes is defined by a first number threshold. The replacement of existing target nodes in the target node set (that is, when target nodes placed in the target node set reach a first number threshold and then target nodes are newly added later, the original target nodes in the corresponding number need to be deleted), and the judgment needs to be performed according to the sum of the similarity between each target node in the target node set and the retrieval feature vector. Through the arrangement, the calculation amount is reduced, the retrieval accuracy is improved, and the retrieval efficiency is improved.

As shown in fig. 2c, black dots are navigation nodes that are fixedly set, dark gray dots are navigation nodes that are randomly determined, light gray dots are candidate nodes placed in a target node set in the query process, white dots are target nodes that are finally determined, and five-pointed stars are nodes corresponding to the retrieval data.

Taking navigation nodes (black dots and dark gray dots) as starting points, inquiring the topological graph index according to a greedy search algorithm, and placing inquired inquiry nodes (light gray dots) in a target node set; and updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.

And step S208, updating the target nodes in the target node set according to the first quantity threshold and the similarity between the inquired query node and the retrieval feature vector until the sum of the similarities between each target node in the target node set and the retrieval feature vector is not increased any more.

Step S209, the similarity between each target node in the target node set and the retrieval feature vector is sequenced, and the retrieval result is determined according to the sequencing result.

FIG. 3 is a schematic diagram of the main modules of a data retrieval device provided in accordance with an embodiment of the present invention; as shown in fig. 3, the data retrieving apparatus 300 according to the embodiment of the present invention mainly includes:

the receiving module 301 is configured to receive search data, perform feature processing on the search data, and obtain a search feature vector corresponding to the search data.

A target node set determining module 302, configured to determine a navigation node from the topological graph index, query the topological graph index from the navigation node, and determine a target node set according to a first quantity threshold and a similarity between a query node queried by the navigation node and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold value, a second quantity threshold value and the similarity among the plurality of characteristic vector nodes.

Specifically, according to the embodiment of the present invention, the data retrieving apparatus 300 further includes a topology index building module, configured to:

Further, according to an embodiment of the present invention, the topology index building module is further configured to:

Illustratively, according to an embodiment of the present invention, the topology index building module is further configured to:

Preferably, according to an embodiment of the present invention, the data retrieving apparatus 300 further includes an updating module, configured to:

Further, according to an embodiment of the present invention, the target node set determining module 302 is further configured to:

constructing a target node set;

And the sorting module 303 is configured to sort the similarity between each target node in the target node set and the search feature vector, and determine a search result according to the sorting result.

Fig. 4 shows an exemplary system architecture 400 to which the data retrieval method or data retrieval device of an embodiment of the present invention may be applied.

As shown in fig. 4, the system architecture 400 may include

terminal devices

401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted according to application specific circumstances). The network 404 serves as a medium for providing communication links between the

terminal devices

401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.

A user may use

terminal devices

401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The

terminal devices

401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, data processing-like tools, data retrieval clients, social platform software, etc. (by way of example only).

The

terminal devices

401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 405 may be a server that provides various services, such as a server (for example only) for (data retrieval/data processing) users using the

terminal devices

401, 402, 403. The server may perform processing such as analysis on the received data search, and feed back the processing results (e.g., target node set, search results-just an example) to the terminal device.

It should be noted that the data retrieval method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the data retrieval device is generally disposed in the server 405.

It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use with a terminal device or server implementing an embodiment of the invention is shown. The terminal device or the server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a receiving module, a target node set determination module, and an ordering module. The names of the modules do not limit the modules themselves in some cases, for example, the receiving module may also be described as a "module for receiving the search data, performing feature processing on the search data, and obtaining a search feature vector corresponding to the search data".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: receiving retrieval data, and performing feature processing on the retrieval data to obtain retrieval feature vectors corresponding to the retrieval data; determining navigation nodes from the topological graph index, inquiring the topological graph index from the navigation nodes, and determining a target node set according to a first quantity threshold and the similarity between the inquiry nodes inquired by the navigation nodes and the retrieval feature vector; the topological graph index is constructed according to an included angle threshold, a second quantity threshold and the similarity among a plurality of characteristic vector nodes; and sequencing the similarity between each target node in the target node set and the retrieval feature vector, and determining a retrieval result according to the sequencing result.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. a data retrieval method, is characterized in that, comprises:

Receiving retrieval data, performing feature processing on the retrieval data, and obtaining retrieval feature vectors corresponding to the retrieval data;

Determine the navigation node from the topology map index, query the topology map index from the navigation node, and determine the target according to the first number threshold and the similarity between the query node queried by the navigation node and the retrieval feature vector A node set; wherein, the topological map index is constructed according to an included angle threshold, a second quantity threshold and the similarity between multiple feature vector nodes;

Sort the similarity between each target node in the target node set and the retrieval feature vector, and determine the retrieval result according to the ranking result.

2. The data retrieval method according to claim 1, wherein the step of constructing the topological map index according to an included angle threshold, a second quantity threshold and the similarity between multiple eigenvector nodes comprises:

Acquire a plurality of stored data, perform feature extraction on the plurality of stored data, obtain a plurality of feature vector nodes corresponding to the plurality of stored data, and determine the similarity between the plurality of feature vector nodes;

According to the included angle threshold, the second quantity threshold and the similarity between the multiple feature vector nodes, determine the candidate node corresponding to each feature vector node; and determine each feature vector node and the corresponding candidate node A connection graph between nodes; wherein, the included angle threshold indicates the threshold corresponding to the included angle formed by each feature vector node and the connection line between any two feature vector nodes respectively;

The topology graph index is constructed according to the connection graph corresponding to each feature vector node.

3 . The data retrieval method according to claim 2 , wherein each feature vector is determined according to the included angle threshold, the second quantity threshold and the similarity between the plurality of feature vector nodes. 4 . The candidate node corresponding to the node also includes:

Determine the multi-level adjacent node corresponding to each feature vector node according to the similarity between the multiple feature vector nodes;

A candidate node corresponding to each feature vector node is determined from the multi-level adjacent nodes according to the included angle threshold and the second quantity threshold.

4. The data retrieval method according to claim 2, wherein the determining the similarity between the plurality of feature vector nodes further comprises:

Perform clustering processing on the plurality of feature vector nodes, and determine the similarity between the plurality of feature vector nodes according to the clustering processing result.

5. data retrieval method according to claim 2, is characterized in that,

If the node operation type is adding a node, determining a connection graph between the adding node and the corresponding candidate node, and updating the topology map index;

If the node operation type is delete node, update the deleted node in the topology map index to a pseudo node; wherein, the pseudo node is not placed in the target node set;

If the node operation type is a modification node, the node before modification is set as a deleted node, the modified node is set as a new node, and the topology map index is updated.

6 . The data retrieval method according to claim 1 , wherein the querying the topology map index from the navigation node is based on a first number threshold and the query node queried by the navigation node and the retrieval method. 7 . The similarity between the feature vectors to determine the target node set also includes:

Build a set of target nodes;

Taking the navigation node as the starting point, query the topology map index according to the greedy search algorithm, and place the queried query node in the target node set; according to the first number threshold and the queried query node and The similarity between the retrieval feature vectors updates the target nodes in the target node set, until the sum of the similarities between each target node in the target node set and the retrieval feature vector no longer increases .

7. The data retrieval method according to claim 1, wherein,

The number of the navigation nodes is at least one, and the at least one navigation node connects all the feature vector nodes in the topology map index.

8. A data retrieval device, characterized in that, comprising:

a receiving module, configured to receive retrieval data, perform feature processing on the retrieval data, and obtain retrieval feature vectors corresponding to the retrieval data;

The target node set determination module is used to determine the navigation node from the topology map index, query the topology map index from the navigation node, and query the query node and the retrieval feature vector according to the first number threshold and the navigation node query. The similarity between the two, to determine the target node set; wherein, the topological map index is constructed according to the angle threshold, the second quantity threshold and the similarity between multiple feature vector nodes;

The sorting module is used for sorting the similarity between each target node in the target node set and the retrieval feature vector, and determining the retrieval result according to the sorting result.

9. An electronic device, characterized in that, comprising:

one or more processors;

storage means for storing one or more programs,

The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.

10. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-7 is implemented.