CN114566233B - Molecular screening method, device, electronic device and storage medium - Google Patents

Info

Publication number: CN114566233B
Application number: CN202210155504.5A
Authority: CN (China)
Prior art keywords: molecule, vertex, mapping, graph, mappings
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN114566233A
Inventors: 周景博, 郑书豪, 窦德景
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd (the listed assignees may be inaccurate)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202210155504.5A

Classifications

    • G: Physics
    • G16: Information and communication technology [ICT] specially adapted for specific application fields
    • G16C: Computational chemistry; chemoinformatics; computational materials science
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40: Searching chemical structures or physicochemical data
    • G16C20/50: Molecular design, e.g. of drugs

Landscapes

  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a molecular screening method, apparatus, electronic device, and storage medium, and relates to deep learning, graph theory, and biological and information technology within the field of artificial intelligence. The method includes: obtaining a first label graph of each molecule to be screened and a second label graph of a reference molecule, where each molecule to be screened forms a molecule pair with the reference molecule; for each molecule pair, obtaining the mappings between the vertices of the first label graph and the second label graph and the conflict information between those mappings, so as to generate a mapping graph of the molecule pair; sampling the mapping graph to obtain the maximum-weight fully connected subgraph of the molecule pair; and, according to the maximum-weight fully connected subgraph of each molecule pair, selecting the molecule to be screened that is most similar to the reference molecule as the target molecule. A framework for molecular screening based on Gaussian boson sampling is thereby proposed, so that Gaussian boson sampling can be applied to perform ligand-based virtual drug screening efficiently, improving the efficiency of molecular screening.

Description

Molecular screening method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to deep learning, graph theory, and biological and information technology within the field of artificial intelligence, and in particular to a molecular screening method, a molecular screening apparatus, an electronic device, and a storage medium.
Background
Boson sampling works by sending identical photons into a quantum optical system and measuring the output state of that system with photon counters. Gaussian boson sampling (GBS), an extension of boson sampling, can be applied to combinatorial problems on graphs. However, applying Gaussian boson sampling effectively to concrete problems, and thereby broadening its practical application scenarios, remains a major challenge. In particular, how to use Gaussian boson sampling to perform ligand-based virtual drug screening efficiently, and so improve the effectiveness of molecular screening, is a problem that urgently needs to be solved.
Disclosure of Invention
Provided are a method, an apparatus, an electronic device, and a storage medium for molecular screening.
According to a first aspect, a molecular screening method is provided, comprising: obtaining a first label graph of each molecule to be screened in a set of molecules to be screened, and a second label graph of a reference molecule; forming a molecule pair from each molecule to be screened and the reference molecule; for each molecule pair, obtaining the mappings between the vertices of the first label graph and the second label graph together with the conflict information between the mappings, and generating a mapping graph of the molecule pair based on the mappings and the conflict information; sampling the mapping graph to obtain a maximum-weight fully connected subgraph of the molecule pair; and, according to the maximum-weight fully connected subgraph of each molecule pair, screening out from the set the molecule to be screened that has the greatest similarity to the reference molecule, as the target molecule.
According to a second aspect, a molecular screening apparatus is provided, comprising an acquisition module, a mapping module, a sampling module, and a screening module. The acquisition module is configured to obtain a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule. The mapping module is configured to form a molecule pair from each molecule to be screened and the reference molecule and, for each molecule pair, to obtain the mappings between the vertices of the first label graph and the second label graph together with the conflict information between the mappings, and to generate a mapping graph of the molecule pair based on the mappings and the conflict information. The sampling module is configured to sample the mapping graph to obtain a maximum-weight fully connected subgraph of the molecule pair. The screening module is configured to screen out from the set, according to the maximum-weight fully connected subgraph of each molecule pair, the molecule to be screened that has the greatest similarity to the reference molecule, as the target molecule.
According to a third aspect there is provided an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of molecular screening of the first aspect of the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of molecular screening according to the first aspect of the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of molecular screening according to the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of a method of molecular screening according to a first embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method of molecular screening according to a second embodiment of the present disclosure;
FIG. 3 is a flow diagram of a method of molecular screening according to a third embodiment of the present disclosure;
FIG. 4 is a flow diagram of a method of molecular screening according to a fourth embodiment of the present disclosure;
FIG. 5 is a flow diagram of a method of molecular screening according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram of generating the label graph of a molecule in a method of molecular screening according to an embodiment of the present disclosure;
FIG. 7 is a flow diagram of a method of molecular screening according to a sixth embodiment of the present disclosure;
FIG. 8 is a flow diagram of a method of molecular screening according to a seventh embodiment of the present disclosure;
FIG. 9 is an overall schematic of a method of molecular screening according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an apparatus for molecular screening according to a first embodiment of the present disclosure;
FIG. 11 is a block diagram of an apparatus for molecular screening according to a second embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device used to implement the methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. AI technology currently offers a high degree of automation, high accuracy, and low cost, and is widely applied.
Deep learning (DL) is a research direction within machine learning (ML). It learns the intrinsic regularities and representation hierarchy of sample data, and the information obtained during learning greatly helps in interpreting data such as text, images, and sound. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize text, image, and sound data. Deep learning mainly includes neural network systems based on convolution operations, autoencoder neural networks based on multiple layers of neurons, and deep belief networks that are pre-trained as multi-layer autoencoders and then further optimize the network weights using discriminative information. Deep learning has achieved many results in search, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization, and other related fields. It enables machines to imitate human activities such as seeing, hearing, and thinking, solves many complex pattern recognition problems, and has greatly advanced AI-related technology.
Graph theory is a branch of mathematics that takes graphs as its object of study. A graph in graph theory consists of a number of given points and lines that each connect two points. Such a graph is generally used to describe a specific relationship among certain things: the points represent the things, and a line connecting two points indicates that the relationship holds between the corresponding two things.
Biotechnology is a comprehensive technology based on the life sciences. It uses the characteristics and functions of living organisms (or of biological tissues, cells, and other components) to design and construct new substances or new strains with desired properties, and combines this with engineering principles to process products or provide services. Information technology (Information Science) is the technology of acquiring, transmitting, and processing information. It combines computer technology, communication technology, and microelectronics: computers process the information, and modern electronic communication technology is used to acquire, store, process, and use information, to manufacture related products, and to develop technology and information services. Information technology and biotechnology are both high technologies; in the new economy they are not mutually exclusive but complementary, and together they drive the rapid economic development of the 21st century.
Methods, apparatuses, electronic devices, and storage media for molecular screening according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
Fig. 1 is a flow diagram of a method of molecular screening according to a first embodiment of the present disclosure.
As shown in fig. 1, the method for molecular screening according to the embodiments of the present disclosure may specifically include the following steps:
S101, acquiring a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule.
Specifically, the execution body of the molecular screening method according to the embodiments of the present disclosure may be the molecular screening apparatus provided by the embodiments of the present disclosure, which may be a hardware device with data processing capability and/or the software needed to drive that hardware device. Optionally, the execution body may include a workstation, a server, a computer, a user terminal, or another device. User terminals include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, smart home appliances, and vehicle-mounted terminals.
The molecular screening method of the embodiment of the disclosure can be applied to the realization of Ligand-based drug virtual screening (LVS for short), and potential drug molecules are screened from a molecular database according to the principle that molecules with similar structures tend to have similar attributes.
In the embodiment of the present disclosure, the molecules in a set of molecules to be screened (such as a molecular database) are taken as the molecules to be screened, and each is compared with a reference molecule to find a target molecule with high similarity to the reference molecule; the molecules may be drug molecules. In some embodiments, for each molecule to be screened in the set, the label graph of the molecule to be screened (molecule A) is obtained as the first label graph (G_A), and the label graph of the reference molecule (molecule B) is obtained as the second label graph (G_B). The vertices and edges of a label graph correspond to the structure of the molecule, so a label graph can be regarded as an undirected graph that carries the features of the molecule.
S102, forming a molecule pair from each molecule to be screened and the reference molecule; for each molecule pair, obtaining the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings, and generating a mapping graph of the molecule pair based on the mappings and the conflict information.
In the embodiment of the present disclosure, each molecule to be screened forms a molecule pair with the reference molecule. For each molecule pair, the mappings between the vertices of the first label graph of the molecule to be screened and the vertices of the second label graph of the reference molecule are obtained, for example a mapping between vertex A_1 in the first label graph G_A and vertex B_1 in the second label graph G_B; a mapping can be understood as a correspondence established between vertex A_1 and vertex B_1. Note that the mappings obtained should be all possible mappings between the vertices of the first label graph and the second label graph. Conflict information is then obtained for every two of these mappings: if two mappings conflict, they cannot both be present at the same time. The embodiment of the present disclosure generates the mapping graph (G_AB) of the molecule pair from the obtained mappings and the conflict information between them.
S103, sampling the mapping graph to obtain a maximum-weight fully connected subgraph of the molecule pair.
In the embodiment of the present disclosure, the mapping graph G_AB is an undirected graph comprising a vertex set, an edge set, and a weight set. The mapping graph of the molecule pair is sampled using Gaussian boson sampling (GBS) to obtain the maximum-weight fully connected subgraph (that is, a maximum-weight clique) of the molecule pair. Post-processing the samples with greedy shrinking, local expansion, and historical sampling information can yield a better maximum-weight fully connected subgraph.
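The post-processing mentioned above can be illustrated with a minimal classical sketch. It assumes a sample is simply a list of mapping-graph vertices, that the graph is given as a hypothetical adjacency dictionary `adj`, and that vertex weights are in `weights`; the tie-breaking rules are illustrative choices, not taken from the disclosure, and no quantum sampling is performed here.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """True if every pair of vertices in `nodes` is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def greedy_shrink(adj, weights, sample):
    """Drop vertices from a sampled subgraph until what remains is a clique."""
    nodes = set(sample)
    while not is_clique(adj, nodes):
        # Remove the vertex with the fewest neighbours inside the subgraph,
        # breaking ties by lowest weight (an assumed heuristic).
        worst = min(nodes, key=lambda v: (len(adj[v] & nodes), weights[v]))
        nodes.remove(worst)
    return nodes


def local_expand(adj, weights, clique):
    """Grow a clique by repeatedly adding the heaviest compatible vertex."""
    clique = set(clique)
    while True:
        candidates = [v for v in adj if v not in clique and clique <= adj[v]]
        if not candidates:
            return clique
        clique.add(max(candidates, key=lambda v: weights[v]))


# Toy mapping graph: vertices 0, 1, 2 form a triangle; vertex 3 hangs off 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
weights = {0: 2, 1: 3, 2: 4, 3: 1}
sample = [0, 1, 3]  # stands in for one GBS sample
clique = local_expand(adj, weights, greedy_shrink(adj, weights, sample))
```

In this toy run, shrinking discards vertex 3 (it has no neighbour inside the sample) and expansion then adds vertex 2, giving the triangle as the resulting fully connected subgraph.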
S104, according to the maximum-weight fully connected subgraph of each molecule pair, screening out from the set of molecules to be screened the molecule to be screened that has the greatest similarity to the reference molecule, as the target molecule.
In the embodiment of the present disclosure, the maximum-weight fully connected subgraph of each molecule pair is obtained, the similarity between each molecule to be screened and the reference molecule is derived from these subgraphs, and the molecule to be screened with the greatest similarity to the reference molecule is selected from the set as the target molecule.
In summary, the molecular screening method according to the embodiment of the present disclosure obtains a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule; forms a molecule pair from each molecule to be screened and the reference molecule; for each molecule pair, obtains the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings, and generates a mapping graph of the molecule pair from them; samples the mapping graph to obtain the maximum-weight fully connected subgraph of the molecule pair; and, according to the maximum-weight fully connected subgraph of each molecule pair, screens out the molecule to be screened with the greatest similarity to the reference molecule as the target molecule. The method models molecules as undirected graphs with labels (that is, features), generates the mapping graph of a molecule pair from the label graphs, uses Gaussian boson sampling to obtain the maximum-weight fully connected subgraph of the mapping graph, and derives a similarity score for the molecule pair from that subgraph. This provides a framework for molecular screening based on Gaussian boson sampling, so that Gaussian boson sampling can be applied efficiently to ligand-based virtual drug screening and the efficiency of molecular screening is improved.
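The overall flow of S103 and S104 can be sketched classically as follows. This is only an illustration under stated assumptions: an exhaustive search replaces Gaussian boson sampling (so it is practical only for tiny graphs), the similarity score is assumed to be the total weight of the maximum-weight fully connected subgraph, and the names `max_weight_clique`, `screen`, and `mapping_graphs` are hypothetical.

```python
from itertools import combinations


def max_weight_clique(adj, weights):
    """Exhaustive maximum-weight clique search (classical stand-in for GBS)."""
    best, best_weight = frozenset(), 0.0
    nodes = list(adj)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if all(v in adj[u] for u, v in combinations(subset, 2)):
                total = sum(weights[v] for v in subset)
                if total > best_weight:
                    best, best_weight = frozenset(subset), total
    return best, best_weight


def screen(mapping_graphs):
    """Score every candidate and return the most similar one plus all scores.

    `mapping_graphs` maps a candidate name to (adjacency dict, weight dict)
    for the mapping graph of that candidate paired with the reference.
    """
    scores = {
        name: max_weight_clique(adj, weights)[1]
        for name, (adj, weights) in mapping_graphs.items()
    }
    target = max(scores, key=scores.get)
    return target, scores


# Two toy candidates, each already reduced to a mapping graph.
mapping_graphs = {
    "cand1": ({0: {1}, 1: {0}}, {0: 1.0, 1: 2.0}),
    "cand2": ({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, {0: 1.0, 1: 1.5, 2: 2.0}),
}
target, scores = screen(mapping_graphs)
```

Here `cand2` is selected because its mapping graph contains a heavier fully connected subgraph than that of `cand1`.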
Fig. 2 is a flow diagram of a method of molecular screening according to a second embodiment of the present disclosure.
As shown in fig. 2, the method for molecular screening according to the embodiment of the disclosure may specifically include the following steps based on the embodiment shown in fig. 1:
S201, acquiring a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule.
The step S102 may specifically include the steps of:
S202, performing key feature matching between each first vertex in the first label graph and each second vertex in the second label graph to obtain the mappings between first vertices and second vertices.
In the embodiment of the present disclosure, each vertex in the first label graph G_A is taken as a first vertex A_i, and each vertex in the second label graph G_B is taken as a second vertex B_j. The key features of a first vertex A_i are obtained as first key features, and the key features of a second vertex B_j are obtained as second key features. It is then determined whether the first key features and the second key features have the same attributes; if at least one key-feature attribute is the same, a mapping between the first vertex A_i and the second vertex B_j is generated.
The features mentioned above are features of the atom or ring that a vertex corresponds to. Each vertex has multiple features, which can be divided into key features and non-key features.
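A minimal sketch of this key feature matching is given below. It assumes each vertex's key features are represented as a set of label strings; the representation and the feature names ("aromatic", "donor", and so on) are hypothetical, chosen only to show the "at least one key feature in common" rule.

```python
def candidate_mappings(first_vertices, second_vertices):
    """Return every (first vertex, second vertex) pair sharing a key feature.

    Both arguments map a vertex name to the set of its key features.
    """
    return [
        (a, b)
        for a, features_a in first_vertices.items()
        for b, features_b in second_vertices.items()
        if features_a & features_b  # at least one key feature in common
    ]


# Hypothetical key features for the vertices of G_A and G_B.
first = {"A1": {"aromatic"}, "A2": {"donor", "acceptor"}}
second = {
    "B1": {"aromatic", "hydrophobic"},
    "B2": {"acceptor"},
    "B3": {"cation"},
}
mappings = candidate_mappings(first, second)
```

Each returned pair becomes one candidate mapping; vertex B3 shares no key feature with any first vertex, so it takes part in no mapping.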
S203, obtaining two mappings without conflict from all the mappings to form a mapping pair, and forming an edge between the two mappings in the mapping pair.
In the embodiment of the present disclosure, according to the conflict information obtained for the mappings, two mappings without conflict are selected from all the mappings and determined to be a conflict-free mapping pair. When the mapping graph of the molecule pair is generated, an edge is formed between the two mappings of each mapping pair; no edge is formed between mappings that conflict.
S204, obtaining the weight corresponding to the mapping.
In the embodiment of the present disclosure, for each mapping, the weight of the mapping is obtained according to the number of atoms and the features of the first vertex and the second vertex that the mapping corresponds to.
S205, generating the mapping graph, with the mappings as its vertices, edges between conflict-free mapping pairs as its edges, and the weights of the mappings as its weights.
In the embodiment of the present disclosure, each possible mapping is taken as a vertex of the mapping graph, an edge is formed between the mappings of each conflict-free mapping pair, and the weights of the mappings form a weight set, yielding a mapping graph that comprises a vertex set, an edge set, and a weight set.
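The construction in S203 to S205 can be sketched as below. The conflict test and the weight function are passed in as callables; the toy versions used in the example (two mappings conflict if they reuse a vertex, and every mapping weighs 1.0) are placeholders for illustration, not the conflict and weight rules of the disclosure.

```python
from itertools import combinations


def build_mapping_graph(mappings, has_conflict, weight_of):
    """Build (adjacency, weights) with one graph vertex per mapping.

    An edge joins two mappings exactly when they do not conflict.
    """
    vertices = list(mappings)
    weights = {m: weight_of(m) for m in vertices}
    adj = {m: set() for m in vertices}
    for m1, m2 in combinations(vertices, 2):
        if not has_conflict(m1, m2):
            adj[m1].add(m2)
            adj[m2].add(m1)
    return adj, weights


# Toy example: mappings conflict when they share a first or second vertex.
mappings = [("A1", "B1"), ("A2", "B2"), ("A1", "B2")]
adj, weights = build_mapping_graph(
    mappings,
    has_conflict=lambda m1, m2: m1[0] == m2[0] or m1[1] == m2[1],
    weight_of=lambda m: 1.0,
)
```

The resulting `adj` and `weights` together hold the vertex set, edge set, and weight set of the mapping graph.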
S206, sampling the mapping graph to obtain a maximum-weight fully connected subgraph of the molecule pair.
S207, according to the maximum-weight fully connected subgraph of each molecule pair, screening out from the set of molecules to be screened the molecule to be screened that has the greatest similarity to the reference molecule, as the target molecule.
In the embodiment of the present disclosure, step S201 is the same as step S101 in the above embodiment, and steps S206 to S207 are the same as steps S103 to S104, which are not described herein.
On the basis of the above embodiment, as shown in fig. 3, "obtaining two mappings without conflict from all the mappings to form a mapping pair" in step S203 may include the following steps:
S301, selecting two mappings from all the mappings, and determining the first vertex and the second vertex corresponding to each of the two mappings.
In the embodiment of the present disclosure, two mappings are selected arbitrarily from all the mappings between the first label graph and the second label graph, and the first vertex and second vertex corresponding to each are determined. For example, one mapping corresponds to first vertex A_1 and second vertex B_1, and the other corresponds to first vertex A_2 and second vertex B_2.
S302, obtaining the first atom or first ring characterized by the first vertex of each of the two mappings.
In the embodiment of the present disclosure, the atom or ring characterized by first vertex A_1 and the atom or ring characterized by first vertex A_2 (that is, the two first atoms or first rings) are obtained.
S303, obtaining the second atom or second ring characterized by the second vertex of each of the two mappings.
In the embodiment of the present disclosure, the atom or ring characterized by second vertex B_1 and the atom or ring characterized by second vertex B_2 (that is, the two second atoms or second rings) are obtained.
S304, in response to there being no atomic conflict and no distance conflict between the two first atoms or first rings, and no atomic conflict and no distance conflict between the two second atoms or second rings, determining that the two mappings are conflict-free and form a mapping pair.
In the embodiment of the present disclosure, it is determined whether an atomic conflict or a distance conflict exists between the two first atoms or first rings, and between the two second atoms or second rings.
An atomic conflict is determined by comparing the two first atoms or first rings with each other and the two second atoms or second rings with each other: if the two first atoms or first rings share no atom, and the two second atoms or second rings share no atom, the two mappings have no atomic conflict. From this description, the no-conflict condition of formula one can be written as:
A_1 ∩ A_2 = ∅ and B_1 ∩ B_2 = ∅
where, in formula one, A_1 and A_2 each denote the set of atoms (a single atom, or the atoms of a ring) characterized by one of the two first vertices, and B_1 and B_2 each denote the set of atoms (a single atom, or the atoms of a ring) characterized by one of the two second vertices.
A distance conflict is determined by obtaining a first distance between the two first atoms or first rings and a second distance between the two second atoms or second rings; if the difference between the first distance and the second distance is less than or equal to a preset distance, the two mappings have no distance conflict.
It will be appreciated that the similarity of a molecule pair depends on the labelled common substructure shared by the two molecules; that is, the similarity is determined from the labelled maximum common substructure (LMCS) of the pair. The maximum-weight fully connected subgraph of the mapping graph, obtained with GBS, is treated as equivalent to this maximum common substructure. To ensure that the subgraph obtained actually corresponds to a common substructure, the absence of a distance conflict requires that, for two mappings, the distance between the two mapped vertices in one label graph be approximately equal to the distance between the two mapped vertices in the other label graph. In other words, the difference between the distance between the two atoms or rings of the first vertices (the first distance) and the distance between the two atoms or rings of the second vertices (the second distance) must not exceed a preset distance τ. Equivalently, a distance conflict exists when formula two holds:
|dist(A_1, A_2) - dist(B_1, B_2)| > τ
where, in formula two, A_1 and A_2 denote the single atoms or rings characterized by the two first vertices, and B_1 and B_2 denote the single atoms or rings characterized by the two second vertices. The first and second distances may be computed as Euclidean distances, taking the 3D coordinates of an atom as the coordinates of that atom and the 3D coordinates of the geometric center of a ring as the coordinates of that ring. In some embodiments, τ is set to a fixed preset value.
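The two conflict tests can be combined into one check, sketched below. It assumes each mapping is a pair of frozensets of atom indices (a single atom is a one-element set), that 3D coordinates are supplied per molecule in hypothetical dictionaries `coords_a` and `coords_b`, and that the geometric center of a ring is the mean of its atoms' coordinates. The threshold `tau=0.5` in the example is an arbitrary illustrative value, not one taken from the disclosure.

```python
import math


def center(atom_ids, coords):
    """Coordinates of an atom, or the mean position of a ring's atoms."""
    points = [coords[i] for i in atom_ids]
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def has_conflict(m1, m2, coords_a, coords_b, tau):
    """True if two mappings have an atomic conflict or a distance conflict."""
    (a1, b1), (a2, b2) = m1, m2
    # Atomic conflict: the two sides must not reuse any atom.
    if a1 & a2 or b1 & b2:
        return True
    # Distance conflict: the two inter-vertex distances must agree within tau.
    dist_a = math.dist(center(a1, coords_a), center(a2, coords_a))
    dist_b = math.dist(center(b1, coords_b), center(b2, coords_b))
    return abs(dist_a - dist_b) > tau


# Toy coordinates for molecule A and molecule B (illustrative values only).
coords_a = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0)}
coords_b = {0: (0.0, 0.0, 0.0), 1: (1.2, 0.0, 0.0), 2: (0.0, 5.0, 0.0)}
m1 = (frozenset({0}), frozenset({0}))
m2 = (frozenset({1}), frozenset({1}))
m3 = (frozenset({2}), frozenset({2}))
m4 = (frozenset({0, 1}), frozenset({2}))  # a "ring" that reuses atom 0 of A
```

With these values, `m1` and `m2` are compatible (distances 1.0 and 1.2 differ by less than 0.5), `m1` and `m3` have a distance conflict, and `m1` and `m4` have an atomic conflict.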
On the basis of the above embodiment, "obtaining the weight corresponding to the mapping" in step S204 may include the following steps: determining, for each mapping, the first vertex and the second vertex corresponding to the mapping; and generating the weight of the mapping according to the number of atoms and the features of each of the first vertex and the second vertex, where the features are those of the atom or ring that the vertex corresponds to.
On the basis of the above embodiment, as shown in fig. 4, the process of generating the weight of the mapping according to the number of atoms and the features of each of the first vertex and the second vertex may be implemented by the following steps:
S401, obtaining the average of the number of atoms of the first vertex and the number of atoms of the second vertex.
In the embodiment of the present disclosure, for each mapping, the average of the number of atoms of the first vertex and the number of atoms of the second vertex corresponding to the mapping is obtained.
S402, obtaining the number of the features with the same attribute or value between the first vertex and the second vertex.
In the embodiment of the present disclosure, feature matching is performed on a feature corresponding to a first vertex (i.e., a feature of an atom or a ring represented by the first vertex) and a feature corresponding to a second vertex (i.e., a feature of an atom or a ring represented by the second vertex), where feature matching may be performed by comparing whether the attribute or the value of the feature is the same or not, so as to obtain the number of features with the same attribute or value between the first vertex and the second vertex.
S403, determining the weight of the mapping between the first vertex and the second vertex according to the average value and the number of features with the same attribute or value.
In the embodiment of the present disclosure, for each mapping, the weight between the first vertex and the second vertex is determined according to the obtained average value and the number of features with the same attribute or value, which may be expressed by formula three:
Here, V_AB in formula three is the vertex set of the mapping graph G_AB = (V_AB, E_AB, W_AB); m_i denotes the i-th mapping in the mapping graph; a denotes the first vertex corresponding to the i-th mapping, and A_i denotes the single atom, or the set of atoms in the ring, represented by the first vertex; b denotes the second vertex corresponding to the i-th mapping, and B_i denotes the single atom, or the set of atoms in the ring, represented by the second vertex; and |L_a ∩ L_b| denotes the number of features having the same attribute or value between the first vertex and the second vertex.
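As an illustration of steps S401 to S403, the sketch below computes a mapping weight from the average atom count and the number of shared features. How the two quantities are combined is an assumption here: they are added, by analogy with the vertex weight w_v = |L_v| + |v| used for the similarity score, and the actual formula three may differ. The function name and the feature names in the example are hypothetical.

```python
def mapping_weight(atoms_a: set, atoms_b: set, labels_a: dict, labels_b: dict) -> float:
    """Weight of one mapping between a first vertex and a second vertex.

    atoms_a / atoms_b:   atom indices contained in each vertex (one atom, or a ring).
    labels_a / labels_b: feature name -> feature value for each vertex.
    Assumption: average atom count and shared-feature count are simply added.
    """
    # S401: average of the number of atoms in the two vertices.
    mean_atoms = (len(atoms_a) + len(atoms_b)) / 2
    # S402: number of features whose value is the same in both vertices.
    shared = sum(
        1 for name, value in labels_a.items()
        if name in labels_b and labels_b[name] == value
    )
    # S403: combine the two quantities (additive combination is an assumption).
    return mean_atoms + shared


carbon_a = {"atomic_number": 6, "degree": 3, "aromatic": True}
carbon_b = {"atomic_number": 6, "degree": 2, "aromatic": True}
print(mapping_weight({0}, {5}, carbon_a, carbon_b))  # 1.0 + 2 shared features = 3.0
```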
On the basis of the above embodiment, as shown in fig. 5, the method for molecular screening according to the embodiment of the present disclosure further includes a process of generating the label graph corresponding to a molecule, where the molecule may be a molecule to be screened or a reference molecule. The process may be implemented by the following steps:
S501, generating an undirected graph of any molecule according to the molecular structure of the molecule, wherein a vertex in the undirected graph corresponds to a single atom of the molecule or a ring containing a plurality of atoms, and an edge in the undirected graph corresponds to a chemical bond in the molecule.
In the embodiment of the disclosure, the undirected graph can be generated by taking atoms as vertices and bonds as edges. A ring containing multiple atoms is contracted into a single vertex, and for two rings that share a bond, an auxiliary edge is added between the corresponding vertices in the graph. As shown in fig. 6, a label graph G_A and a label graph G_B are constructed from molecule A and molecule B.
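The ring contraction in step S501 can be sketched in plain Python as below. The input format (atom count, bond list, ring list) and the function name are assumptions made for illustration; ring perception itself is taken as given, and cases the text does not describe (for example, rings that share only a single atom) are not handled.

```python
from itertools import combinations


def build_contracted_graph(n_atoms, bonds, rings):
    """Build the undirected graph with each ring contracted into one vertex.

    n_atoms: number of atoms, indexed 0..n_atoms-1.
    bonds:   list of (u, v) atom index pairs.
    rings:   list of atom index lists, one per ring (assumed precomputed).
    Returns (vertices, edges): vertices are frozensets of atom indices,
    edges are frozensets holding two vertex indices.
    """
    ring_sets = [frozenset(r) for r in rings]
    ring_atoms = set().union(*ring_sets)
    # Ring vertices first, then one vertex per atom that is in no ring.
    vertices = ring_sets + [frozenset([a]) for a in range(n_atoms) if a not in ring_atoms]

    edges = set()
    for u, v in bonds:
        owners_u = [i for i, vs in enumerate(vertices) if u in vs]
        owners_v = [i for i, vs in enumerate(vertices) if v in vs]
        for i in owners_u:
            for j in owners_v:
                # Skip bonds that lie inside a single ring vertex.
                internal = {u, v} <= vertices[i] or {u, v} <= vertices[j]
                if i != j and not internal:
                    edges.add(frozenset((i, j)))

    # Auxiliary edge between two rings that share a bond (fused rings).
    for i, j in combinations(range(len(ring_sets)), 2):
        if any({u, v} <= ring_sets[i] and {u, v} <= ring_sets[j] for u, v in bonds):
            edges.add(frozenset((i, j)))
    return vertices, edges


# A six-membered ring (atoms 0-5) with one substituent atom (6) bonded to atom 0.
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
vertices, edges = build_contracted_graph(7, ring_bonds + [(0, 6)], [[0, 1, 2, 3, 4, 5]])
print(len(vertices), len(edges))  # 2 vertices joined by 1 edge
```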
S502, extracting the features of each atom in the molecule, and generating a feature label set of the molecule based on the extracted features, wherein the feature label set comprises the features of single atoms and the features of rings, and the features of a ring are obtained by aggregating the features of the atoms contained in the ring.
In the embodiment of the present disclosure, as shown in table 1, the features of an atom may include several kinds, such as atomic number, implicit H bonds, formal charge, and degree. The features of each atom in the molecule are extracted, the feature type of each extracted feature is identified, and a key feature tag is determined for each extracted feature based on its feature type, where the key feature tag indicates whether the feature is a key feature. The feature label set is then generated based on the extracted features, their feature types, and their key feature tags.
The embodiments of the present disclosure consider both the chemical and the pharmaceutical features of atoms and, to facilitate mapping, classify the features into two types, critical (C) and non-critical (NC), based on a "critical" mechanism. The feature types include additive chemical features, non-additive chemical features, and pharmaceutical features; the non-additive chemical features and the pharmaceutical features are key features, and the additive chemical features are non-key features.
Table 1: Correspondence between atomic features and key feature tags
Here, "(drug)" marks a pharmaceutical feature.
The features of a ring can be obtained by aggregating the features of the atoms contained in the ring. For additive features (i.e., implicit H bonds, formal charge, and degree), the corresponding values of the atoms in the ring are summed. For non-additive features (i.e., atomic number and attached chemical bonds), an ordered list is maintained. Finally, for the other seven pharmaceutical features, the feature tag is set to "true" if any atom in the ring has that feature, and to "false" if no atom has it.
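The three aggregation rules above can be sketched as follows. The dictionary keys are hypothetical names chosen for this example, only two of the seven pharmaceutical flags are shown (`donor` and `acceptor` are placeholders, since the text does not list them), and sorting is one assumed way of keeping the "ordered list".

```python
ADDITIVE = ("implicit_h", "formal_charge", "degree")   # summed over the ring
NON_ADDITIVE = ("atomic_number", "bonds")              # kept as an ordered list
PHARMA_FLAGS = ("donor", "acceptor")                   # hypothetical flag names


def aggregate_ring_features(atom_features):
    """Combine per-atom feature dictionaries into one ring-level dictionary."""
    ring = {}
    for name in ADDITIVE:
        ring[name] = sum(atom[name] for atom in atom_features)
    for name in NON_ADDITIVE:
        # Assumption: "ordered" is realized by sorting the per-atom values.
        ring[name] = sorted(atom[name] for atom in atom_features)
    for name in PHARMA_FLAGS:
        # True if any atom in the ring has the feature, False otherwise.
        ring[name] = any(atom[name] for atom in atom_features)
    return ring


aromatic = ("aromatic", "aromatic")
atoms = [
    {"implicit_h": 1, "formal_charge": 0, "degree": 2, "atomic_number": 6,
     "bonds": aromatic, "donor": False, "acceptor": False},
    {"implicit_h": 0, "formal_charge": 0, "degree": 2, "atomic_number": 7,
     "bonds": aromatic, "donor": False, "acceptor": True},
    {"implicit_h": 1, "formal_charge": 0, "degree": 2, "atomic_number": 6,
     "bonds": aromatic, "donor": False, "acceptor": False},
]
ring = aggregate_ring_features(atoms)
print(ring["implicit_h"], ring["atomic_number"], ring["acceptor"])  # 2 [6, 6, 7] True
```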
S503, generating the label graph corresponding to the molecule according to the undirected graph and the feature label set of the molecule, wherein, when the molecule is a molecule to be screened, the corresponding label graph is the first label graph, and when the molecule is a reference molecule, the corresponding label graph is the second label graph.
In the disclosed embodiment, the two molecules are denoted A and B, and their corresponding label graphs are denoted G_A = (V_A, E_A, L_A) and G_B = (V_B, E_B, L_B), respectively, where each label graph includes a vertex set V, an edge set E, and a feature label set L.
On the basis of the above embodiment, as shown in fig. 7, the step S207 of "selecting the molecule to be screened having the greatest similarity with the reference molecule from the set of molecules to be screened according to the maximum weight full-connection subgraph of each molecule pair" may include the following steps:
S701, obtaining the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair, based on the maximum weight fully-connected subgraph of the molecule pair.
S702, screening out the molecules to be screened with the maximum similarity with the reference molecule from the molecule set to be screened, and taking the molecules to be screened as target molecules.
On the basis of the above embodiment, as shown in fig. 8, the step in S701 of "obtaining the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair, based on the maximum weight fully-connected subgraph of the molecule pair" may include the following steps:
S801, obtaining a first total weight of the first label graph.
In the embodiment of the present disclosure, the sum of the weights of all vertices in the first label graph is taken as the first total weight, where the weight of each vertex may be determined from the number of features corresponding to the vertex and the number of atoms corresponding to the vertex.
S802, obtaining a second total weight of the second label graph.
In the embodiment of the present disclosure, the sum of the weights of all vertices in the second label graph is taken as the second total weight, where the weight of each vertex may be determined from the number of features corresponding to the vertex and the number of atoms corresponding to the vertex.
S803, determining a third total weight of the maximum weight fully-connected subgraph based on the weights of the vertices included in the maximum weight fully-connected subgraph.
S804, determining the similarity between the first label graph and the second label graph according to the first total weight, the second total weight and the third total weight.
In the disclosed embodiment, the average value of the Bunke and Shearer similarity measures may be used as the similarity score, which may be expressed by equation four:
Here, w_v denotes the weight of a vertex v in a label graph, w_v = |L_v| + |v|, where |v| is the number of atoms contained in the vertex v; M denotes the vertex set of the maximum weight fully-connected subgraph; and ω_mi is the same as ω_mi in equation three and denotes the weight of a vertex in the maximum weight fully-connected subgraph. When A and B are the same molecule, Sim(A, B) = 1; when no vertex is matched (M is empty), Sim(A, B) = 0; otherwise, Sim(A, B) ∈ (0, 1). The score can therefore correctly characterize the similarity of molecules.
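Steps S801 to S804 can be illustrated with the sketch below. Because equation four is not reproduced in the text, the sketch substitutes the standard Bunke-Shearer form (size of the common part divided by the size of the larger graph), applied to weights; the averaging mentioned above is not modeled, so the exact score may differ from the patent's. Function names are hypothetical.

```python
def vertex_weight(n_features: int, n_atoms: int) -> int:
    """w_v = |L_v| + |v|: number of features plus number of atoms of a vertex."""
    return n_features + n_atoms


def similarity(weights_a, weights_b, clique_weights) -> float:
    """Weighted Bunke-Shearer style score (a stand-in for equation four).

    weights_a:      vertex weights of the first label graph   (S801).
    weights_b:      vertex weights of the second label graph  (S802).
    clique_weights: mapping weights in the maximum weight clique (S803).
    """
    first_total = sum(weights_a)
    second_total = sum(weights_b)
    third_total = sum(clique_weights)
    larger = max(first_total, second_total)
    # S804: share of the larger graph that is covered by the common part.
    return third_total / larger if larger else 0.0


w_a = [vertex_weight(2, 1), vertex_weight(2, 6)]   # one atom and one six-membered ring
print(similarity(w_a, w_a, [3.0, 8.0]))            # identical molecules
print(similarity(w_a, w_a, []))                    # nothing matched
```

With this form, identical molecules score 1 and an empty clique scores 0, which agrees with the boundary behavior stated above.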
In summary, in the molecular screening method according to the embodiments of the present disclosure, a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule are obtained. Each molecule to be screened forms a molecule pair with the reference molecule. For each molecule pair, the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings are obtained, and a mapping graph of the molecule pair is generated based on the mappings and the conflict information. The mapping graph is sampled to obtain the maximum weight fully-connected subgraph of the molecule pair, and, according to the maximum weight fully-connected subgraph of each molecule pair, the molecule to be screened with the greatest similarity to the reference molecule is selected from the set as the target molecule. The method thus models molecules as undirected graphs with labels (i.e., features), generates the mapping graph of a molecule pair from the label graphs, obtains the maximum weight fully-connected subgraph of the mapping graph by Gaussian boson sampling (GBS), and derives the similarity score of the molecule pair from that subgraph. This provides a framework for molecular screening based on Gaussian boson sampling, so that ligand-based virtual drug screening can be carried out with Gaussian boson sampling and the efficiency of molecular screening is improved.
To describe the molecular screening method according to the embodiments of the present disclosure more clearly, reference is now made to fig. 9, which is an overall schematic diagram of the method. As shown in fig. 9, molecule A is taken as the molecule to be screened. A first label graph G_A corresponding to the molecule A to be screened and a second label graph G_B corresponding to the reference molecule B are generated according to the structures of the molecules. A mapping graph G_AB corresponding to the molecule A to be screened and the reference molecule B is built from the first label graph G_A and the second label graph G_B. A maximum weight fully-connected subgraph (Maximum Weight Clique, abbreviated as MWC) of the mapping graph G_AB is generated based on the GBS algorithm, and the target atoms (the gray solid circles in fig. 9) are mapped back to the first label graph and the second label graph, i.e., the common substructure of the two molecules is determined, so as to generate the similarity score (i.e., the similarity) of the first label graph and the second label graph.
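To make the role of the maximum weight clique concrete, the sketch below finds it by exhaustive classical search on a small mapping graph. This is a stand-in for illustration only and is not Gaussian boson sampling: it enumerates every clique, so its running time grows exponentially with the graph size. The function name and the example graph are assumptions, and weights are assumed to be positive.

```python
def max_weight_clique(weights, edges):
    """Exhaustively search for the clique with the largest total vertex weight.

    weights: list of vertex weights (one per mapping), assumed positive.
    edges:   list of (u, v) pairs joining conflict-free mappings.
    Returns (clique_vertices, total_weight).
    """
    n = len(weights)
    adjacent = [set() for _ in range(n)]
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    best = ([], 0.0)

    def extend(clique, total, candidates):
        nonlocal best
        if total > best[1]:
            best = (list(clique), total)
        for v in sorted(candidates):
            # Keep only higher-indexed candidates adjacent to v, so every
            # clique is visited exactly once in increasing vertex order.
            remaining = {c for c in candidates if c > v and c in adjacent[v]}
            extend(clique + [v], total + weights[v], remaining)

    extend([], 0.0, set(range(n)))
    return best


# Four mappings; mappings 0, 1, 2 are mutually conflict-free, 3 only pairs with 2.
clique, total = max_weight_clique([3.0, 2.0, 4.0, 1.0], [(0, 1), (1, 2), (0, 2), (2, 3)])
print(clique, total)  # [0, 1, 2] 9.0
```

The returned vertices are the mappings that make up the common substructure, and their summed weight is the third total weight used in the similarity score.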
Fig. 10 is a block diagram of an apparatus for molecular screening according to a first embodiment of the present disclosure.
As shown in fig. 10, the molecular screening apparatus 1000 according to the embodiment of the present disclosure includes an obtaining module 1001, a mapping module 1002, a sampling module 1003, and a screening module 1004.
The obtaining module 1001 is configured to obtain a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule.
The mapping module 1002 is configured to form a molecule pair from each molecule to be screened and the reference molecule, to obtain, for each molecule pair, the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings, and to generate a mapping graph of the molecule pair based on the mappings and the conflict information between the mappings.
The sampling module 1003 is configured to sample the mapping graph to obtain the maximum weight fully-connected subgraph of the molecule pair.
The screening module 1004 is configured to select, according to the maximum weight fully-connected subgraph of each molecule pair, the molecule to be screened with the greatest similarity to the reference molecule from the set of molecules to be screened as the target molecule.
It should be noted that the explanation of the above embodiments of the method for molecular screening is also applicable to the device for molecular screening according to the embodiments of the present disclosure, and specific processes are not described herein.
In summary, the device for molecular screening according to the embodiments of the present disclosure obtains a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule. Each molecule to be screened forms a molecule pair with the reference molecule. For each molecule pair, the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings are obtained, and a mapping graph of the molecule pair is generated based on the mappings and the conflict information. The mapping graph is sampled to obtain the maximum weight fully-connected subgraph of the molecule pair, and, according to the maximum weight fully-connected subgraph of each molecule pair, the molecule to be screened with the greatest similarity to the reference molecule is selected from the set as the target molecule. The device thus models molecules as undirected graphs with labels (i.e., features), generates the mapping graph of a molecule pair from the label graphs, obtains the maximum weight fully-connected subgraph of the mapping graph by Gaussian boson sampling (GBS), and derives the similarity score of the molecule pair from that subgraph. This provides a framework for molecular screening based on Gaussian boson sampling, so that ligand-based virtual drug screening can be carried out with Gaussian boson sampling and the efficiency of molecular screening is improved.
Fig. 11 is a block diagram of an apparatus for molecular screening according to a second embodiment of the present disclosure.
As shown in fig. 11, the apparatus 1100 for molecular screening according to an embodiment of the present disclosure includes an obtaining module 1101, a mapping module 1102, a sampling module 1103, and a screening module 1104.
The obtaining module 1101 has the same structure and function as the obtaining module 1001 in the previous embodiment, the mapping module 1102 has the same structure and function as the mapping module 1002 in the previous embodiment, the sampling module 1103 has the same structure and function as the sampling module 1003 in the previous embodiment, and the screening module 1104 has the same structure and function as the screening module 1004 in the previous embodiment.
Further, the obtaining module 1101 includes: a first generating sub-module 11011 configured to generate an undirected graph of any molecule according to the molecular structure of the molecule, wherein a vertex in the undirected graph corresponds to a single atom of the molecule or a ring containing a plurality of atoms, and an edge in the undirected graph corresponds to a chemical bond in the molecule; a second generating sub-module 11012 configured to extract the features of each atom in the molecule and generate a feature label set of the molecule based on the extracted features, where the feature label set includes the features of single atoms and the features of rings, and the features of a ring are obtained by aggregating the features of the atoms contained in the ring; and a third generating sub-module 11013 configured to generate the label graph corresponding to the molecule according to the undirected graph and the feature label set of the molecule, where, when the molecule is a molecule to be screened, the corresponding label graph is the first label graph, and when the molecule is a reference molecule, the corresponding label graph is the second label graph.
Further, the second generating submodule 11012 includes a first determining unit configured to identify a feature type of the extracted feature and determine a key feature tag of the extracted feature based on the feature type, the key feature tag indicating whether the feature is a key feature, and a second generating unit configured to generate a feature tag set based on the extracted feature, the feature type of the feature, and the key feature tag of the feature.
Further, the molecule is a drug molecule, and the feature types include an additive chemical feature, a non-additive chemical feature, and a pharmaceutical feature, the non-additive chemical feature and the pharmaceutical feature being key features, the additive chemical feature being a non-key feature.
Further, the mapping module 1102 includes: a feature matching sub-module configured to match each first vertex in the first label graph with each second vertex in the second label graph to obtain the mappings between the first vertices and the second vertices; a fourth generating sub-module configured to select two conflict-free mappings from all the mappings to form a mapping pair, and to form an edge between the two mappings included in the mapping pair; a first obtaining sub-module configured to obtain the weight corresponding to each mapping; and a fifth generating sub-module configured to generate the mapping graph with the mappings as the vertices of the mapping graph, the edges between conflict-free mapping pairs as its edges, and the weights of the mappings as its weights.
Further, the fourth generating sub-module comprises: a second determining unit configured to select two mappings from all the mappings and determine the first vertex and the second vertex corresponding to each of the two mappings; a first obtaining unit configured to obtain the first atom or first ring represented by the first vertex of each of the two mappings; a second obtaining unit configured to obtain the second atom or second ring represented by the second vertex of each of the two mappings; and a third determining unit configured to determine that the two mappings are conflict-free and form a mapping pair if there is no atom conflict and no distance conflict between the two first atoms or first rings and no atom conflict and no distance conflict between the two second atoms or second rings.
Further, the third determining unit comprises: an atom comparison subunit configured to compare the atoms of the two first atoms or first rings with each other and the atoms of the two second atoms or second rings with each other; and a second determining subunit configured to determine that there is no atom conflict between the two mappings in response to the two first atoms or first rings having no atom in common and the two second atoms or second rings having no atom in common.
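The atom comparison described above amounts to checking that the two mappings do not reuse an atom on either side. A minimal sketch, assuming each vertex is represented as a set of atom indices (the function name and this representation are hypothetical):

```python
def has_atom_conflict(mapping_1, mapping_2) -> bool:
    """Return True when two mappings share an atom in either molecule.

    Each mapping is a pair (first_atoms, second_atoms): the atom indices of the
    first vertex (molecule to be screened) and of the second vertex (reference
    molecule). A single atom is a one-element set; a ring is a larger set.
    """
    first_1, second_1 = mapping_1
    first_2, second_2 = mapping_2
    shared_first = set(first_1) & set(first_2)      # same atom used twice in molecule A
    shared_second = set(second_1) & set(second_2)   # same atom used twice in molecule B
    return bool(shared_first) or bool(shared_second)


print(has_atom_conflict(({0}, {3}), ({1}, {4})))           # False: disjoint on both sides
print(has_atom_conflict(({0, 1, 2}, {3}), ({2, 5}, {4})))  # True: atom 2 appears twice
```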
Further, the third determining unit comprises: a first obtaining subunit configured to obtain a first distance between the two first atoms or first rings; a second obtaining subunit configured to obtain a second distance between the two second atoms or second rings; and a second determining subunit configured to determine that there is no distance conflict between the two mappings in response to the difference between the first distance and the second distance being smaller than or equal to a preset distance.
Further, the feature matching sub-module comprises a third obtaining unit, a judging unit and a second generating unit, wherein the third obtaining unit is used for obtaining first key features of one of the first vertexes and second key features of one of the second vertexes, the judging unit is used for judging whether the attributes of the first key features and the second key features are identical, and the second generating unit is used for generating a mapping between one of the first vertexes and one of the second vertexes if the attributes of at least one key feature are identical.
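The key feature matching performed by the feature matching sub-module can be sketched as below, following the literal reading that a mapping is generated when at least one key feature has the same value in both vertices. The function name, the dictionary representation of vertex features, and the choice of `atomic_number` as the only key feature in the example are assumptions.

```python
def candidate_mappings(vertices_a, vertices_b, key_features):
    """List the (first vertex, second vertex) index pairs that may be mapped.

    vertices_a / vertices_b: one feature dictionary per vertex of each label graph.
    key_features:            names of the features tagged as key features.
    """
    mappings = []
    for i, features_a in enumerate(vertices_a):
        for j, features_b in enumerate(vertices_b):
            # A mapping is kept when at least one key feature matches.
            if any(
                name in features_a and name in features_b
                and features_a[name] == features_b[name]
                for name in key_features
            ):
                mappings.append((i, j))
    return mappings


graph_a = [{"atomic_number": 6}, {"atomic_number": 8}]
graph_b = [{"atomic_number": 6}, {"atomic_number": 7}]
print(candidate_mappings(graph_a, graph_b, ("atomic_number",)))  # [(0, 0)]
```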
Further, the first obtaining sub-module comprises a third generating unit configured to determine, for each mapping, the first vertex and the second vertex corresponding to the mapping, and to generate the weight of the mapping according to the number of atoms and the features of each of the first vertex and the second vertex, where the features are those of the atom or ring corresponding to the vertex.
Further, the third generating unit comprises a third obtaining subunit, a fourth obtaining subunit and a third determining unit, wherein the third obtaining subunit is used for obtaining the average value of the atomic number of the first vertex and the atomic number of the second vertex, the fourth obtaining subunit is used for obtaining the number of the characteristics with the same attribute or value between the first vertex and the second vertex, and the third determining unit is used for determining the mapping weight between the first vertex and the second vertex according to the average value and the number of the characteristics with the same attribute or value.
Further, the sampling module 1103 includes a sampling sub-module configured to sample the mapping graph based on a Gaussian boson sampling algorithm, so as to obtain the maximum weight fully-connected subgraph.
Further, the screening module 1104 includes: a second obtaining sub-module configured to obtain the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair, based on the maximum weight fully-connected subgraph of the molecule pair; and a screening sub-module configured to select the molecule to be screened with the greatest similarity to the reference molecule from the set of molecules to be screened as the target molecule.
Further, the second obtaining sub-module comprises a fourth obtaining unit, a fifth obtaining unit, a third determining unit and a fourth determining unit, wherein the fourth obtaining unit is used for obtaining the first total weight of the first label graph, the fifth obtaining unit is used for obtaining the second total weight of the second label graph, the third determining unit is used for determining the third total weight of the maximum weight full-connection sub-graph based on the weight of the vertex included in the maximum weight full-connection sub-graph, and the fourth determining unit is used for determining the similarity between the first label graph and the second label graph according to the first total weight, the second total weight and the third total weight.
In summary, the device for molecular screening according to the embodiments of the present disclosure obtains a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule. Each molecule to be screened forms a molecule pair with the reference molecule. For each molecule pair, the mappings between the vertices of the first label graph and the second label graph and the conflict information between the mappings are obtained, and a mapping graph of the molecule pair is generated based on the mappings and the conflict information. The mapping graph is sampled to obtain the maximum weight fully-connected subgraph of the molecule pair, and, according to the maximum weight fully-connected subgraph of each molecule pair, the molecule to be screened with the greatest similarity to the reference molecule is selected from the set as the target molecule. The device thus models molecules as undirected graphs with labels (i.e., features), generates the mapping graph of a molecule pair from the label graphs, obtains the maximum weight fully-connected subgraph of the mapping graph by Gaussian boson sampling (GBS), and derives the similarity score of the molecule pair from that subgraph. This provides a framework for molecular screening based on Gaussian boson sampling, so that ligand-based virtual drug screening can be carried out with Gaussian boson sampling and the efficiency of molecular screening is improved.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 12 shows a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 includes a computing unit 1201 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the electronic device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including an input unit 1206, such as a keyboard, mouse, etc., an output unit 1207, such as various types of displays, speakers, etc., a storage unit 1208, such as a magnetic disk, optical disk, etc., and a communication unit 1209, such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1201 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the various methods and processes described above, such as the method of molecular screening shown in figs. 1 to 9. For example, in some embodiments, the method of molecular screening may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the method of molecular screening described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the method of molecular screening in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, and a blockchain network.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
According to an embodiment of the present disclosure, the present disclosure further provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the molecular screening method according to the embodiments of the present disclosure described above.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (31)

1. A molecular screening method, comprising:
obtaining a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule, wherein each label graph is an undirected graph that includes features of the molecule corresponding to that label graph, a vertex in the undirected graph corresponds to a single atom of the molecule or to a ring containing multiple atoms, and an edge in the undirected graph corresponds to a chemical bond in the molecule;
forming a molecule pair from each molecule to be screened and the reference molecule and, for each molecule pair, obtaining mappings between vertices of the first label graph and the second label graph and conflict information between the mappings, and generating a mapping graph of the molecule pair based on the mappings and the conflict information between the mappings; wherein the mappings between the vertices are obtained by performing key feature matching between the vertices; the conflict information between the mappings is determined by the atom conflict and distance conflict between the first atoms or first rings represented by the first vertices respectively corresponding to two mappings, and by the atom conflict and distance conflict between the second atoms or second rings represented by the second vertices respectively corresponding to the two mappings, the first vertices being vertices in the first label graph and the second vertices being vertices in the second label graph;
sampling the mapping graph to obtain a maximum weight fully connected subgraph of the molecule pair; and
according to the maximum weight fully connected subgraph of each molecule pair, screening out, from the set of molecules to be screened, the molecule to be screened having the greatest similarity to the reference molecule as a target molecule.

2. The method according to claim 1, wherein obtaining the label graph corresponding to any molecule, the molecule being either the molecule to be screened or the reference molecule, comprises:
generating an undirected graph of the molecule according to the molecular structure of the molecule, wherein a vertex in the undirected graph corresponds to a single atom of the molecule or to a ring containing multiple atoms, and an edge in the undirected graph corresponds to a chemical bond in the molecule;
extracting features of each atom in the molecule, and generating a feature tag set of the molecule based on the extracted features, wherein the feature tag set includes the features of the single atoms and the features of the rings, and the features of a ring are obtained by aggregating the features of the multiple atoms contained in the ring; and
generating the label graph corresponding to the molecule according to the undirected graph and the feature tag set of the molecule, wherein, when the molecule is the molecule to be screened, the corresponding label graph is the first label graph, and when the molecule is the reference molecule, the corresponding label graph is the second label graph.
3. The method according to claim 2, wherein generating the feature tag set of the molecule based on the extracted features comprises:
identifying a feature type of each extracted feature, and determining a key feature tag of the extracted feature based on the feature type, the key feature tag indicating whether the feature is a key feature; and
generating the feature tag set based on the extracted features, the feature types of the features, and the key feature tags of the features.

4. The method according to claim 3, wherein the molecule is a drug molecule, the feature types include additive chemical features, non-additive chemical features, and pharmaceutical features, the non-additive chemical features and the pharmaceutical features are key features, and the additive chemical features are non-key features.

5. The method according to any one of claims 1 to 4, wherein, for each molecule pair, obtaining the mappings between vertices of the first label graph and the second label graph and the conflict information between the mappings, and generating the mapping graph of the molecule pair based on the mappings and the conflict information between the mappings, comprises:
performing key feature matching between each first vertex in the first label graph and each second vertex in the second label graph to obtain mappings between the first vertices and the second vertices;
obtaining, from all the mappings, two conflict-free mappings to form a mapping pair, and forming an edge between the two mappings included in the mapping pair;
obtaining a weight corresponding to each mapping; and
generating the mapping graph with the mappings as vertices of the mapping graph, together with the edges between the conflict-free mapping pairs and the weights of the mappings.

6. The method according to claim 5, wherein obtaining, from all the mappings, two conflict-free mappings to form a mapping pair comprises:
selecting two mappings from all the mappings, and determining the first vertex and the second vertex corresponding to each of the two mappings;
obtaining the first atom or first ring represented by the first vertex corresponding to each of the two mappings;
obtaining the second atom or second ring represented by the second vertex corresponding to each of the two mappings; and
in response to there being no atom conflict and no distance conflict between the two first atoms or first rings, and no atom conflict and no distance conflict between the two second atoms or second rings, determining that the two mappings are conflict-free and forming the mapping pair.

7. The method according to claim 6, wherein determining the atom conflict comprises:
performing an atom comparison between the two first atoms or first rings, and performing an atom comparison between the two second atoms or second rings; and
in response to the two first atoms or first rings not containing a same atom, and the two second atoms or second rings not containing a same atom, determining that there is no atom conflict between the two mappings.

8. The method according to claim 6, wherein determining the distance conflict comprises:
obtaining a first distance between the two first atoms or first rings;
obtaining a second distance between the two second atoms or second rings; and
in response to the difference between the first distance and the second distance being less than or equal to a preset distance, determining that there is no distance conflict between the two mappings.

9. The method according to claim 5, wherein the process of generating a mapping based on key feature matching comprises:
obtaining a first key feature of one of the first vertices and a second key feature of one of the second vertices;
determining whether the attributes of the first key feature and the second key feature are the same; and
if at least one key feature has the same attribute, generating the mapping between the one of the first vertices and the one of the second vertices.

10. The method according to claim 5, wherein obtaining the weight corresponding to each mapping comprises:
for each mapping, determining the first vertex and the second vertex corresponding to the mapping, and generating the weight of the mapping according to the respective numbers of atoms and the respective features of the first vertex and the second vertex, the features being the features of the atom or ring corresponding to the vertex.

11. The method according to claim 10, wherein generating the weight of the mapping according to the respective numbers of atoms and the respective features of the first vertex and the second vertex comprises:
obtaining the average of the number of atoms of the first vertex and the number of atoms of the second vertex;
obtaining the number of features whose attributes or values are the same between the first vertex and the second vertex; and
determining the weight of the mapping between the first vertex and the second vertex according to the average and the number of features whose attributes or values are the same.

12. The method according to claim 1, wherein sampling the mapping graph to obtain the maximum weight fully connected subgraph of the molecule pair comprises:
sampling the mapping graph based on a Gaussian boson sampling algorithm to obtain the maximum weight fully connected subgraph.

13. The method according to claim 1, wherein, according to the maximum weight fully connected subgraph of each molecule pair, screening out, from the set of molecules to be screened, the molecule to be screened having the greatest similarity to the reference molecule as the target molecule comprises:
obtaining, based on the maximum fully connected graph of the molecule pair, the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair; and
screening out, from the set of molecules to be screened, the molecule to be screened having the greatest similarity to the reference molecule as the target molecule.
14. The method according to claim 13, wherein obtaining, based on the maximum fully connected graph of the molecule pair, the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair comprises:
obtaining a first total weight of the first label graph;
obtaining a second total weight of the second label graph;
determining a third total weight of the maximum weight fully connected subgraph based on the weights of the vertices included in the maximum weight fully connected subgraph; and
determining the similarity between the first label graph and the second label graph according to the first total weight, the second total weight, and the third total weight.

15. A molecular screening apparatus, comprising:
an acquisition module, configured to obtain a first label graph of each molecule to be screened in a set of molecules to be screened and a second label graph of a reference molecule, wherein each label graph is an undirected graph that includes features of the molecule corresponding to that label graph, a vertex in the undirected graph corresponds to a single atom of the molecule or to a ring containing multiple atoms, and an edge in the undirected graph corresponds to a chemical bond in the molecule;
a mapping module, configured to form a molecule pair from each molecule to be screened and the reference molecule and, for each molecule pair, obtain mappings between vertices of the first label graph and the second label graph and conflict information between the mappings, and generate a mapping graph of the molecule pair based on the mappings and the conflict information between the mappings; wherein the mappings between the vertices are obtained by performing key feature matching between the vertices; the conflict information between the mappings is determined by the atom conflict and distance conflict between the first atoms or first rings represented by the first vertices respectively corresponding to two mappings, and by the atom conflict and distance conflict between the second atoms or second rings represented by the second vertices respectively corresponding to the two mappings, the first vertices being vertices in the first label graph and the second vertices being vertices in the second label graph;
a sampling module, configured to sample the mapping graph to obtain a maximum weight fully connected subgraph of the molecule pair; and
a screening module, configured to screen out, from the set of molecules to be screened and according to the maximum weight fully connected subgraph of each molecule pair, the molecule to be screened having the greatest similarity to the reference molecule as a target molecule.

16. The apparatus according to claim 15, wherein the acquisition module comprises:
a first generating submodule, configured to generate an undirected graph of any molecule according to the molecular structure of the molecule, wherein a vertex in the undirected graph corresponds to a single atom of the molecule or to a ring containing multiple atoms, an edge in the undirected graph corresponds to a chemical bond in the molecule, and the molecule is either the molecule to be screened or the reference molecule;
a second generating submodule, configured to extract features of each atom in the molecule and generate a feature tag set of the molecule based on the extracted features, wherein the feature tag set includes the features of the single atoms and the features of the rings, and the features of a ring are obtained by aggregating the features of the multiple atoms contained in the ring; and
a third generating submodule, configured to generate the label graph corresponding to the molecule according to the undirected graph and the feature tag set of the molecule, wherein, when the molecule is the molecule to be screened, the corresponding label graph is the first label graph, and when the molecule is the reference molecule, the corresponding label graph is the second label graph.

17. The apparatus according to claim 16, wherein the second generating submodule comprises:
a first determining unit, configured to identify a feature type of each extracted feature and determine a key feature tag of the extracted feature based on the feature type, the key feature tag indicating whether the feature is a key feature; and
a second generating unit, configured to generate the feature tag set based on the extracted features, the feature types of the features, and the key feature tags of the features.

18. The apparatus according to claim 17, wherein the molecule is a drug molecule, the feature types include additive chemical features, non-additive chemical features, and pharmaceutical features, the non-additive chemical features and the pharmaceutical features are key features, and the additive chemical features are non-key features.

19. The apparatus according to any one of claims 15 to 18, wherein the mapping module comprises:
a feature matching submodule, configured to perform key feature matching between each first vertex in the first label graph and each second vertex in the second label graph to obtain mappings between the first vertices and the second vertices;
a fourth generating submodule, configured to obtain, from all the mappings, two conflict-free mappings to form a mapping pair, and to form an edge between the two mappings included in the mapping pair;
a first acquisition submodule, configured to obtain a weight corresponding to each mapping; and
a fifth generating submodule, configured to generate the mapping graph with the mappings as vertices of the mapping graph, together with the edges between the conflict-free mapping pairs and the weights of the mappings.

20. The apparatus according to claim 19, wherein the fourth generating submodule comprises:
a second determining unit, configured to select two mappings from all the mappings and determine the first vertex and the second vertex corresponding to each of the two mappings;
a first acquisition unit, configured to obtain the first atom or first ring represented by the first vertex corresponding to each of the two mappings;
a second acquisition unit, configured to obtain the second atom or second ring represented by the second vertex corresponding to each of the two mappings; and
a third determining unit, configured to, in response to there being no atom conflict and no distance conflict between the two first atoms or first rings, and no atom conflict and no distance conflict between the two second atoms or second rings, determine that the two mappings are conflict-free and form the mapping pair.

21. The apparatus according to claim 20, wherein the third determining unit comprises:
an atom comparison subunit, configured to perform an atom comparison between the two first atoms or first rings, and an atom comparison between the two second atoms or second rings; and
a second determining subunit, configured to, in response to the two first atoms or first rings not containing a same atom, and the two second atoms or second rings not containing a same atom, determine that there is no atom conflict between the two mappings.

22. The apparatus according to claim 20, wherein the third determining unit comprises:
a first acquisition subunit, configured to obtain a first distance between the two first atoms or first rings;
a second acquisition subunit, configured to obtain a second distance between the two second atoms or second rings; and
a second determining subunit, configured to, in response to the difference between the first distance and the second distance being less than or equal to a preset distance, determine that there is no distance conflict between the two mappings.

23. The apparatus according to claim 19, wherein the feature matching submodule comprises:
a third acquisition unit, configured to obtain a first key feature of one of the first vertices and a second key feature of one of the second vertices;
a judging unit, configured to determine whether the attributes of the first key feature and the second key feature are the same; and
a second generating unit, configured to, if at least one key feature has the same attribute, generate the mapping between the one of the first vertices and the one of the second vertices.

24. The apparatus according to claim 19, wherein the acquisition submodule comprises:
a third generating unit, configured to, for each mapping, determine the first vertex and the second vertex corresponding to the mapping, and generate the weight of the mapping according to the respective numbers of atoms and the respective features of the first vertex and the second vertex, the features being the features of the atom or ring corresponding to the vertex.
25. The apparatus according to claim 24, wherein the third generating unit comprises:
a third acquisition subunit, configured to obtain the average of the number of atoms of the first vertex and the number of atoms of the second vertex;
a fourth acquisition subunit, configured to obtain the number of features whose attributes or values are the same between the first vertex and the second vertex; and
a third determining unit, configured to determine the weight of the mapping between the first vertex and the second vertex according to the average and the number of features whose attributes or values are the same.

26. The apparatus according to claim 15, wherein the sampling module comprises:
a sampling submodule, configured to sample the mapping graph based on a Gaussian boson sampling algorithm to obtain the maximum weight fully connected subgraph.

27. The apparatus according to claim 15, wherein the screening module comprises:
a second acquisition submodule, configured to obtain, based on the maximum fully connected graph of the molecule pair, the similarity between the label graphs of the molecule to be screened and the reference molecule in the molecule pair; and
a screening submodule, configured to screen out, from the set of molecules to be screened, the molecule to be screened having the greatest similarity to the reference molecule as the target molecule.

28. The apparatus according to claim 27, wherein the second acquisition submodule comprises:
a fourth acquisition unit, configured to obtain a first total weight of the first label graph;
a fifth acquisition unit, configured to obtain a second total weight of the second label graph;
a third determining unit, configured to determine a third total weight of the maximum weight fully connected subgraph based on the weights of the vertices included in the maximum weight fully connected subgraph; and
a fourth determining unit, configured to determine the similarity between the first label graph and the second label graph according to the first total weight, the second total weight, and the third total weight.

29. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 14.

30. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 14.

31. A computer program product, comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 14.
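
The data flow recited in claims 5 to 14 can be illustrated with a small classical sketch. It is not the patented implementation and rests on several assumptions that the claims leave open: the `Vertex` fields and the use of Euclidean coordinates for distances are hypothetical, the mapping weight is assumed to be the product of the average atom count and the shared-feature count, the "total weight" of a label graph is assumed to be the weight of mapping every vertex onto itself, and the similarity is assumed to be a Tanimoto-style ratio. The maximum weight fully connected subgraph is found here by brute-force enumeration, which stands in for the Gaussian boson sampling of claim 12 and is practical only for very small graphs.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Vertex:
    """One label-graph vertex (hypothetical layout, for illustration only)."""
    atoms: frozenset   # atom indices covered: one atom, or all atoms of a ring
    key: frozenset     # key feature labels
    other: frozenset   # non-key feature labels
    pos: tuple         # assumed coordinates used to measure distances


def build_mapping_graph(g1, g2, max_diff=1.0):
    """Return (mappings, weights, edges) for two label graphs (lists of Vertex)."""
    # A mapping (i, j) exists when the two vertices share at least one key feature.
    maps = [(i, j) for i, u in enumerate(g1) for j, v in enumerate(g2)
            if u.key & v.key]
    weight = {}
    for i, j in maps:
        u, v = g1[i], g2[j]
        mean_atoms = (len(u.atoms) + len(v.atoms)) / 2
        shared = len((u.key | u.other) & (v.key | v.other))
        weight[(i, j)] = mean_atoms * shared  # assumed way of combining the two
    edges = set()
    for a, b in combinations(maps, 2):
        (i1, j1), (i2, j2) = a, b
        # Atom conflict: the two mappings reuse an atom on either side.
        atom_conflict = (g1[i1].atoms & g1[i2].atoms) or (g2[j1].atoms & g2[j2].atoms)
        # Distance conflict: the two distances differ by more than the preset value.
        d1 = dist(g1[i1].pos, g1[i2].pos)
        d2 = dist(g2[j1].pos, g2[j2].pos)
        if not atom_conflict and abs(d1 - d2) <= max_diff:
            edges.add(frozenset((a, b)))
    return maps, weight, edges


def max_weight_clique(maps, weight, edges):
    """Brute-force search; a classical stand-in for Gaussian boson sampling."""
    best, best_w = (), 0.0
    for r in range(1, len(maps) + 1):
        for subset in combinations(maps, r):
            if all(frozenset(p) in edges for p in combinations(subset, 2)):
                w = sum(weight[m] for m in subset)
                if w > best_w:
                    best, best_w = subset, w
    return best, best_w


def self_weight(g):
    # Assumed "total weight" of a label graph: every vertex mapped onto itself.
    return sum(len(v.atoms) * len(v.key | v.other) for v in g)


def similarity(g1, g2, max_diff=1.0):
    maps, weight, edges = build_mapping_graph(g1, g2, max_diff)
    _, w3 = max_weight_clique(maps, weight, edges)
    denom = self_weight(g1) + self_weight(g2) - w3  # assumed Tanimoto-style ratio
    return w3 / denom if denom else 0.0


def screen(candidates, reference):
    """Index of the candidate label graph most similar to the reference."""
    return max(range(len(candidates)),
               key=lambda k: similarity(candidates[k], reference))
```

Calling `screen(candidates, reference)` with a list of candidate label graphs returns the position of the best-scoring candidate. Because each molecule pair is scored independently, the per-pair calls could also be run in parallel, in line with the remark above that steps may be performed in parallel or in a different order.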
CN202210155504.5A 2022-02-21 2022-02-21 Molecular screening method, device, electronic device and storage medium Active CN114566233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210155504.5A CN114566233B (en) 2022-02-21 2022-02-21 Molecular screening method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210155504.5A CN114566233B (en) 2022-02-21 2022-02-21 Molecular screening method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114566233A CN114566233A (en) 2022-05-31
CN114566233B true CN114566233B (en) 2025-04-11

Family

ID=81713741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210155504.5A Active CN114566233B (en) 2022-02-21 2022-02-21 Molecular screening method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114566233B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115116564B (en) * 2022-07-26 2022-11-25 之江实验室 Reverse virtual screening platform and method based on programmable quantum computing
CN117423379B (en) * 2023-12-19 2024-03-15 合肥微观纪元数字科技有限公司 Molecular screening method and related device adopting quantum computation
CN117954002B (en) * 2024-01-05 2024-10-22 苏州腾迈医药科技有限公司 Method, device and medium for displaying molecular pair relation
CN117912601B (en) * 2024-01-26 2024-11-01 苏州腾迈医药科技有限公司 Method and device for displaying molecular conformation and medium
CN117828374B (en) * 2024-03-06 2024-05-07 北京玻色量子科技有限公司 Molecular similarity calculation method and device based on optical quantum computer

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695702A (en) * 2020-06-16 2020-09-22 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of molecular generation model
CN112201313A (en) * 2020-09-15 2021-01-08 北京晶派科技有限公司 Automatic small molecule drug screening method and computing equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2381382B1 (en) * 2003-10-14 2017-12-06 Verseon Method and apparatus for analysis of molecular configurations and combinations
US20180268937A1 (en) * 2015-09-24 2018-09-20 Caris Science, Inc. Method, apparatus, and computer program product for analyzing biological data
US20190010533A1 (en) * 2017-06-05 2019-01-10 The Methodist Hospital System Methods for screening and selecting target agents from molecular databases
US20220101972A1 (en) * 2020-09-25 2022-03-31 Accenture Global Solutions Limited Machine learning systems for automated pharmaceutical molecule identification
CN112397157B (en) * 2020-10-28 2024-11-01 星药科技(北京)有限公司 Molecular generation method based on sub-graph-variation self-coding structure
CN113327644B (en) * 2021-04-09 2024-05-14 中山大学 Drug-target interaction prediction method based on deep embedding learning of graph and sequence

Also Published As

Publication number Publication date
CN114566233A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN114566233B (en) Molecular screening method, device, electronic device and storage medium
EP4040401A1 (en) Image processing method and apparatus, device and storage medium
CN114550177B (en) Image processing method, text recognition method and device
CN112560862A (en) Text recognition method, device and electronic device
CN114549874A (en) Training method of multi-target image-text matching model, image-text retrieval method and device
CN113780098B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN112633276A (en) Training method, recognition method, device, equipment and medium
CN112860993B (en) Method, device, equipment, storage medium and program product for classifying points of interest
CN112862006B (en) Training method, device and electronic equipment for image depth information acquisition model
CN114490998B (en) Text information extraction method and device, electronic equipment and storage medium
CN114648676A (en) Point cloud processing model training and point cloud instance segmentation method and device
CN115359308B (en) Model training method, device, equipment, storage medium and program for identifying difficult cases
CN113033194A (en) Training method, device, equipment and storage medium of semantic representation graph model
CN110807379A (en) A semantic recognition method, device, and computer storage medium
WO2022257614A1 (en) Training method and apparatus for object detection model, and image detection method and apparatus
CN114399784A (en) An automatic identification method and device based on CAD drawings
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
CN111862030A (en) A face composite image detection method, device, electronic device and storage medium
CN114359932A (en) Text detection method, text recognition method and device
CN111768005A (en) Training method, device, electronic device and storage medium for lightweight detection model
CN112784102A (en) Video retrieval method and device and electronic equipment
CN113723405A (en) Method and device for determining area outline and electronic equipment
CN116311298A (en) Information generation method, information processing device, electronic equipment and medium
CN114398434A (en) Structured information extraction method, device, electronic device and storage medium
CN114973333A (en) Human interaction detection method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant