CN110299209B - Similar medical record search method, apparatus, device and readable storage medium - Google Patents
- Publication number
- CN110299209B (application number CN201910557217.5A)
- Authority
- CN
- China
- Prior art keywords
- type
- similarity
- data
- subgraph
- graph structure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
 
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
 
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
 
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
 
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Pathology (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method, apparatus, device and readable storage medium for searching similar medical records. Query medical record data and a plurality of historical medical record data are acquired. Query graph structure data corresponding to the query medical record data, and historical graph structure data corresponding to each historical medical record data, are then obtained; both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, and the intermediate nodes and leaf nodes of the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs. The degree of similarity between each historical graph structure data and the query graph structure data is obtained from the root-node similarity, the first-type subgraph similarity and the second-type subgraph similarity. A similar medical record search result for the query medical record data is then determined according to a preset selection rule and the degree of similarity. In this way, the subgraphs that are inherent in the medical record data and those that can be recognized from it are extracted, and the correlation between the contents of corresponding subgraphs is measured during comparison, which improves the accuracy of similar medical record search.
Description
Technical Field
The present invention relates to the technical field of information processing, and in particular to a method, apparatus, device and readable storage medium for searching similar medical records.
Background
In the medical field, retrieval of similar medical records is of great significance for both research and clinical practice. For example, when a patient visits, a doctor can quickly find medical records similar to the patient's and make a timely, well-founded judgment based on the diagnosis and treatment paths and outcomes recorded in them. Alternatively, when analyzing a medical record or writing a case report, a doctor can draw on historical medical records with a certain degree of similarity to obtain diagnostic opinions and treatment methods for reference. In clinical research, it is sometimes necessary to take a particular medical record as a starting point and find more similar records for study and discussion.
Current medical record matching and retrieval usually searches the full text of medical records. For example, searching with the keywords "fever" and "difficulty breathing" retrieves every pre-stored medical record that contains both keywords.
However, because the same symptoms may correspond to very different diseases, the accuracy of existing similar medical record retrieval is low.
Summary of the Invention
Embodiments of the present invention provide a method, apparatus, device and readable storage medium for searching similar medical records, which improve the accuracy and reliability of similar medical record search.
According to a first aspect of the present invention, a method for searching similar medical records is provided, comprising:
acquiring query medical record data and a plurality of historical medical record data;
obtaining query graph structure data corresponding to the query medical record data, and historical graph structure data corresponding to each of the historical medical record data, wherein both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, the intermediate nodes of the first-type subgraphs are medical record field categories, and the intermediate nodes and leaf nodes of the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs;
obtaining the degree of similarity between each of the historical graph structure data and the query graph structure data according to the root-node similarity, the first-type subgraph similarity and the second-type subgraph similarity between each of the historical graph structure data and the query graph structure data;
determining, among the plurality of historical medical record data, a similar medical record search result for the query medical record data according to a preset selection rule and the degree of similarity, wherein the historical graph structure data corresponding to the similar medical record search result has a degree of similarity that satisfies the preset selection rule.
According to a second aspect of the present invention, an apparatus for searching similar medical records is provided, comprising:
a medical record acquisition module, configured to acquire query medical record data and a plurality of historical medical record data;
a graph structuring module, configured to obtain query graph structure data corresponding to the query medical record data, and historical graph structure data corresponding to each of the historical medical record data, wherein both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, the intermediate nodes of the first-type subgraphs are medical record field categories, and the intermediate nodes and leaf nodes of the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs;
a processing module, configured to obtain the degree of similarity between each of the historical graph structure data and the query graph structure data according to the root-node similarity, the first-type subgraph similarity and the second-type subgraph similarity between each of the historical graph structure data and the query graph structure data;
a selection module, configured to determine, among the plurality of historical medical record data, a similar medical record search result for the query medical record data according to a preset selection rule and the degree of similarity, wherein the historical graph structure data corresponding to the similar medical record search result has a degree of similarity that satisfies the preset selection rule.
According to a third aspect of the present invention, a device is provided, comprising a memory, a processor and a computer program, wherein the computer program is stored in the memory and the processor runs the computer program to execute the similar medical record search method of the first aspect and of the various possible designs of the first aspect.
According to a fourth aspect of the present invention, a readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements the similar medical record search method of the first aspect and of the various possible designs of the first aspect.
In the method, apparatus, device and readable storage medium for searching similar medical records provided by the present invention, query medical record data and a plurality of historical medical record data are acquired; query graph structure data corresponding to the query medical record data, and historical graph structure data corresponding to each of the historical medical record data, are obtained, wherein both include first-type subgraphs and second-type subgraphs, the intermediate nodes of the first-type subgraphs are medical record field categories, and the intermediate nodes and leaf nodes of the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs; the degree of similarity between each of the historical graph structure data and the query graph structure data is obtained according to the root-node similarity, the first-type subgraph similarity and the second-type subgraph similarity; and a similar medical record search result for the query medical record data is determined among the plurality of historical medical record data according to a preset selection rule and the degree of similarity. The subgraphs inherent in the medical record data and the subgraphs that can be recognized from it are thereby extracted, and the correlation between the data in corresponding subgraphs of the query medical record data and the historical medical record data is measured, which improves the accuracy of similar medical record search.
Description of the Drawings
FIG. 1 is a schematic flowchart of a similar medical record search method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of query graph structure data provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of first-type subgraphs and second-type subgraphs provided by an embodiment of the present invention;
FIG. 4 is a schematic flowchart of an optional embodiment of step S103 in FIG. 1, provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a similar medical record search apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of the hardware structure of a device provided by an embodiment of the present invention.
Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described.
It should be understood that, in the various embodiments of the present invention, the magnitude of a step's sequence number does not imply its order of execution. The order of execution of the steps should be determined by their functions and internal logic, and the sequence numbers do not limit the implementation of the embodiments in any way.
It should be understood that, in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
It should be understood that, in the present invention, "a plurality of" means two or more.
It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B" or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold.
Depending on the context, "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting".
The technical solutions of the present invention are described in detail below through specific embodiments. The following embodiments may be combined with one another, and concepts or processes that are the same or similar may not be repeated in some embodiments.
A piece of medical record data usually includes main diagnosis information, a number of fields inherent to the medical record, and the contents of those fields. The field contents are what the patient or doctor fills in under the field names of a standard-format medical record. The field names include, for example: gender, age, department, visit time, diagnosis and treatment type, doctor's order, chief complaint, history of present illness, physical examination and auxiliary examination. Current medical record matching and retrieval usually searches the full text of the record or the contents of these fields. For example, searching with the keywords "fever" and "difficulty breathing" retrieves every pre-stored medical record that contains both keywords. It is also possible to specify that records whose chief complaint contains the keyword "fever" be returned as the search result. However, the same chief complaint may correspond to completely different symptoms and diseases. Fever caused by a cold and fever caused by an allergy, for example, differ greatly in diagnosis and treatment, so comparing such records as similar has no value. The accuracy of existing similar medical record retrieval is therefore low.
To solve this problem, an embodiment of the present invention provides a similar medical record search method. Query graph structure data and historical graph structure data are constructed, each having first-type subgraphs and second-type subgraphs, where the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs. The similarity between the query medical record data and the historical medical record data is then determined from the root-node similarity, the first-type subgraph similarity and the second-type subgraph similarity, which improves the accuracy of similar medical record search.
FIG. 1 is a schematic flowchart of a similar medical record search method provided by an embodiment of the present invention. The method shown in FIG. 1 may be executed by software and/or a hardware apparatus, for example a server. The method includes steps S101 to S104, as follows:
S101: acquire query medical record data and a plurality of historical medical record data.
For example, when the server receives query medical record data for which similar medical records are to be found, it acquires a plurality of historical medical record data. The historical medical record data may be pre-stored in a medical record library or obtained from distributed storage units. In some embodiments, medical record data related to the query medical record data may be used as the historical medical record data. For example, the server obtains the query medical record data and then determines full-text search keywords from it. The full-text search keywords are, for example, words extracted from the query medical record data such as "cold", "cough" and "allergy to drug XX". Finally, a full-text search is performed in the medical record library using these keywords, yielding a plurality of historical medical record data that contain the full-text search keywords.
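The keyword-based pre-selection of historical records in S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `record_store` dictionary, the function name `prefilter_by_keywords` and the `require_all` switch are hypothetical, and the text does not specify whether a record must contain all keywords or only some of them.

```python
from typing import Dict, List


def prefilter_by_keywords(
    record_store: Dict[str, str], keywords: List[str], require_all: bool = True
) -> List[str]:
    """Return ids of stored records whose full text contains the keywords.

    require_all=True keeps records containing every keyword;
    require_all=False keeps records containing at least one keyword.
    (Which of the two the method intends is an assumption here.)
    """
    combine = all if require_all else any
    return [
        record_id
        for record_id, text in record_store.items()
        if combine(keyword in text for keyword in keywords)
    ]


# Hypothetical medical record library: record id -> full text.
record_store = {
    "r1": "Chief complaint: cough for one month, fever for 24 hours.",
    "r2": "Chief complaint: ankle sprain after a fall.",
    "r3": "History: cold with cough; allergy to drug XX.",
}

# Keywords as they might be extracted from the query medical record data.
candidates = prefilter_by_keywords(record_store, ["cough"])
```

The returned ids identify the historical medical record data that go on to graph structuring in S102; a production system would more plausibly use a full-text index than a linear scan.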
S102: obtain query graph structure data corresponding to the query medical record data, and historical graph structure data corresponding to each of the historical medical record data, wherein both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, the intermediate nodes of the first-type subgraphs are medical record field categories, and the intermediate nodes and leaf nodes of the second-type subgraphs are obtained by performing feature recognition on the first-type subgraphs.
For example, feature engineering is used to structure the medical record data: a graph-structured data body is generated for the query medical record data and for each historical medical record data, yielding the query graph structure data and the historical graph structure data. The historical graph structure data may be produced in the same way as the query graph structure data, or may be obtained externally in advance and stored in association with the historical medical record data; no limitation is imposed here. This embodiment takes the acquisition of the query graph structure data as an example.
FIG. 2 is a schematic diagram of query graph structure data provided by an embodiment of the present invention. This embodiment takes the query graph structure data corresponding to the query medical record data as an example. In FIG. 2, the arrows point from child nodes to their respective parent nodes, and the root node is the main diagnosis.
Query medical record data usually contains three broad categories of data: the main diagnosis, basic information and basic condition. The main diagnosis is text data. Basic information may include, for example, items the doctor fills in: age and gender. Basic condition may include, for example, items the doctor fills in: doctor's order, physical examination, diagnosis and treatment type, chief complaint, auxiliary examination, department, history of present illness and visit time.
For example, the main diagnosis of the query medical record may be used as the description information of the root node. The intermediate nodes under the root node fall into two types. First-type intermediate nodes may be the field categories in the query medical record data, such as basic information and basic condition; see FIG. 2. Second-type intermediate nodes may be preset features, for example features to be recognized from the contents the doctor has filled in. As shown in FIG. 2, the second-type intermediate nodes may include, for example: allergy, surgery, drug, population attribute, symptom, disease, sign, laboratory test and examination.
For the first-type intermediate nodes, the medical record field categories included in the query medical record may be used as first-type intermediate nodes, the fields included in each field category may be used as first-type leaf nodes of that field category, and the description information corresponding to each field may be used as the description information of the corresponding first-type leaf node. For example, if the field category is "basic information", then "basic information" is used as a first-type intermediate node, and "age" and "gender" are used as the two first-type leaf nodes under it. The first-type intermediate node "basic condition" has eight first-type leaf nodes: "doctor's order", "physical examination", "diagnosis and treatment type", "chief complaint", "auxiliary examination", "department", "history of present illness" and "visit time"; see the graph structure data shown in FIG. 2.
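The construction of the root node, first-type intermediate nodes and first-type leaf nodes can be sketched as below. This is an illustrative sketch only: the `Node` class, the `FIELD_CATEGORIES` mapping and the function `build_first_type_tree` are hypothetical names, and the English field names are translations chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A graph node with a name, optional description text, and children."""
    name: str
    description: str = ""
    children: List["Node"] = field(default_factory=list)


# Field categories (first-type intermediate nodes) and the fields they contain
# (first-type leaf nodes), following the example of FIG. 2.
FIELD_CATEGORIES = {
    "basic information": ["age", "gender"],
    "basic condition": [
        "doctor's order", "physical examination", "diagnosis and treatment type",
        "chief complaint", "auxiliary examination", "department",
        "history of present illness", "visit time",
    ],
}


def build_first_type_tree(main_diagnosis: str, fields: Dict[str, str]) -> Node:
    """Build root -> field category -> field, using field contents as descriptions."""
    root = Node("main diagnosis", main_diagnosis)
    for category, field_names in FIELD_CATEGORIES.items():
        category_node = Node(category)
        for name in field_names:
            if name in fields:  # skip fields that were not filled in
                category_node.children.append(Node(name, fields[name]))
        root.children.append(category_node)
    return root


tree = build_first_type_tree(
    "pneumonia",
    {"age": "67", "gender": "female", "chief complaint": "cough for one month"},
)
```

Skipping fields that are absent from the record is a design choice of this sketch; the text does not say how empty fields are represented.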
For the second-type intermediate nodes, the preset features may first be used as second-type intermediate nodes. In the example shown in FIG. 2, the preset feature "symptom" is used as one second-type intermediate node. The second-type intermediate nodes may be features that are set in advance and need to be recognized. For each feature, the server performs natural language understanding (NLU) recognition on the fields in the query medical record data and on the description information of those fields, and obtains the description information of the feature, or the description information of the feature together with additive-factor attribute information or multiplicative-factor attribute information of that description information. For example, NLU processing for the feature "symptom" over fields such as history of present illness and chief complaint, and their description information, yields: cough present, cough severity (mild), cough duration (1 month), fever, and fever duration (the last 24 hours). The description information "cough" and "fever" can then be used as two second-type leaf nodes under the second-type intermediate node "symptom". In addition, "present" can be used as multiplicative-factor attribute information of the description information "cough", and "mild severity" and "lasting 1 month" can be used as additive-factor attribute information of "cough". Similarly, "lasting 24 hours" can be used as additive-factor attribute information of the description information "fever". As another example, NLU processing for the feature "population attribute" over the same fields and description information yields "elderly", so the description information "elderly" can be used as a second-type leaf node under the second-type intermediate node "population attribute". After the description information of a feature is determined, the server may use it as the second-type leaf nodes of that feature, where the data type of a second-type leaf node that has additive-factor or multiplicative-factor attribute information is a complex data type.
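One way to hold second-type leaf nodes and their factor attributes in memory is sketched below. The NLU recognition itself is not implemented; the values are filled in by hand from the cough and fever example, and the class names `FeatureLeaf` and `FeatureNode` as well as the `is_complex` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureLeaf:
    """A second-type leaf node: recognized description plus factor attributes."""
    description: str
    multiplicative: List[str] = field(default_factory=list)  # e.g. presence
    additive: List[str] = field(default_factory=list)        # e.g. severity, duration

    @property
    def is_complex(self) -> bool:
        """A leaf carrying any factor attribute has the complex data type."""
        return bool(self.multiplicative or self.additive)


@dataclass
class FeatureNode:
    """A second-type intermediate node: a preset feature and its leaves."""
    feature: str
    leaves: List[FeatureLeaf] = field(default_factory=list)


# Hand-written stand-in for the NLU output described in the text.
symptom = FeatureNode("symptom", [
    FeatureLeaf("cough", multiplicative=["present"],
                additive=["mild severity", "lasting 1 month"]),
    FeatureLeaf("fever", additive=["lasting 24 hours"]),
])
population = FeatureNode("population attribute", [FeatureLeaf("elderly")])
```

Here "cough" and "fever" are complex-type leaves because they carry factor attributes, while "elderly" is a plain leaf.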
最后,根据上述得到的所述根节点、所述第一类中间节点、所述第一类叶子节点、所述第二类中间节点以及所述第二类叶子节点,可以得到所述查询病历数据对应的查询图结构数据,其中,所述根节点、所述第一类中间节点以及所述第一类叶子节点形成第一类子图;所述根节点、所述第二类中间节点以及所述第二类叶子节点形成第二类子图。参见图3,是本发明实施例提供的一种第一类子图和第二类子图的示意图。图3中示意了2个第一类子图,以及2个第二类子图。第一类子图和第二类子图的根节点都是主诊断,但第一类子图的第一类中间节点为字段类别,例如基本信息或基本病况,而第二类子图的第二类中间节点为特征,例如症状或者体征。第一类子图是以叶子节点为切分,每个第一类子图包括单个第一类叶子节点。而第二类子图则是以中间节点为切分,一个第二类子图包括单个第二类中间节点,但可能包括多个第二类叶子节点。Finally, according to the root node, the first type intermediate node, the first type leaf node, the second type intermediate node and the second type leaf node obtained above, the query medical record data can be obtained Corresponding query graph structure data, wherein the root node, the first type intermediate node and the first type leaf node form a first type subgraph; the root node, the second type intermediate node and all The second type of leaf nodes form the second type of subgraph. Referring to FIG. 3 , it is a schematic diagram of a first type of subgraph and a second type of subgraph provided by an embodiment of the present invention. Two first-type subgraphs and two second-type subgraphs are illustrated in FIG. 3 . The root nodes of the first type subgraph and the second type subgraph are the main diagnosis, but the first type intermediate node of the first type subgraph is a field category, such as basic information or basic condition, while the first type of the second type subgraph is a field category. Class II intermediate nodes are features, such as symptoms or signs. The first type of subgraph is divided into leaf nodes, and each first type of subgraph includes a single first type of leaf node. The second type of subgraph is divided by intermediate nodes. A second type of subgraph includes a single second type of intermediate node, but may include multiple second type of leaf nodes.
After the query graph structure data is obtained, it may first be split into multiple first-type subgraphs and multiple second-type subgraphs, which simplifies the subsequent similarity measurement that uses subgraphs as the unit of comparison.
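The split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Leaf` and `RecordGraph` classes, the `split_subgraphs` function, and the sample diagnosis "pneumonia" are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    description: str          # description information, e.g. "65" or "cough"
    data_type: str = "text"   # "numeric", "boolean", "text" or "complex"


@dataclass
class RecordGraph:
    root: str  # main diagnosis
    # First-type branches: field category -> {field name -> leaf}.
    fields: dict[str, dict[str, Leaf]] = field(default_factory=dict)
    # Second-type branches: feature -> leaves under that feature.
    features: dict[str, list[Leaf]] = field(default_factory=dict)


def split_subgraphs(graph: RecordGraph):
    """Split a record graph into first-type and second-type subgraphs."""
    # One first-type subgraph per leaf node.
    first = [(graph.root, category, name, leaf)
             for category, leaves in graph.fields.items()
             for name, leaf in leaves.items()]
    # One second-type subgraph per intermediate node (feature).
    second = [(graph.root, feature, leaves)
              for feature, leaves in graph.features.items()]
    return first, second


# Illustrative query record (all values are made up for the example).
query = RecordGraph(
    root="pneumonia",
    fields={"basic information": {"age": Leaf("65", "numeric"),
                                  "gender": Leaf("male", "boolean")}},
    features={"symptom": [Leaf("cough", "complex"), Leaf("fever", "complex")]},
)
first, second = split_subgraphs(query)
```

Here `first` holds two single-leaf subgraphs, while `second` holds one subgraph whose intermediate node "symptom" carries two leaves.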
S103: obtain the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-type subgraph similarity and the second-type subgraph similarity between them.
When medical records are compared for similarity, records whose root-node main diagnoses are completely different are usually not similar. Using the root node similarity as one of the bases for comparing the historical graph structure data with the query graph structure data can therefore improve the accuracy of the similar medical record search.
Specifically, the root node, first-type subgraphs and second-type subgraphs of the historical graph structure data, and those of the query graph structure data, may be obtained. Comparing the two root nodes gives the root node similarity, comparing the first-type subgraphs gives the first-type subgraph similarity, and comparing the second-type subgraphs gives the second-type subgraph similarity.
To explain step S103 above more clearly (obtaining the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-type subgraph similarity and the second-type subgraph similarity), examples are given below with reference to the accompanying drawings and specific embodiments.
FIG. 4 is a schematic flowchart of an optional embodiment of step S103 in FIG. 1, provided by an embodiment of the present invention. The method shown in FIG. 4 includes steps S201 to S204, as follows:
S201: determine a first similarity metric value between the historical graph structure data and the query graph structure data according to the root node similarity between them.
For example, the root node similarity may be used directly as the first similarity metric value between the historical graph structure data and the query graph structure data.
Because the main diagnosis at the root node is usually text-type data, the root node similarity between each piece of historical graph structure data and the query graph structure data may be determined, before the first similarity metric value is determined or before step S103 shown in FIG. 1, according to a preset text-type similarity determination model and the description information of the root nodes. The text-type similarity determination model may be, for example, the similarity determination model corresponding to text-type data shown in Table 1 below.
S202: among the first-type subgraphs, determine first-type salient subgraphs and first-type non-salient subgraphs according to the saliency category of the leaf nodes.
The saliency category of a leaf node can be set according to the design and application requirements of the actual similarity matching. Suppose the requirement is that "age", "gender", "treatment type" and "visit time" act as filter conditions on the similar medical record results. The first-type subgraphs containing "age", "gender", "visit time" and "treatment type" can then be used as first-type salient subgraphs. "Gender" and "treatment type" are filter conditions of Boolean data type; for example, if the filter condition is "gender: male", the search outputs results that match "gender: male". "Age" and "visit time" are filter conditions of numeric data type, and a numeric decay function is applied when matching them; for example, if the filter condition is "age: 20", the search applies decay matching to "age", so that, assuming the other matching items of the retrieved records are essentially the same, the closer a record's "age" is to 20, the higher its degree of similarity. In this embodiment, the first-type subgraphs containing "doctor's orders", "physical examination", "chief complaint", "auxiliary examination", "department" and "history of present illness" can be used as first-type non-salient subgraphs.
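The partition in step S202 can be sketched as a simple membership check on the leaf field name. The constant `SALIENT_FIELDS` and the function `partition_first_type` are hypothetical names; the field list follows the example requirement above and would change with the application.

```python
# Leaf fields treated as filter conditions in the example above (assumption:
# fields are identified by these English names).
SALIENT_FIELDS = {"age", "gender", "visit time", "treatment type"}


def partition_first_type(leaf_fields):
    """Split first-type subgraphs, keyed by leaf field, into salient and non-salient."""
    salient = [f for f in leaf_fields if f in SALIENT_FIELDS]
    non_salient = [f for f in leaf_fields if f not in SALIENT_FIELDS]
    return salient, non_salient


salient, non_salient = partition_first_type(
    ["age", "department", "gender", "chief complaint"]
)
```

With this input, "age" and "gender" fall into the salient group and the other two fields into the non-salient group.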
S203: determine a second similarity metric value between the historical graph structure data and the query graph structure data according to the first-type salient subgraph similarities between them.
Both the historical graph structure data and the query graph structure data may include multiple first-type salient subgraphs. Each first-type salient subgraph of the historical graph structure data can be compared with the corresponding first-type salient subgraph of the query graph structure data to obtain a first-type salient subgraph similarity. The server may then take the product of these first-type salient subgraph similarities as the second similarity metric value between the historical graph structure data and the query graph structure data. Using a product increases the influence of each individual first-type salient subgraph similarity on the second similarity metric value. For example, if any one first-type salient subgraph similarity is 0, that is, the corresponding first-type salient subgraphs of the two pieces of graph structure data are completely unrelated, the second similarity metric value is directly 0.
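The veto effect of the product can be seen in a short sketch. The function name `second_metric` and the similarity values are illustrative assumptions.

```python
import math


def second_metric(salient_similarities):
    """Product of the first-type salient subgraph similarities."""
    # math.prod returns 1 for an empty input, i.e. no salient subgraph
    # leaves the metric neutral (a design choice of this sketch).
    return math.prod(salient_similarities)


close_match = second_metric([1.0, 0.9, 0.8, 1.0])
vetoed = second_metric([1.0, 0.9, 0.0, 1.0])  # one unrelated salient subgraph
```

A single zero among the factors forces `vetoed` to 0 regardless of how well the other salient subgraphs match, which a sum would not do.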
S204: determine a third similarity metric value between the historical graph structure data and the query graph structure data according to the first-type non-salient subgraph similarities and the second-type subgraph similarities between them.
Once the first-type non-salient subgraph similarities and the second-type subgraph similarities between the historical graph structure data and the query graph structure data have been obtained, the third similarity metric value can be derived from them.
First-type non-salient subgraphs and second-type subgraphs usually correspond to the specific conditions described in a medical record and do not play a decisive role in the similarity. When determining the third similarity metric value, the sum of the first-type non-salient subgraph similarities and the second-type subgraph similarities between the historical graph structure data and the query graph structure data may therefore be used as the third similarity metric value. In other words, the third similarity metric value is the total of all first-type non-salient subgraph similarities and all second-type subgraph similarities.
In this embodiment, steps S201, S203 and S204 are not limited to the order described in FIG. 4; they may be performed in another order or simultaneously, which is not limited here.
S205: determine the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the first similarity metric value, the second similarity metric value and the third similarity metric value between them.
Specifically, the degree of similarity between the historical graph structure data and the query graph structure data may be determined by Formula 1 below.
where $A = I - M$;
$R(d, q)$ denotes the degree of similarity between the historical graph structure data $d$ and the query graph structure data $q$;
$\mathrm{Sim}$ is the similarity operator, and $\mathrm{Sim}(v_{d,\text{root}}, v_{q,\text{root}})$ is the first similarity metric value, where $v_{d,\text{root}}$ is the root node of the historical graph structure data and $v_{q,\text{root}}$ is the root node of the query graph structure data;
$\prod_{u_i \in M,\, u_j \in M} \mathrm{Sim}(S_{d,i}, S_{q,j})$ is the second similarity metric value, where $M$ is the set of leaf nodes of the first-type salient subgraphs, so that in the second similarity metric value $S_{d,i}$ is the $i$-th first-type salient subgraph of the historical graph structure data and $S_{q,j}$ is the $j$-th first-type salient subgraph of the query graph structure data;
$\sum_{u_i \in A,\, u_j \in A} \mathrm{Sim}(S_{d,i}, S_{q,j})$ is the third similarity metric value, where $I$ is the set of all leaf nodes and $u_i \in A$, $u_j \in A$ denote leaf nodes outside the leaf node set of the first-type salient subgraphs, so that in the third similarity metric value $S_{d,i}$ is the $i$-th first-type non-salient subgraph or second-type subgraph of the historical graph structure data and $S_{q,j}$ is the $j$-th first-type non-salient subgraph or second-type subgraph of the query graph structure data.
In Formula 1 above, the root node is, for example, the main diagnosis shown in FIG. 2. The set M corresponds to the subgraphs to which its leaf nodes belong, for example the first-type salient subgraphs containing "age", "gender", "visit time" and "treatment type" shown in FIG. 2. The subgraphs corresponding to the set A are, for example, all subgraphs shown in FIG. 2 other than those containing "age", "gender", "visit time" and "treatment type", including the second-type subgraphs and the first-type non-salient subgraphs.
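The three metric values can be put together in a short sketch. The text states that the second value is a product over the set M and the third a sum over the set A, but the way the three values are combined is an assumption here: the sketch multiplies them. The function name `record_similarity` and all numbers are hypothetical.

```python
import math


def record_similarity(root_sim, subgraph_sims, salient_keys):
    """Degree of similarity R(d, q) under an assumed multiplicative combination.

    subgraph_sims maps a subgraph key to Sim(S_d, S_q);
    salient_keys plays the role of the set M, everything else is A = I - M.
    """
    second = math.prod(s for k, s in subgraph_sims.items() if k in salient_keys)
    third = sum(s for k, s in subgraph_sims.items() if k not in salient_keys)
    # ASSUMPTION: first * second * third; other combinations are possible.
    return root_sim * second * third


score = record_similarity(
    root_sim=1.0,
    subgraph_sims={"age": 0.5, "gender": 1.0,
                   "chief complaint": 0.25, "symptom": 0.5},
    salient_keys={"age", "gender"},
)
```

In this example the salient product is 0.5 and the non-salient sum is 0.75, so the assumed combination yields 0.375.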
S104: determine, among the multiple pieces of historical medical record data, the similar medical record search result for the query medical record data according to a preset selection rule and the degree of similarity, wherein the historical graph structure data corresponding to the similar medical record search result has a degree of similarity that satisfies the preset selection rule.
For example, the number of historical medical records in the output search result may be controlled by a decision-maker model. The historical medical record data may be sorted by degree of similarity from high to low, and the top-ranked records that fall within a preset truncation ratio (for example 50%) are used as the similar medical record search result. Alternatively, the top-ranked records up to a preset truncation count (for example 5) may be used as the similar medical record search result. The preset selection rule may be count-based or ratio-based, which is not limited here.
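Both selection rules can be sketched with one ranking function. The name `select_similar`, its parameters, and the rounding-up of the ratio cut-off are assumptions made for illustration; the record identifiers and scores are made up.

```python
import math


def select_similar(scores, top_k=None, ratio=None):
    """Rank records by degree of similarity and truncate by count or ratio.

    scores maps a historical record id to its degree of similarity.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if top_k is not None:
        return ranked[:top_k]
    if ratio is not None:
        # ASSUMPTION: round the cut-off up so that at least one record survives.
        return ranked[:math.ceil(len(ranked) * ratio)]
    return ranked


scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}
by_count = select_similar(scores, top_k=2)
by_ratio = select_similar(scores, ratio=0.5)
```

With four records, a truncation count of 2 and a truncation ratio of 50% select the same two top-ranked records.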
The similar medical record search method provided by this embodiment obtains query medical record data and multiple pieces of historical medical record data; obtains the query graph structure data corresponding to the query medical record data and the historical graph structure data corresponding to each piece of historical medical record data, wherein both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, the intermediate node of a first-type subgraph is a medical record field category, and the intermediate nodes and leaf nodes of a second-type subgraph are obtained by performing feature recognition on the first-type subgraphs; obtains the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-type subgraph similarity and the second-type subgraph similarity between them; and determines, among the multiple pieces of historical medical record data, the similar medical record search result for the query medical record data according to a preset selection rule and the degree of similarity. The method thus extracts both the subgraphs inherent in the query medical record data and the subgraphs that can be recognized from it, and measures the correlation between the data in corresponding subgraphs of the query medical record and the historical medical records, which improves the accuracy of the similar medical record search.
Because first-type subgraphs and second-type subgraphs are formed differently and represent different types of content, embodiments of the present invention may use two different similarity determination methods to determine the first-type subgraph similarity and the second-type subgraph similarity respectively.
Determining the first-type subgraph similarity in turn includes determining the first-type salient subgraph similarity and the first-type non-salient subgraph similarity.
In the above embodiment, it can be understood that a process of calculating the first-type salient subgraph similarity and the first-type non-salient subgraph similarity may also be included before step S203 (determining the second similarity metric value according to the first-type salient subgraph similarities between the historical graph structure data and the query graph structure data) and step S204 (determining the third similarity metric value according to the first-type non-salient subgraph similarities and the second-type subgraph similarities between the historical graph structure data and the query graph structure data).
For example, for first-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the first-type subgraph similarity may be obtained according to the leaf node similarity in the first-type subgraph, the preset weight of the edge between the root node and the leaf node, and the root node similarity. The first-type subgraph similarity covers both the first-type salient subgraph similarity and the first-type non-salient subgraph similarity; that is, the same calculation method is used for both.
In some embodiments, the first-type subgraph similarity between the historical graph structure data and the query graph structure data may be obtained by Formula 2 below.
$$\mathrm{Sim}(S_d, S_q) = \mathrm{Sim}(u_d, u_q) \cdot \mathrm{weight}(uv) \cdot \mathrm{Sim}(v_d, v_q) \qquad \text{(Formula 2)}$$
where $\mathrm{Sim}$ denotes the similarity operator; $S_d$ denotes a first-type subgraph of the historical graph structure data and $S_q$ denotes a first-type subgraph of the query graph structure data; $u_d$ denotes the leaf node of $S_d$ and $u_q$ denotes the leaf node of $S_q$; $\mathrm{weight}(uv)$ denotes the preset weight of the edge between the root node and the leaf node in the first-type subgraph; $v_d$ denotes the root node of $S_d$ and $v_q$ denotes the root node of $S_q$.
The first-type subgraphs in Formula 2 above may be the subgraphs of the graph structure data shown in FIG. 2 whose leaf nodes are, respectively, age, gender, doctor's orders, physical examination, treatment type, chief complaint, auxiliary examination, department, history of present illness and visit time. The subgraphs whose leaf nodes are age, gender, treatment type and visit time are first-type salient subgraphs; the rest are first-type non-salient subgraphs. The preset weight of the edge between the root node and a leaf node may be set in advance based on the experience of medical experts.
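Formula 2 is a product of three factors and can be evaluated per leaf field. In the sketch below, the function name `first_type_similarity`, the `PRESET_EDGE_WEIGHTS` table and every number are hypothetical; in practice the weights would come from medical experts as described above.

```python
def first_type_similarity(leaf_sim, edge_weight, root_sim):
    """Formula 2: Sim(S_d, S_q) = Sim(u_d, u_q) * weight(uv) * Sim(v_d, v_q)."""
    return leaf_sim * edge_weight * root_sim


# Hypothetical expert-assigned root-to-leaf edge weights.
PRESET_EDGE_WEIGHTS = {"age": 0.5, "chief complaint": 0.25}

# Hypothetical leaf similarities between one historical record and the query.
leaf_sims = {"age": 0.8, "chief complaint": 0.5}
root_sim = 1.0

subgraph_sims = {
    name: first_type_similarity(sim, PRESET_EDGE_WEIGHTS[name], root_sim)
    for name, sim in leaf_sims.items()
}
```

Because the root node similarity is a factor of every subgraph similarity, two records with unrelated main diagnoses score 0 on every first-type subgraph.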
It can be understood that, before step S204 above (determining the third similarity metric value according to the first-type non-salient subgraph similarities and the second-type subgraph similarities between the historical graph structure data and the query graph structure data), the second-type subgraph similarity between the historical graph structure data and the query graph structure data may also be obtained first. Specifically, the preset weight of the edge between the root node and the intermediate node in the second-type subgraph, and the statistical weight of the edge between each leaf node and the intermediate node in the second-type subgraph, may be obtained first.

In the historical graph structure data, let $u_{d,i}$ denote a leaf node of the second-type subgraph, $c_d$ the intermediate node and $v_d$ the root node; the edges are then related by $u_{d,i}c_d + c_d v_d = u_{d,i}v_d$. For different leaf nodes $u_{d,i}$ and $u_{d,j}$ under the same intermediate node $c_d$, which are leaf nodes of the same type, the weight of the edge $c_d v_d$ between the root node and the intermediate node is the same; that is, $c_d v_d = u_{d,i}v_d - u_{d,i}c_d$ and $c_d v_d = u_{d,j}v_d - u_{d,j}c_d$ are equal. The weight of the edge $u_{d,i}c_d$ between the intermediate node and a leaf node is therefore positively and linearly correlated with the weight of the edge $u_{d,i}v_d$ between the root node and that leaf node. The weight of the edge $u_{d,i}v_d$ expresses the statistical correlation between the main diagnosis and leaf nodes such as symptom A or symptom B, and is medically interpretable. The statistical weight of the edge between a leaf node and the intermediate node is therefore determined from the correlation statistics between the root node and the leaf node, for example as a weighted average of the mutual information and the chi-square statistic between the main diagnosis and the description information of the feature. Specifically, at least one leaf node under the intermediate node of the second-type subgraph may be obtained first; the mutual information between the root node and the at least one leaf node is obtained; the chi-square statistic between the root node and the at least one leaf node is then obtained; finally, the weighted sum of the mutual information value and the chi-square statistic corresponding to the intermediate node is used as the statistical weight of the edge between the leaf node and the intermediate node in the second-type subgraph.

Next, for second-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the second-type subgraph similarity is obtained according to the leaf node similarities in the second-type subgraph, the preset weight of the edge between the root node and the intermediate node, the statistical weights of the edges between the leaf nodes and the intermediate node, and the root node similarity. For example, the second-type subgraph of the historical graph structure data whose intermediate node is "symptom" is compared with the second-type subgraph of the query graph structure data whose intermediate node is "symptom", giving the second-type subgraph similarity for the intermediate node "symptom".
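The statistical edge weight can be sketched from co-occurrence counts of a main diagnosis and a leaf description (for example a symptom) across historical records. This is one plausible reading, not the patented computation: the 2x2 contingency layout, the mixing parameter `mi_weight`, and the absence of any normalisation between the two statistics are all assumptions of the sketch.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in nats) of two binary variables from a 2x2 table.

    n11: diagnosis and symptom both present; n10: diagnosis only;
    n01: symptom only; n00: neither.
    """
    n = n11 + n10 + n01 + n00
    cells = ((n11, n11 + n10, n11 + n01),
             (n10, n11 + n10, n10 + n00),
             (n01, n01 + n00, n11 + n01),
             (n00, n01 + n00, n10 + n00))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), skipping empty cells.
    return sum((joint / n) * math.log(joint * n / (row * col))
               for joint, row, col in cells if joint)


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic of the same 2x2 table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def edge_statistical_weight(counts, mi_weight=0.5):
    """Weighted sum of the two statistics (mi_weight is a hypothetical knob)."""
    return (mi_weight * mutual_information(*counts)
            + (1 - mi_weight) * chi_square(*counts))


independent = edge_statistical_weight((10, 10, 10, 10))
dependent = edge_statistical_weight((20, 0, 0, 20))
```

A symptom that occurs independently of the diagnosis receives weight 0, while a symptom that always accompanies it receives a large weight. Since the chi-square statistic grows with the sample size and mutual information does not, a practical system would probably rescale the two before mixing them.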
For example, the second-type subgraph similarity between the historical graph structure data and the query graph structure data may be obtained by Formula 3 below.
where the second-type subgraph of the historical graph structure data is $S_d = (\{u_{d,1}, u_{d,2}, \ldots, u_{d,m}, c_d, v_d\}, \{u_{d,1}c_d, u_{d,2}c_d, \ldots, u_{d,m}c_d, c_d v_d\})$, in which $u_{d,m}$ is the $m$-th leaf node of the second-type subgraph of the historical graph structure data, $c_d$ is its intermediate node, $v_d$ is its root node, $u_{d,m}c_d$ is the edge between leaf node $u_{d,m}$ and intermediate node $c_d$, and $c_d v_d$ is the edge between intermediate node $c_d$ and root node $v_d$; the second-type subgraph of the query graph structure data is $S_q = (\{u_{q,1}, u_{q,2}, \ldots, u_{q,n}, c_q, v_q\}, \{u_{q,1}c_q, u_{q,2}c_q, \ldots, u_{q,n}c_q, c_q v_q\})$, in which $u_{q,n}$ is the $n$-th leaf node of the second-type subgraph of the query graph structure data, $c_q$ is its intermediate node, $v_q$ is its root node, $u_{q,n}c_q$ is the edge between leaf node $u_{q,n}$ and intermediate node $c_q$, and $c_q v_q$ is the edge between intermediate node $c_q$ and root node $v_q$;
$\mathrm{weight}(u_{d,i}c_d)$ denotes the statistical weight of the edge between a leaf node and the intermediate node in the second-type subgraph;
$\mathrm{weight}(c_d v_d)$ denotes the preset weight of the edge between the root node and the intermediate node in the second-type subgraph;
in the operator, $\alpha$ and $\beta$ are constants.
Table 1
In the above embodiment of determining the first-type subgraph similarity, before the first-type subgraph similarity between the historical graph structure data and the query graph structure data is obtained, a step may also be included in which the way of determining each leaf node similarity of the first-type subgraphs is selected according to the data type of the leaf node. Specifically, the data types of the leaf nodes of the first-type subgraphs in the historical graph structure data and the query graph structure data may be obtained. Table 1 shows optional similarity determination models for the four data types provided by an embodiment of the present invention. In the similarity determination model for the complex data type shown in Table 1, the multiplication-factor attribute set is the set of all multiplication-factor attribute information of node x, and the addition-factor attribute set is the set of all addition-factor attribute information of node x, where the nodes x and y being compared have one-to-one corresponding multiplication-factor attribute information and one-to-one corresponding addition-factor attribute information. The data types of the leaf nodes of first-type subgraphs are usually simple data types, such as the numeric, Boolean and text data types in Table 1.

A target similarity determination model corresponding to the data type is then obtained; for the correspondence see, for example, FIG. 1. According to the target similarity determination model and the description information of the leaf nodes of the first-type subgraphs in the historical graph structure data and the query graph structure data, each leaf node similarity of the first-type subgraphs is obtained. That is, the target similarity determination model is applied to the description information of the leaf nodes of the first-type subgraphs to obtain their similarities. Numeric data is, for example, the description information of fields such as age and visit time. Boolean data is, for example, the description information of the gender field (for example, male preset as 1 and female as 0). Text data is, for example, the description information of fields such as chief complaint and history of present illness. Calculating each leaf node similarity with the corresponding target similarity determination model improves the accuracy of the leaf node similarities and, in turn, the accuracy of the final search result.
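Selecting a model by data type amounts to a dispatch table. The three models below are stand-ins chosen for this sketch, not the models of Table 1: an exponential decay for numeric values, equality for Boolean values, and token-set Jaccard overlap for text. All function names and the `scale` parameter are hypothetical.

```python
import math


def numeric_similarity(x, y, scale=10.0):
    # Hypothetical decay model: similarity shrinks as |x - y| grows.
    return math.exp(-abs(x - y) / scale)


def boolean_similarity(x, y):
    # 1.0 when the two values agree, 0.0 otherwise.
    return 1.0 if x == y else 0.0


def text_similarity(x, y):
    # Hypothetical text model: Jaccard overlap of whitespace-separated tokens.
    a, b = set(x.split()), set(y.split())
    return len(a & b) / len(a | b) if a | b else 1.0


SIMILARITY_MODELS = {
    "numeric": numeric_similarity,
    "boolean": boolean_similarity,
    "text": text_similarity,
}


def leaf_similarity(data_type, x, y):
    """Apply the target similarity model that matches the leaf's data type."""
    return SIMILARITY_MODELS[data_type](x, y)


age_near = leaf_similarity("numeric", 20, 25)
age_far = leaf_similarity("numeric", 20, 40)
gender = leaf_similarity("boolean", 1, 0)
complaint = leaf_similarity("text", "dry cough", "cough")
```

The numeric model reproduces the behaviour described for "age": a record aged 25 scores higher against a query aged 20 than a record aged 40 does.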
In the above embodiment of determining the second-type subgraph similarity, before the second-type subgraph similarity between the historical graph structure data and the query graph structure data is obtained, a step may also be included in which the way of determining each leaf node similarity of the second-type subgraphs is selected according to the data type of the leaf node. Specifically, the data types of the leaf nodes of the second-type subgraphs in the historical graph structure data and the query graph structure data may be obtained. The data type of a leaf node of a second-type subgraph may be any of the four data types in Table 1. The numeric, Boolean and text data types are simple data types; all others are complex data types.
若所述数据类型为简单数据类型,则根据简单数据类型的所述叶子节点对应的所述数据类型,确定目标相似度确定模型。根据所述目标相似度确定模型,和简单数据类型的所述叶子节点的描述信息,得到所述历史图结构数据与所述查询图结构数据中所述第二类子图的各简单数据类型的叶子节点相似度。简单数据类型的第二类子图叶子节点相似度计算的实现方式,与第一类子图叶子节点相似度计算的实现方式类似,在此不做赘述。If the data type is a simple data type, a target similarity determination model is determined according to the data type corresponding to the leaf node of the simple data type. According to the target similarity determination model and the description information of the leaf nodes of the simple data type, the relationship between the historical graph structure data and the simple data types of the second type of subgraphs in the query graph structure data is obtained. Leaf node similarity. The implementation manner of the similarity calculation of the leaf nodes of the second type of subgraphs of the simple data type is similar to the implementation manner of the similarity calculation of the leaf nodes of the first type of subgraphs, and will not be repeated here.
If the data type is a complex data type, the multiplication factor attribute information and the addition factor attribute information corresponding to the complex-data-type leaf node are obtained. As an example of a complex data type, in the second-type subgraph of the query graph structure data whose intermediate node is "symptom", the leaf nodes include "cough"; the multiplication factor attribute information of "cough" is "1" (indicating that the symptom is present), and its addition factor attribute information is, for example, "severe" and "24 hours". The similarity of each complex-data-type leaf node of the second-type subgraphs between the historical graph structure data and the query graph structure data can then be taken as the product of two terms: the product of the similarities of the multiplication factor attribute information corresponding to the leaf node, and the weighted sum of the similarities of the addition factor attribute information corresponding to the leaf node. For example, if the multiplication factor attribute information in the historical graph structure data is "1" (present) and its addition factor attribute information is, for example, "mild" and "1 hour", then the similarity determination model for the Boolean data type is applied to the multiplication factors (1, 1), the similarity determination model for the text data type is applied to the addition factor attribute information ("severe", "mild"), and the similarity determination model for the numeric data type is applied to the addition factor attribute information ("24 hours", "1 hour"). Whether the description information of each leaf node in a second-type subgraph corresponds to multiplication factor attribute information or to addition factor attribute information can be defined in a preset correspondence. When the description information of the leaf nodes in a second-type subgraph is detected, the multiplication factor attribute information and the addition factor attribute information corresponding to each leaf node are obtained from the entries recorded for that description information in the preset correspondence.
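The combination rule for a complex-data-type leaf node (product of the multiplication factor similarities, times the weighted sum of the addition factor similarities) can be sketched as follows. The function name, the argument layout, and the example weights and similarity values are assumptions for illustration; the source does not state how the weights are chosen.

```python
from math import prod


def complex_leaf_similarity(mult_sims, add_sims, add_weights):
    """Combine per-attribute similarities of one complex-data-type leaf node.

    mult_sims   -- similarities of the multiplication factor attributes
    add_sims    -- similarities of the addition factor attributes
    add_weights -- one (assumed, preset) weight per addition factor attribute
    """
    if len(add_sims) != len(add_weights):
        raise ValueError("each addition factor similarity needs exactly one weight")
    weighted_sum = sum(w * s for w, s in zip(add_weights, add_sims))
    # A multiplication factor similarity of 0 (e.g. symptom absent in one record)
    # forces the whole leaf similarity to 0.
    return prod(mult_sims) * weighted_sum


# "cough" example: presence flags agree (similarity 1.0); the severity and
# duration similarities (0.2 and 0.5) are made-up placeholder values.
cough_similarity = complex_leaf_similarity([1.0], [0.2, 0.5], [0.5, 0.5])
```

The multiplicative part acts as a gate: if the symptom is present in only one of the two records, the addition factor attributes no longer contribute.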
The embodiments of the similar medical record search method described above implement matching and re-ranking of similar medical records, as well as metric calculation for the relationships between medical record features and for the individual feature attributes of a medical record, thereby improving the accuracy of similar medical record search.
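Referring to FIG. 5, which is a schematic structural diagram of a similar medical record search apparatus provided by an embodiment of the present invention, the similar medical record search apparatus 50 shown in FIG. 5 includes:
A medical record acquisition module 51, configured to acquire query medical record data and a plurality of pieces of historical medical record data.
A graph structuring module 52, configured to obtain query graph structure data corresponding to the query medical record data and historical graph structure data corresponding to each piece of historical medical record data, where both the query graph structure data and the historical graph structure data include first-type subgraphs and second-type subgraphs, the intermediate node of a first-type subgraph is a medical record field category, and the intermediate nodes and leaf nodes of a second-type subgraph are obtained by performing feature recognition on the first-type subgraphs.
A processing module 53, configured to obtain the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-type subgraph similarity, and the second-type subgraph similarity between that historical graph structure data and the query graph structure data.
A selection module 54, configured to determine, among the plurality of pieces of historical medical record data, a similar medical record search result for the query medical record data according to a preset selection rule and the degrees of similarity, where the historical graph structure data corresponding to the similar medical record search result has a degree of similarity that satisfies the preset selection rule.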
The similar medical record search apparatus of the embodiment shown in FIG. 5 can correspondingly be used to perform the steps performed by the server in the method embodiment shown in FIG. 1; its implementation principles and technical effects are similar and are not repeated here.
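As an illustration of what the selection module 54 might do, the sketch below ranks historical records by their degree of similarity and keeps the best ones. The preset selection rule is not spelled out in this section, so the top-k-with-threshold rule, the function name, and the parameters are all assumptions.

```python
def select_similar_records(similarities, top_k=3, min_similarity=0.0):
    """Return up to top_k (record_id, similarity) pairs, highest similarity first.

    similarities   -- dict mapping a historical record id to its degree of similarity
    top_k          -- assumed rule: keep at most this many records
    min_similarity -- assumed rule: drop records below this degree of similarity
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [(rid, sim) for rid, sim in ranked if sim >= min_similarity][:top_k]
```

For example, `select_similar_records({"a": 0.9, "b": 0.4, "c": 0.7}, top_k=2)` keeps records "a" and "c".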
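Optionally, the processing module 53 is configured to: determine a first similarity metric value between the historical graph structure data and the query graph structure data according to the root node similarity between them; determine, among the first-type subgraphs, first-type significant subgraphs and first-type non-significant subgraphs according to the significance category of the leaf nodes; determine a second similarity metric value between the historical graph structure data and the query graph structure data according to the first-type significant subgraph similarity between them; determine a third similarity metric value between the historical graph structure data and the query graph structure data according to the first-type non-significant subgraph similarity and the second-type subgraph similarity between them; and determine the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the first similarity metric value, the second similarity metric value, and the third similarity metric value of that historical graph structure data and the query graph structure data.
Optionally, before determining the second similarity metric value according to the first-type significant subgraph similarity, and before determining the third similarity metric value according to the first-type non-significant subgraph similarity and the second-type subgraph similarity, the processing module 53 is further configured to: in the first-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, obtain the similarity of the first-type subgraphs between the historical graph structure data and the query graph structure data according to the leaf node similarity in the first-type subgraphs, the preset weights of the edges between the root node and the leaf nodes, and the root node similarity, where the similarity of the first-type subgraphs includes the first-type significant subgraph similarity and the first-type non-significant subgraph similarity.
Optionally, the processing module 53 is configured to use the product of the first-type significant subgraph similarities between the historical graph structure data and the query graph structure data as the second similarity metric value between the historical graph structure data and the query graph structure data.
Optionally, before determining the third similarity metric value according to the first-type non-significant subgraph similarity and the second-type subgraph similarity, the processing module 53 is further configured to: obtain the statistical weights of the edges between the leaf nodes and the intermediate node in the second-type subgraphs; and, in the second-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, obtain the second-type subgraph similarity between the historical graph structure data and the query graph structure data according to the leaf node similarity in the second-type subgraphs, the preset weight of the edge between the root node and the intermediate node, the statistical weights of the edges between the leaf nodes and the intermediate node, and the root node similarity.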
Optionally, the processing module 53 is configured to obtain the second-type subgraph similarity between the historical graph structure data and the query graph structure data using the following Formula 3:
where the second-type subgraph of the historical graph structure data is $S_d = (\{u_{d,1}, u_{d,2}, \ldots, u_{d,m}, c_d, v_d\}, \{u_{d,1}c_d, u_{d,2}c_d, \ldots, u_{d,m}c_d, c_d v_d\})$, $u_{d,m}$ is the $m$-th leaf node of the second-type subgraph of the historical graph structure data, $c_d$ is its intermediate node, $v_d$ is its root node, $u_{d,m}c_d$ is the edge between leaf node $u_{d,m}$ and intermediate node $c_d$, and $c_d v_d$ is the edge between intermediate node $c_d$ and root node $v_d$; the second-type subgraph of the query graph structure data is $S_q = (\{u_{q,1}, u_{q,2}, \ldots, u_{q,n}, c_q, v_q\}, \{u_{q,1}c_q, u_{q,2}c_q, \ldots, u_{q,n}c_q, c_q v_q\})$, $u_{q,n}$ is the $n$-th leaf node of the second-type subgraph of the query graph structure data, $c_q$ is its intermediate node, $v_q$ is its root node, $u_{q,n}c_q$ is the edge between leaf node $u_{q,n}$ and intermediate node $c_q$, and $c_q v_q$ is the edge between intermediate node $c_q$ and root node $v_q$;
$\mathrm{weight}(u_{d,i}c_d)$ denotes the statistical weight of the edge between a leaf node and the intermediate node in the second-type subgraph;
$\mathrm{weight}(c_d v_d)$ denotes the preset weight of the edge between the root node and the intermediate node in the second-type subgraph;
and, in the definition of the operator, $\alpha$ and $\beta$ are constants.
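Optionally, the processing module 53 is configured to: obtain at least one leaf node under the intermediate node in the second-type subgraph; obtain the mutual information between the root node and the at least one leaf node in the second-type subgraph; obtain the chi-square statistic between the root node and the at least one leaf node in the second-type subgraph; and use the weighted sum of the mutual information value and the chi-square statistic corresponding to the intermediate node as the statistical weight of the edge between the leaf node and the intermediate node in the second-type subgraph.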
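One way to read the edge statistical weight above is sketched below. It assumes that the root node (main diagnosis) and a leaf node (a recognized feature value) are treated as two binary events counted over a set of historical records, and that the two statistics are computed from the resulting 2x2 contingency table. The counting scheme, the function names, and the default weights `w_mi` and `w_chi` are assumptions, not details given in this section.

```python
from math import log


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in nats) of two binary events from a 2x2 table.

    n11 -- records with both the diagnosis and the leaf feature
    n10 -- records with the diagnosis but without the leaf feature
    n01 -- records with the leaf feature but without the diagnosis
    n00 -- records with neither
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    # (joint count, diagnosis marginal, leaf marginal) for each cell
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    return sum((j / n) * log(j * n / (r * c)) for j, r, c in cells if j > 0)


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic of the same 2x2 table (no continuity correction)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def edge_statistical_weight(n11, n10, n01, n00, w_mi=0.5, w_chi=0.5):
    """Weighted sum of mutual information and chi-square (weights are assumed)."""
    return (w_mi * mutual_information(n11, n10, n01, n00)
            + w_chi * chi_square(n11, n10, n01, n00))
```

Because the chi-square statistic grows with the number of records while mutual information does not, a real system would probably normalize one or both terms before summing them.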
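Optionally, the processing module 53 is configured to use the sum of the first-type non-significant subgraph similarities and the second-type subgraph similarities between the historical graph structure data and the query graph structure data as the third similarity metric value between the historical graph structure data and the query graph structure data.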
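The second and third similarity metric values described for the processing module 53 reduce to a product and a sum over per-subgraph similarities. A minimal sketch follows, with hypothetical function names and made-up input values; how the three metric values are finally merged into one degree of similarity is not covered here.

```python
from math import prod


def second_metric(significant_sims):
    """Product of the first-type significant subgraph similarities."""
    return prod(significant_sims)


def third_metric(non_significant_sims, second_type_sims):
    """Sum of the first-type non-significant and the second-type subgraph similarities."""
    return sum(non_significant_sims) + sum(second_type_sims)


# Placeholder similarities for one (historical record, query record) pair.
m2 = second_metric([1.0, 0.5])
m3 = third_metric([0.5, 0.25], [0.25])
```

Using a product for the significant subgraphs means that one strongly mismatching significant field pulls the second metric toward zero, whereas the sum used for the third metric lets the remaining subgraphs contribute independently.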
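Optionally, before obtaining, in the first-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the similarity of the first-type subgraphs (which includes the first-type significant subgraph similarity and the first-type non-significant subgraph similarity) according to the leaf node similarity in the first-type subgraphs, the preset weights of the edges between the root node and the leaf nodes, and the root node similarity, the processing module 53 is further configured to: obtain the data types of the leaf nodes of the first-type subgraphs in the historical graph structure data and the query graph structure data; obtain the target similarity determination model corresponding to the data type; and obtain the similarity of each leaf node of the first-type subgraphs between the historical graph structure data and the query graph structure data according to the target similarity determination model and the description information of the leaf nodes in the first-type subgraphs of the historical graph structure data and the query graph structure data.
Optionally, before obtaining, in the second-type subgraphs of the historical graph structure data and the query graph structure data that have the same intermediate node, the second-type subgraph similarity according to the leaf node similarity in the second-type subgraphs, the preset weight of the edge between the root node and the intermediate node, the statistical weights of the edges between the leaf nodes and the intermediate node, and the root node similarity, the processing module 53 is further configured to: obtain the data types of the leaf nodes of the second-type subgraphs in the historical graph structure data and the query graph structure data; if the data type is a simple data type, determine the target similarity determination model according to the data type of that leaf node, and obtain the similarity of each simple-data-type leaf node of the second-type subgraphs between the historical graph structure data and the query graph structure data according to the target similarity determination model and the description information of the simple-data-type leaf nodes; if the data type is a complex data type, obtain the multiplication factor attribute information and the addition factor attribute information corresponding to the complex-data-type leaf node, and use the product of (i) the product of the similarities of the multiplication factor attribute information corresponding to the leaf node and (ii) the weighted sum of the similarities of the addition factor attribute information corresponding to the leaf node as the similarity of each complex-data-type leaf node of the second-type subgraphs between the historical graph structure data and the query graph structure data.
Optionally, before obtaining the degree of similarity between each piece of historical graph structure data and the query graph structure data according to the root node similarity, the first-type subgraph similarity, and the second-type subgraph similarity, the processing module 53 is further configured to determine the root node similarity between each piece of historical graph structure data and the query graph structure data according to a preset text-type similarity determination model and the description information of the root nodes.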
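Optionally, the graph structuring module 52 is configured to: use the main diagnosis of the query medical record as the description information of the root node; use the medical record field categories included in the query medical record as first-type intermediate nodes, and use the fields included in each field category as the first-type leaf nodes of that field category; use the description information corresponding to each field as the description information of the corresponding first-type leaf node; use preset features as second-type intermediate nodes, and, for each feature, perform natural language understanding (NLU) recognition on the fields and their description information to obtain the description information of the feature, or the description information of the feature together with the addition factor attribute information or multiplication factor attribute information of that description information; use the description information of the feature as the second-type leaf nodes of the feature, where a second-type leaf node that has addition factor attribute information or multiplication factor attribute information has a complex data type; and obtain the query graph structure data corresponding to the query medical record data from the root node, the first-type intermediate nodes, the first-type leaf nodes, the second-type intermediate nodes, and the second-type leaf nodes, where the root node, a first-type intermediate node, and its first-type leaf nodes form a first-type subgraph, and the root node, a second-type intermediate node, and its second-type leaf nodes form a second-type subgraph.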
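The graph layout built by the graph structuring module 52 can be pictured with a few plain data classes. The sketch below is an assumption-laden illustration: the class names, the dictionary layout of the input record, the `BOOLEAN_FIELDS` set, and the sample record are all invented here, and the NLU recognition step is replaced by an already-recognized `recognized_features` argument.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical set of fields treated as Boolean (gender is the example in the text).
BOOLEAN_FIELDS = {"gender"}


@dataclass
class Leaf:
    name: str
    description: Any
    data_type: str  # "numeric", "boolean", "text", or "complex"
    mult_factors: dict = field(default_factory=dict)
    add_factors: dict = field(default_factory=dict)


@dataclass
class Subgraph:
    kind: int          # 1 = field-category subgraph, 2 = feature subgraph
    intermediate: str  # field category (kind 1) or preset feature (kind 2)
    leaves: list = field(default_factory=list)


@dataclass
class RecordGraph:
    root: str  # description information of the root node (main diagnosis)
    subgraphs: list = field(default_factory=list)


def simple_type(name: str, value: Any) -> str:
    """Assumed rule for assigning a simple data type to a field."""
    if name in BOOLEAN_FIELDS:
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    return "text"


def build_graph(record: dict, recognized_features: dict) -> RecordGraph:
    graph = RecordGraph(root=record["main_diagnosis"])
    # First-type subgraphs: one per medical record field category.
    for category, fields in record["categories"].items():
        leaves = [Leaf(n, v, simple_type(n, v)) for n, v in fields.items()]
        graph.subgraphs.append(Subgraph(1, category, leaves))
    # Second-type subgraphs: one per preset feature (NLU output assumed given).
    for feature, items in recognized_features.items():
        leaves = []
        for item in items:
            mult, add = item.get("mult", {}), item.get("add", {})
            # Simplification: leaves without factor attributes are treated as text.
            dtype = "complex" if (mult or add) else "text"
            leaves.append(Leaf(item["name"], item["name"], dtype, mult, add))
        graph.subgraphs.append(Subgraph(2, feature, leaves))
    return graph


# Illustrative record; all values are made up.
record = {
    "main_diagnosis": "acute bronchitis",
    "categories": {
        "basic information": {"age": 45, "gender": 1},
        "medical history": {"chief complaint": "severe cough for 24 hours"},
    },
}
features = {"symptom": [{"name": "cough", "mult": {"present": 1},
                         "add": {"severity": "severe", "duration_hours": 24}}]}
graph = build_graph(record, features)
```

Every subgraph shares the same root, so comparing two records amounts to pairing up subgraphs that have the same intermediate node.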
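Optionally, the medical record acquisition module 51 is configured to: acquire the query medical record data; determine full-text search keywords according to the query medical record data; and perform a full-text search of the medical record library using the full-text search keywords to obtain a plurality of pieces of historical medical record data that contain the full-text search keywords.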
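The candidate retrieval step can be illustrated with a naive in-memory filter. This is a sketch under assumptions: the record layout (a dict with a `text` entry), the function name, and the rule that a record must contain every keyword are chosen here for illustration, and a real medical record library would use a proper full-text index instead.

```python
def full_text_search(record_library, keywords):
    """Return the historical records whose text contains every keyword (assumed rule)."""
    return [rec for rec in record_library
            if all(kw in rec["text"] for kw in keywords)]


# Made-up miniature record library.
library = [
    {"id": "h1", "text": "cough and fever for two days"},
    {"id": "h2", "text": "abdominal pain after meals"},
    {"id": "h3", "text": "dry cough at night"},
]
candidates = full_text_search(library, ["cough"])
```

Only the records returned by this coarse filter would then be converted to graph structure data and scored.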
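Referring to FIG. 6, which is a schematic diagram of the hardware structure of a device provided by an embodiment of the present invention, the device 60 includes a processor 61, a memory 62, and a computer program, where:
The memory 62 is configured to store the computer program; the memory may also be a flash memory. The computer program is, for example, an application program or a functional module that implements the above method.
The processor 61 is configured to execute the computer program stored in the memory to implement the steps performed by the server in the above similar medical record search method. For details, refer to the related description in the foregoing method embodiments.
Optionally, the memory 62 may be independent of the processor 61 or integrated with it.
When the memory 62 is a component independent of the processor 61, the device may further include:
A bus 63, configured to connect the memory 62 and the processor 61. The device of FIG. 6 may further include a transmitter (not shown in the figure), configured to send out the similar medical record search result that the processor 61 obtains for the query medical record data.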
The present invention further provides a readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the similar medical record search method provided by the various embodiments described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. A computer storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. For example, the readable storage medium is coupled to the processor so that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may be located in an application-specific integrated circuit (ASIC), and the ASIC may in turn be located in user equipment. The processor and the readable storage medium may also exist as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The present invention further provides a program product that includes execution instructions stored in a readable storage medium. At least one processor of a device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the device to implement the similar medical record search method provided by the various embodiments described above.
In the above device embodiments, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (16)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201910557217.5A CN110299209B (en) | 2019-06-25 | 2019-06-25 | Similar medical record search method, device, device and readable storage medium | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN110299209A CN110299209A (en) | 2019-10-01 | 
| CN110299209B true CN110299209B (en) | 2022-05-20 | 
Family
ID=68028806
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201910557217.5A Active CN110299209B (en) | 2019-06-25 | 2019-06-25 | Similar medical record search method, device, device and readable storage medium | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN110299209B (en) | 
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| JP2017174406A (en) * | 2016-03-24 | 2017-09-28 | 富士通株式会社 | Healthcare risk estimation system and method | 
| CN108388580A (en) * | 2018-01-24 | 2018-08-10 | 平安医疗健康管理股份有限公司 | Merge the dynamic knowledge collection of illustrative plates update method of medical knowledge and application case | 
| CN109215754A (en) * | 2018-09-10 | 2019-01-15 | 平安科技(深圳)有限公司 | Medical record data processing method, device, computer equipment and storage medium | 
| CN109785968A (en) * | 2018-12-27 | 2019-05-21 | 东软集团股份有限公司 | A kind of event prediction method, apparatus, equipment and program product | 
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US8589400B2 (en) * | 2001-11-30 | 2013-11-19 | Intelligent Medical Objects, Inc. | Longitudinal electronic record system and method | 
| US20130096944A1 (en) * | 2011-10-13 | 2013-04-18 | The Board of Trustees of the Leland Stanford, Junior, University | Method and System for Ontology Based Analytics | 
Non-Patent Citations (1)
| Title | 
|---|
| 基于用户意图分析的电子病历检索技术研究;王超;《中国优秀硕士学位论文全文数据库 医药卫生科技辑》;20170815;E053-70 * | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||