CN107480139A - Bulk composition extraction method and device for the medical field - Google Patents

Bulk composition extraction method and device for the medical field

Publication number
CN107480139A
Authority
CN
China
Prior art keywords
template
successful
match
language material
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710705003.9A
Other languages
Chinese (zh)
Inventor
熊子奇
姚佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen City Artificial Intelligence Technology Co Secluded Orchid In A Deserted Valley
Original Assignee
Shenzhen City Artificial Intelligence Technology Co Secluded Orchid In A Deserted Valley
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen City Artificial Intelligence Technology Co Secluded Orchid In A Deserted Valley filed Critical Shenzhen City Artificial Intelligence Technology Co Secluded Orchid In A Deserted Valley
Priority to CN201710705003.9A priority Critical patent/CN107480139A/en
Publication of CN107480139A publication Critical patent/CN107480139A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation
    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the present invention provides a bulk composition extraction method and device for the medical field. The method includes: obtaining the language material to be extracted; matching the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each successfully matched template; and then judging whether the successfully matched templates satisfy a preset extraction condition and, if so, taking the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted. Matching language material against templates in this way extracts the bulk composition of the language material simply, quickly, and efficiently.

Description

Bulk composition extraction method and device for the medical field
Technical field
The present invention relates to the medical field, and in particular to a bulk composition extraction method and device for the medical field.
Background
In the medical field, a machine can automatically identify the medically relevant body part and its corresponding state in a user's description. For example, for "I had a stomachache yesterday", the corresponding body part and state (the bulk composition) is "stomachache". This identification process is called medical bulk composition extraction and belongs to the category of relation extraction. In the prior art, relation extraction used rule-based methods, in which domain experts define description rules to extract relations; such methods require a large amount of manually annotated data and adapt poorly to new domains. Relation extraction methods based on machine learning appeared later, but their workflow is complex.
Summary of the invention
The object of the present invention is to provide a bulk composition extraction method and device for the medical field that address the above problems. To achieve this object, the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a bulk composition extraction method for the medical field. The method includes: obtaining the language material to be extracted; matching the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each successfully matched template; and judging whether the successfully matched templates satisfy a preset extraction condition and, if so, obtaining the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted.
In a second aspect, an embodiment of the present invention provides a bulk composition extraction device for the medical field. The device includes a first acquisition unit, a matching unit, and a second acquisition unit. The first acquisition unit is configured to obtain the language material to be extracted. The matching unit is configured to match the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each successfully matched template. The second acquisition unit is configured to judge whether the successfully matched templates satisfy a preset extraction condition and, if so, to obtain the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted.
The bulk composition extraction method and device for the medical field provided by the embodiments of the present invention obtain the language material to be extracted; match the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and then judge whether the successfully matched templates satisfy a preset extraction condition and, if so, obtain the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted. Matching language material against templates in this way extracts the bulk composition of the language material simply, quickly, and efficiently.
Other features and advantages of the present invention will be set forth in the description that follows and, in part, will be apparent from the description or may be learned by practicing the embodiments of the present invention. The objects and other advantages of the present invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a structural block diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the bulk composition extraction method for the medical field provided by an embodiment of the present invention;
Fig. 3 is a flow chart of the sub-steps of step S220 of the bulk composition extraction method for the medical field provided by an embodiment of the present invention;
Fig. 4 is a flow chart of obtaining the plurality of pre-stored templates in the bulk composition extraction method for the medical field provided by an embodiment of the present invention;
Fig. 5 is a structural block diagram of the bulk composition extraction device for the medical field provided by an embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, may be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Fig. 1 shows a structural block diagram of an electronic device 100 to which the embodiments of the present invention can be applied. As shown in Fig. 1, the electronic device 100 may include a memory 102, a memory controller 104, one or more processors 106 (only one is shown in Fig. 1), a peripheral interface 108, an input/output module 110, an audio module 112, a display module 114, a radio-frequency module 116, and a bulk composition extraction device for the medical field.
The memory 102, memory controller 104, processor 106, peripheral interface 108, input/output module 110, audio module 112, display module 114, and radio-frequency module 116 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal buses. The bulk composition extraction device for the medical field includes at least one software function module that can be stored in the memory 102 in the form of software or firmware, for example a software function module or computer program included in the bulk composition extraction device.
The memory 102 can store various software programs and modules, such as the program instructions/modules corresponding to the bulk composition extraction method and device for the medical field provided by the embodiments of the present application. The processor 106 runs the software programs and modules stored in the memory 102 to perform various functional applications and data processing, thereby implementing the bulk composition extraction method for the medical field in the embodiments of the present application.
The memory 102 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 106 may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The peripheral interface 108 couples various input/output devices to the processor 106 and the memory 102. In some embodiments, the peripheral interface 108, the processor 106, and the memory controller 104 may be implemented on a single chip. In other embodiments, they may each be implemented on a separate chip.
The input/output module 110 allows the user to input data, enabling interaction between the user and the electronic device 100. The input/output module 110 may be, but is not limited to, a mouse, a keyboard, and the like.
The audio module 112 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The display module 114 provides an interactive interface (such as a user interface) between the electronic device 100 and the user, or displays image data for the user's reference. In this embodiment, the display module 114 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations produced simultaneously at one or more positions on the display and pass the sensed touch operations to the processor 106 for calculation and processing.
The radio-frequency module 116 receives and transmits electromagnetic waves and converts between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices.
It will be appreciated that the structure shown in Fig. 1 is only illustrative; the electronic device 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
In the embodiments of the present invention, the electronic device 100 may serve as a user terminal or as a server. The user terminal may be a terminal device such as a personal computer (PC), tablet computer, mobile phone, notebook computer, smart television, set-top box, or vehicle-mounted terminal.
First embodiment
Referring to Fig. 2, an embodiment of the present invention provides a bulk composition extraction method for the medical field. The method may include step S200, step S210, and step S220.
Step S200: Obtain the language material to be extracted.
Step S210: Match the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each successfully matched template.
As one implementation, the plurality of pre-stored templates are pre-stored templates sorted by length. Based on step S210, the language material to be extracted is matched against the pre-stored, length-sorted templates one by one in order, to obtain the successfully matched templates and the matching result corresponding to each. In this embodiment, the language material to be extracted may be matched against the plurality of pre-stored templates using a longest common subsequence (Longest Common Subsequence, LCS) algorithm.
In this embodiment, the pre-stored templates sorted by length may be sorted from longest to shortest or from shortest to longest. For example, the plurality of pre-stored templates may be grouped, and the language material to be extracted is then matched in turn against the pre-stored templates sorted from longest to shortest, to obtain the successfully matched templates and the matching result corresponding to each.
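The length-ordered matching of step S210 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `lcs_length` and `match_templates` are hypothetical, sentences and templates are assumed to be lists of "word/tag" strings, and a template is assumed to match when its LCS with the sentence is as long as the template itself (the matching standard stated later in the description).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (classic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def match_templates(sentence, templates):
    """Return the templates that match the sentence, longest template first.

    Assumption: a template matches when every one of its units appears in the
    sentence in order, i.e. LCS(sentence, template) == len(template).
    """
    ordered = sorted(templates, key=len, reverse=True)
    return [t for t in ordered if lcs_length(sentence, t) == len(t)]


sentence = ["I/x", "yesterday/t", "belly/nmbw", "a little/nmm", "pain/nmz"]
templates = [
    ["belly/nmbw", "pain/nmz"],
    ["belly/nmbw", "a little/nmm", "pain/nmz"],
    ["head/nmbw", "pain/nmz"],
]
matched = match_templates(sentence, templates)
```

Here `matched` holds the three-unit template followed by the two-unit one; the "head" template is dropped because its LCS with the sentence is shorter than the template.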
Step S220: Judge whether the successfully matched templates satisfy a preset extraction condition and, if so, obtain the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted.
As one implementation, the preset extraction condition may include that there is exactly one successfully matched template. Referring to Fig. 3, step S220 may include sub-step S221.
Sub-step S221: Judge whether there is exactly one successfully matched template; if so, output the matching result corresponding to that template as the bulk composition of the language material to be extracted.
As another implementation, the preset extraction condition may also include that there are at least two successfully matched templates. Referring to Fig. 3, step S220 may also include sub-step S222.
Sub-step S222: Judge whether there are at least two successfully matched templates; if so, judge whether, among the matching results corresponding to the at least two successfully matched templates, there is a matching result with the longest output length. If there is, obtain that matching result as the bulk composition of the language material to be extracted.
Further, based on sub-step S222, sub-step S223: If there is no matching result with the longest output length, judge whether a full-composition template exists among the at least two successfully matched templates. If one exists, obtain the matching result corresponding to the full-composition template as the bulk composition of the language material to be extracted.
Further, based on sub-step S222, sub-step S224: If no full-composition template exists, obtain the matching result corresponding to the most compact template among the at least two successfully matched templates as the bulk composition of the language material to be extracted.
In this embodiment, the most compact template can be defined as the template whose keywords lie closest together in the sentence.
In addition, based on sub-step S224, if there is no most compact template among the at least two successfully matched templates, obtain the matching result corresponding to the template in the first position among the successfully matched templates as the bulk composition of the language material to be extracted.
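The decision cascade of sub-steps S221 to S224 can be sketched as below. The `Match` record and its fields are hypothetical: `output` stands for the matching result, `is_full` marks a full-composition template, and `span` is an assumed measure of compactness (the width of the matched region in the sentence). "No longest result" and "no most compact template" are interpreted here as ties.

```python
from dataclasses import dataclass


@dataclass
class Match:
    output: list      # matching result produced by the template
    is_full: bool     # True if the template is a full-composition template
    span: int         # assumed compactness measure: width of the matched region


def select_bulk_composition(matches):
    """Pick one matching result following the order S221 -> S222 -> S223 -> S224."""
    if not matches:
        return None
    if len(matches) == 1:                       # S221: exactly one match
        return matches[0].output
    longest = max(len(m.output) for m in matches)
    top = [m for m in matches if len(m.output) == longest]
    if len(top) == 1:                           # S222: a unique longest output
        return top[0].output
    full = [m for m in matches if m.is_full]
    if full:                                    # S223: full-composition template
        return full[0].output
    tightest = min(m.span for m in matches)
    compact = [m for m in matches if m.span == tightest]
    if len(compact) == 1:                       # S224: unique most compact template
        return compact[0].output
    return matches[0].output                    # fallback: template in first position
```

Because the input list is assumed to preserve the length-sorted matching order, the final fallback returns the result of the first template that matched.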
Referring to Fig. 4, to obtain the plurality of pre-stored templates, the method may further include, before step S200, step S300, step S310, step S320, step S330, step S340, and step S350.
Step S300: Obtain a plurality of description language materials.
In this embodiment, the description language material may be patient description information. Doctor-patient dialogues are crawled from professional medical information websites; the patient description information (sentence text) is then cleaned, segmented into words, and tagged with parts of speech. For example, after word segmentation and part-of-speech tagging, "I had a stomachache yesterday" becomes "I/x yesterday/t belly/nmbw a little/nmm pain/nmz". The language material is organized in units of sentences.
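As a small illustration of the "word/tag" representation above, the following sketch splits tagged units into (word, tag) pairs. The function name `parse_tagged` is hypothetical, the input is assumed to be a list of already segmented units (so a multi-word English gloss such as "a little" stays one unit), and the tags are treated as opaque labels copied from the example.

```python
def parse_tagged(units):
    """Split each "word/tag" unit into a (word, tag) pair.

    The last "/" is used as the separator so that a word containing "/"
    is still split correctly; tags are normalised to lower case.
    """
    pairs = []
    for unit in units:
        word, sep, tag = unit.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"not a word/tag unit: {unit!r}")
        pairs.append((word.strip(), tag.strip().lower()))
    return pairs


tagged = parse_tagged(["I/x", "yesterday/t", "belly/nmbw", "a little/nmm", "pain/nmz"])
```

Keeping the word and the tag separate makes it straightforward to build templates whose units are words, parts of speech, or both, as the description of the matching unit allows.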
Step S310: Annotate each of the plurality of description language materials to obtain the annotation result corresponding to each description language material.
In this embodiment, each line of the processed description language material is one sentence. The processed description language material is then annotated, marking the bulk composition that should be extracted for each sentence (in units of words). Taking "I/x yesterday/t belly/nmbw a little/nmm pain/nmz" as an example, its corresponding bulk composition may be "belly/nmbw pain/nmz".
Step S320: Taking the description language material as input, obtain the seed template corresponding to the description language material according to its annotation result.
In this embodiment, a template should include the following elements: matching unit, template content, matching result, and template flag. The matching unit refers to the specific content of each unit in the template, such as a word, a part of speech, or a combination of word and part of speech; the template content is the sequence the template is to match; and the matching result is the composition the template outputs. For example, "[belly/nmbw, a little/nmm, pain/nmz]==>0 2;0;1" is a template with the following meaning: the matching unit is word + part of speech; the content to be matched is "belly/nmbw a little/nmm pain/nmz"; the output composition is "belly/nmbw pain/nmz" (positions 0 and 2); the subject is "belly/nmbw"; and the template flag is "1", indicating a full-composition template. Taking the description language material as input, the seed template corresponding to the description language material is obtained according to its annotation result. Seed templates of this kind are generated from manual annotation.
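The template notation in the example can be parsed with a few lines of code. This is a sketch under stated assumptions: the `Template` class, its field names, and `parse_template` are hypothetical; the three fields after "==>" are read as output positions, subject positions, and template flag, separated by ";", and a flag of "1" is taken to mean a full-composition template as the text states. The meaning of other flag values is not given in the description, so they are simply treated as "not full composition".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    content: tuple       # units the template must match, in order
    output_idx: tuple    # positions in `content` that form the matching result
    subject_idx: tuple   # positions in `content` that form the subject
    is_full: bool        # True when the template flag is "1"

    def output(self):
        """Units emitted as the matching result."""
        return [self.content[i] for i in self.output_idx]


def parse_template(text):
    """Parse a line such as "[belly/nmbw, a little/nmm, pain/nmz]==>0 2;0;1"."""
    left, right = text.split("==>")
    content = tuple(u.strip() for u in left.strip().strip("[]").split(","))
    out, subj, flag = (part.strip() for part in right.split(";"))
    return Template(
        content=content,
        output_idx=tuple(int(i) for i in out.split()),
        subject_idx=tuple(int(i) for i in subj.split()),
        is_full=(flag == "1"),
    )


seed = parse_template("[belly/nmbw, a little/nmm, pain/nmz]==>0 2;0;1")
```

For the example line, `seed.output()` yields the units at positions 0 and 2, which is the bulk composition "belly/nmbw pain/nmz" given in the text.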
Step S330: Expand the seed template using a plurality of preset expansion modes to obtain a plurality of pending templates.
The plurality of preset expansion modes may be the expansion mode, the contraction mode, the full-continuity mode, and the full-composition mode.
Specifically, because the annotated language material is small, its seed templates cannot cover the large amount of description information found in the real world, so the templates need to be expanded.
Template expansion uses the following four modes:
Expansion mode: For each seed template and the annotated sentence (original sentence) that generated it, denote the original sentence as X = [x1, x2, ..., xn] (n is the length of the original sentence) and the seed template as S = [s1, s2, ..., sj] (j is the length of the seed template). First anchor the positions (p1, p2) of the first and last units of the seed template in the original sentence, then select all candidates with k ∈ [max(p1 - 2, 0), min(p2 + 2, n)] as new templates.
Contraction mode: For the seed template and the templates produced by the expansion mode, filter out one or more non-subject components (by part of speech) to generate new templates.
Full-continuity mode: For the seed template and each template generated by the expansion and contraction modes, form a template in which the positions of all components are fully contiguous in the original sentence.
Full-composition mode: For the seed template and each template generated by the expansion and contraction modes, form a template that contains all the bulk compositions in the original sentence.
Step S340: Score the plurality of pending templates according to the longest common subsequence algorithm to obtain the score corresponding to each pending template.
Based on step S340, for each pending template, the plurality of description language materials are traversed in turn using the longest common subsequence algorithm to obtain the longest matching sequence of the pending template in each description language material; the longest matching sequence of the pending template is compared with the standard result of the description language material in which it was found, and a score is assigned according to the comparison, giving the score of the pending template. Repeating these steps gives the score corresponding to each of the plurality of pending templates.
For example, building on the longest common subsequence (Longest Common Subsequence, LCS) algorithm, a compact longest common subsequence (Compact Longest Common Subsequence, CLCS) algorithm is implemented according to business requirements; when multiple LCSs exist, this algorithm returns the most compact one. The algorithm yields the longest matching sequence between a template and a sentence, and the template is then scored according to the match. For each template, all data (sentences) in the annotated language material are traversed in turn; if the template matches a sentence (the matching standard is whether the CLCS length of the template equals the template length), the content of the matching result is compared with the annotation result and scored. Finally, the scores may also be normalized.
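The scoring step can be sketched as follows. This is a simplified stand-in, not the CLCS algorithm itself: because a template only counts as matched when its CLCS length equals the template length, the sketch handles just that case by finding the tightest in-order occurrence of the whole template in the sentence. The function names and the scoring rule (fraction of matched sentences whose output equals the annotation) are assumptions; the text only says that a score is given by comparing the matching result with the annotation.

```python
def compact_match(sentence, template):
    """Return the sentence positions of the tightest in-order occurrence of
    `template`, or None if the template is not a subsequence of `sentence`."""
    if not template:
        return None
    best = None
    for start, unit in enumerate(sentence):
        if unit != template[0]:
            continue
        positions = [start]
        pos = start + 1
        for wanted in template[1:]:
            # Greedily take the earliest next occurrence; this gives the
            # smallest window for this particular starting position.
            while pos < len(sentence) and sentence[pos] != wanted:
                pos += 1
            if pos == len(sentence):
                positions = None
                break
            positions.append(pos)
            pos += 1
        if positions is not None:
            if best is None or positions[-1] - positions[0] < best[-1] - best[0]:
                best = positions
    return best


def score_template(template, output_idx, annotated):
    """Score a template over (sentence, annotated_output) pairs.

    Assumed rule: correct outputs divided by matched sentences, 0.0 if the
    template matches nothing. The result is already in [0, 1].
    """
    matched = 0
    correct = 0
    for sentence, gold in annotated:
        positions = compact_match(sentence, template)
        if positions is None:
            continue
        matched += 1
        predicted = [sentence[positions[i]] for i in output_idx]
        if predicted == gold:
            correct += 1
    return correct / matched if matched else 0.0
```

A ratio was chosen here so that templates matching many sentences are not favoured merely for being frequent; other scoring rules are equally compatible with the description.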
Step S350: If the score corresponding to a pending template is greater than a preset threshold, obtain all pending templates whose scores exceed the preset threshold, thereby obtaining the plurality of pre-stored templates.
The preset threshold can be set according to actual conditions.
Further, the plurality of pre-stored templates may be sorted by length so that longer templates are matched first. In this way, working from words and parts of speech, template self-learning combined with template expansion achieves good results even with small samples and avoids the need for large amounts of manually annotated language material.
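Step S350 and the subsequent length ordering amount to a filter followed by a sort, as in the minimal sketch below. The function name `select_prestored` and the threshold value used in the example are hypothetical; the text leaves the threshold to be set according to actual conditions.

```python
def select_prestored(scored_templates, threshold):
    """Keep templates whose score is strictly greater than `threshold`,
    ordered from longest to shortest so longer templates are tried first.

    `scored_templates` is a list of (template, score) pairs.
    """
    kept = [template for template, score in scored_templates if score > threshold]
    return sorted(kept, key=len, reverse=True)


scored = [
    (("belly/nmbw", "pain/nmz"), 0.9),
    (("pain/nmz",), 0.4),
    (("belly/nmbw", "a little/nmm", "pain/nmz"), 0.7),
]
prestored = select_prestored(scored, threshold=0.5)
```

Python's `sorted` is stable, so templates of equal length keep their original relative order.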
The bulk composition extraction method for the medical field provided by the embodiments of the present invention obtains the language material to be extracted; matches the language material to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and then judges whether the successfully matched templates satisfy a preset extraction condition and, if so, obtains the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition of the language material to be extracted. Matching language material against templates in this way extracts the bulk composition of the language material simply, quickly, and efficiently.
Referring to Fig. 5, an embodiment of the present invention provides a bulk composition extraction device 400 for the medical field. The device 400 may include: a description language material acquisition unit 410, an annotation unit 420, a seed template obtaining unit 430, an expansion unit 440, a scoring unit 450, a template obtaining unit 460, a first acquisition unit 470, a matching unit 480, and a second acquisition unit 490.
The description language material acquisition unit 410 is configured to obtain a plurality of description language materials.
The annotation unit 420 is configured to annotate each of the plurality of description language materials to obtain the annotation result corresponding to each description language material.
The seed template obtaining unit 430 is configured to take the description language material as input and obtain the seed template corresponding to the description language material according to its annotation result.
The expansion unit 440 is configured to expand the seed template using a plurality of preset expansion modes to obtain a plurality of pending templates.
The scoring unit 450 is configured to score the plurality of pending templates according to the longest common subsequence algorithm to obtain the score corresponding to each pending template.
The scoring unit 450 may include a scoring subunit 451.
The scoring subunit 451 is configured to, for each pending template, traverse the plurality of description language materials in turn using the longest common subsequence algorithm to obtain the longest matching sequence of the pending template in each description language material; compare the longest matching sequence of the pending template with the standard result of the description language material in which it was found and assign a score according to the comparison, giving the score of the pending template; and, by repeating these steps, obtain the score corresponding to each of the plurality of pending templates.
The template obtaining unit 460 is configured to, if the score corresponding to a pending template is greater than a preset threshold, obtain all pending templates whose scores exceed the preset threshold, thereby obtaining the plurality of pre-stored templates.
First obtaining unit 470, configured to obtain a corpus to be extracted.
Matching unit 480, configured to match the corpus to be extracted against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching results corresponding to the successfully matched templates.
The plurality of pre-stored templates are pre-stored templates sorted by length, and matching unit 480 may include matching subunit 481.
Matching subunit 481, configured to match the corpus to be extracted against the pre-stored, length-sorted templates one by one in that order to obtain the successfully matched templates and the matching results corresponding to the successfully matched templates.
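The length-ordered matching performed by matching subunit 481 can be sketched as follows. This is an illustration under stated assumptions: the template format is not specified here, so templates are represented as regular-expression strings, the sort direction is assumed to be longest first, and the matching result is taken to be the matched text. The function name `match_templates` and the sample strings are hypothetical.

```python
import re
from typing import List, Tuple


def match_templates(corpus: str, templates: List[str]) -> List[Tuple[str, str]]:
    """Try each template in length order and collect (template, matched text) pairs."""
    # Assumption: "sorted by length" means longest template first.
    ordered = sorted(templates, key=len, reverse=True)
    hits: List[Tuple[str, str]] = []
    for template in ordered:
        m = re.search(template, corpus)
        if m:
            hits.append((template, m.group(0)))
    return hits


# Hypothetical corpus and templates.
hits = match_templates(
    "left knee pain for three days",
    ["knee pain", "left knee pain", "headache"],
)
```

Here `hits` holds the two templates that matched, with the longer one first, which is the input a later selection step would work from.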
Second obtaining unit 490, configured to judge whether the successfully matched templates satisfy a preset extraction condition and, if so, to obtain the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition corresponding to the corpus to be extracted.
The preset extraction condition may include that there is exactly one successfully matched template. Second obtaining unit 490 may include first obtaining subunit 491.
First obtaining subunit 491, configured to judge whether there is exactly one successfully matched template and, if so, to output the matching result corresponding to that template as the bulk composition corresponding to the corpus to be extracted.
The preset extraction condition may also include that there are at least two successfully matched templates. Second obtaining unit 490 may include second obtaining subunit 492.
Second obtaining subunit 492, configured to judge whether there are at least two successfully matched templates; if so, to judge whether, among the matching results corresponding to the at least two successfully matched templates, there is a matching result with the longest output length; and, if there is, to obtain that matching result as the bulk composition corresponding to the corpus to be extracted.
Second obtaining unit 490 may also include third obtaining subunit 493 and fourth obtaining subunit 494.
Third obtaining subunit 493, configured to, if there is no matching result with the longest output length, judge whether a full-composition template exists among the at least two successfully matched templates and, if one exists, to obtain the matching result corresponding to the full-composition template as the bulk composition corresponding to the corpus to be extracted.
Fourth obtaining subunit 494, configured to, if no full-composition template exists, obtain the matching result corresponding to the most compact template among the at least two successfully matched templates as the bulk composition corresponding to the corpus to be extracted.
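The selection cascade carried out by subunits 491 to 494 can be sketched as follows. This is a minimal sketch under assumptions that this section does not spell out: "no matching result with the longest output length" is read as a tie for the longest result, a full-composition template is marked by a boolean flag, and "most compact" is approximated by the shortest template string. The `Hit` class, its fields and `select_result` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    template: str                    # the successfully matched template
    result: str                      # its matching result
    full_composition: bool = False   # hypothetical flag for a full-composition template


def select_result(hits: List[Hit]) -> Optional[str]:
    """Pick the bulk composition from the successfully matched templates."""
    if not hits:
        return None                  # nothing matched, nothing to extract
    if len(hits) == 1:
        return hits[0].result        # exactly one template matched
    # At least two matches: prefer a unique longest matching result.
    longest = max(len(h.result) for h in hits)
    top = [h for h in hits if len(h.result) == longest]
    if len(top) == 1:
        return top[0].result
    # Tie on length (assumed meaning of "no longest result"): prefer a full-composition template.
    for h in hits:
        if h.full_composition:
            return h.result
    # Otherwise fall back to the most compact template (assumed: shortest template string).
    return min(hits, key=lambda h: len(h.template)).result
```

For example, `select_result([Hit("a", "knee pain"), Hit("bb", "left knee pain")])` returns the longer result, while two results of equal length fall through to the full-composition check and then to the compactness fallback.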
Each of the above units may be implemented in software code, in which case the units may be stored in memory 102. Each of the above units may equally be implemented in hardware, such as an integrated circuit chip.
The bulk composition extraction device 400 for the medical field provided by this embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment does not mention a point, refer to the corresponding content of the foregoing method embodiment.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the architecture, functions and operations that devices, methods and computer program products according to multiple embodiments of the present invention may implement. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form a single independent part, each module may exist separately, or two or more modules may be integrated to form a single independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A bulk composition extraction method for the medical field, characterized in that the method comprises:
obtaining a corpus to be extracted;
matching the corpus to be extracted against each of a plurality of pre-stored templates to obtain successfully matched templates and the matching results corresponding to the successfully matched templates; and
judging whether the successfully matched templates satisfy a preset extraction condition and, if so, obtaining the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition corresponding to the corpus to be extracted.
2. The method according to claim 1, characterized in that the plurality of pre-stored templates are pre-stored templates sorted by length, and matching the corpus to be extracted against each of the plurality of pre-stored templates to obtain successfully matched templates and the matching results corresponding to the successfully matched templates comprises:
matching the corpus to be extracted against the pre-stored, length-sorted templates one by one in that order to obtain the successfully matched templates and the matching results corresponding to the successfully matched templates.
3. The method according to claim 1, characterized in that the preset extraction condition includes that there is exactly one successfully matched template, and judging whether the successfully matched templates satisfy the preset extraction condition and, if so, obtaining the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition corresponding to the corpus to be extracted comprises:
judging whether there is exactly one successfully matched template and, if so, outputting the matching result corresponding to that template as the bulk composition corresponding to the corpus to be extracted.
4. The method according to claim 1, characterized in that the preset extraction condition also includes that there are at least two successfully matched templates, and judging whether the successfully matched templates satisfy the preset extraction condition and, if so, obtaining the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition corresponding to the corpus to be extracted further comprises:
judging whether there are at least two successfully matched templates; if so, judging whether, among the matching results corresponding to the at least two successfully matched templates, there is a matching result with the longest output length; and, if there is, obtaining that matching result as the bulk composition corresponding to the corpus to be extracted.
5. The method according to claim 4, characterized in that, after judging whether there is a matching result with the longest output length among the matching results corresponding to the at least two successfully matched templates, the method further comprises:
if there is no matching result with the longest output length, judging whether a full-composition template exists among the at least two successfully matched templates and, if one exists, obtaining the matching result corresponding to the full-composition template as the bulk composition corresponding to the corpus to be extracted.
6. The method according to claim 5, characterized in that, after judging whether a full-composition template exists among the at least two successfully matched templates, the method further comprises:
if no full-composition template exists, obtaining the matching result corresponding to the most compact template among the at least two successfully matched templates as the bulk composition corresponding to the corpus to be extracted.
7. The method according to claim 1, characterized in that, before obtaining the corpus to be extracted, the method further comprises:
obtaining a plurality of description corpora;
annotating each of the plurality of description corpora to obtain the annotation result corresponding to each description corpus;
taking the description corpus as input and obtaining the seed template corresponding to the description corpus according to the annotation result corresponding to the description corpus;
extending the seed template by a plurality of preset extension modes to obtain a plurality of extended templates to be processed;
scoring the plurality of templates to be processed according to a longest common subsequence algorithm to obtain the score corresponding to each of the templates to be processed; and
if the score corresponding to a template to be processed is greater than a preset threshold, obtaining the templates to be processed whose scores are greater than the preset threshold, so as to obtain the plurality of pre-stored templates.
8. The method according to claim 7, characterized in that the plurality of preset extension modes are an expansion mode, a contraction mode, a full-continuity mode and a full-composition mode.
9. The method according to claim 7, characterized in that scoring the plurality of templates to be processed according to the longest common subsequence algorithm to obtain the score corresponding to each of the templates to be processed comprises:
for each template to be processed, traversing the plurality of description corpora in turn using the longest common subsequence algorithm to obtain the longest matching sequence of the template to be processed in the plurality of description corpora;
comparing the longest matching sequence corresponding to the template to be processed with the standard result of the description corpus to which the longest matching sequence corresponds, and scoring according to the comparison result to obtain the score corresponding to the template to be processed; and
based on the above steps, obtaining the score corresponding to each of the plurality of templates to be processed.
10. A bulk composition extraction device for the medical field, characterized in that the device comprises:
a first obtaining unit, configured to obtain a corpus to be extracted;
a matching unit, configured to match the corpus to be extracted against each of a plurality of pre-stored templates to obtain successfully matched templates and the matching results corresponding to the successfully matched templates; and
a second obtaining unit, configured to judge whether the successfully matched templates satisfy a preset extraction condition and, if so, to obtain the matching result corresponding to the successfully matched template that satisfies the preset extraction condition as the bulk composition corresponding to the corpus to be extracted.
CN201710705003.9A 2017-08-16 2017-08-16 The bulk composition extracting method and device of medical field Pending CN107480139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710705003.9A CN107480139A (en) 2017-08-16 2017-08-16 The bulk composition extracting method and device of medical field

Publications (1)

Publication Number Publication Date
CN107480139A true CN107480139A (en) 2017-12-15

Family

ID=60598928

Country Status (1)

Country Link
CN (1) CN107480139A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576910A (en) * 2009-05-31 2009-11-11 北京学之途网络科技有限公司 Method and device for identifying product naming entity automatically
CN102368260A (en) * 2011-10-12 2012-03-07 北京百度网讯科技有限公司 Method and device of producing domain required template
CN104134017A (en) * 2014-07-18 2014-11-05 华南理工大学 Protein interaction relationship pair extraction method based on compact character representation
CN106910501A (en) * 2017-02-27 2017-06-30 腾讯科技(深圳)有限公司 Text entities extracting method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108628830A (en) * 2018-04-24 2018-10-09 北京京东尚科信息技术有限公司 A kind of method and apparatus of semantics recognition
CN108628830B (en) * 2018-04-24 2022-04-12 北京汇钧科技有限公司 Semantic recognition method and device
CN109800219A (en) * 2019-01-18 2019-05-24 广东小天才科技有限公司 Corpus cleaning method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171215
