CN107480139A - Method and device for extracting main components in the medical field - Google Patents
Method and device for extracting main components in the medical field
- Publication number
- CN107480139A; CN201710705003.9A
- Authority
- CN
- China
- Prior art keywords
- template
- successful
- match
- corpus
- extracted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
An embodiment of the present invention provides a method and device for extracting main components in the medical field. The method comprises: obtaining a corpus to be processed; matching the corpus against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and judging whether the successfully matched templates satisfy a preset extraction condition and, if so, taking the matching result of the successfully matched template that satisfies the condition as the main component of the corpus. Matching the corpus against templates in this way extracts its main component simply, quickly, and efficiently.
Description
Technical field
The present invention relates to the medical field, and in particular to a method and device for extracting main components in the medical field.
Background
In the medical field, a machine can automatically identify the medically relevant main body part and its state in a user's description. For example, for "I had a stomachache yesterday", the main body part and state (the main component) is "stomach-ache". This identification process is called medical main-component extraction and falls under relation extraction. In the prior art, relation extraction first used rule-based methods, in which domain experts define description rules for extracting relations; these methods require a large amount of manually annotated data and adapt poorly to new domains. Relation extraction methods based on machine learning appeared later, but their workflows are complex.
Summary of the invention
An object of the present invention is to provide a method and device for extracting main components in the medical field so as to alleviate the above problems. To achieve this object, the present invention adopts the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for extracting main components in the medical field. The method comprises: obtaining a corpus to be processed; matching the corpus against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and judging whether the successfully matched templates satisfy a preset extraction condition and, if so, obtaining the matching result of the successfully matched template that satisfies the condition as the main component of the corpus.
In a second aspect, an embodiment of the present invention provides a device for extracting main components in the medical field. The device comprises a first acquisition unit, a matching unit, and a second acquisition unit. The first acquisition unit is configured to obtain a corpus to be processed. The matching unit is configured to match the corpus against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each. The second acquisition unit is configured to judge whether the successfully matched templates satisfy a preset extraction condition and, if so, to obtain the matching result of the successfully matched template that satisfies the condition as the main component of the corpus.
The method and device for extracting main components in the medical field provided by the embodiments of the present invention obtain a corpus to be processed; match the corpus against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and then judge whether the successfully matched templates satisfy a preset extraction condition and, if so, obtain the matching result of the successfully matched template that satisfies the condition as the main component of the corpus. Matching the corpus against templates in this way extracts its main component simply, quickly, and efficiently.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the embodiments of the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a block diagram of the electronic device provided by an embodiment of the present invention;
Fig. 2 is a flow chart of the method for extracting main components in the medical field provided by an embodiment of the present invention;
Fig. 3 is a flow chart of the sub-steps of step S220 of the method for extracting main components in the medical field provided by an embodiment of the present invention;
Fig. 4 is a flow chart of obtaining the plurality of pre-stored templates in the method for extracting main components in the medical field provided by an embodiment of the present invention;
Fig. 5 is a block diagram of the device for extracting main components in the medical field provided by an embodiment of the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be defined or explained again in subsequent drawings. In the description of the present invention, the terms "first", "second", and the like are used only to distinguish items and should not be understood as indicating or implying relative importance.
Fig. 1 shows a block diagram of an electronic device 100 to which the embodiments of the present invention can be applied. As shown in Fig. 1, the electronic device 100 may include a memory 102, a memory controller 104, one or more processors 106 (only one is shown in Fig. 1), a peripheral interface 108, an input/output module 110, an audio module 112, a display module 114, a radio-frequency module 116, and the device for extracting main components in the medical field.
The memory 102, memory controller 104, processor 106, peripheral interface 108, input/output module 110, audio module 112, display module 114, and radio-frequency module 116 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements can be electrically connected through one or more communication buses or signal buses. The method for extracting main components in the medical field is embodied in at least one software function module that can be stored in the memory 102 in the form of software or firmware, for example the software function modules or computer programs comprised in the device for extracting main components in the medical field.
The memory 102 can store various software programs and modules, such as the program instructions and modules corresponding to the method and device for extracting main components in the medical field provided by the embodiments of the present application. By running the software programs and modules stored in the memory 102, the processor 106 performs various functional applications and data processing, that is, implements the method for extracting main components in the medical field of the embodiments of the present application.
The memory 102 may include, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), and electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM).
The processor 106 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU) or a network processor (Network Processor, NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor.
The peripheral interface 108 couples various input/output devices to the processor 106 and the memory 102. In some embodiments, the peripheral interface 108, the processor 106, and the memory controller 104 can be implemented in a single chip. In other examples, they can each be implemented in a separate chip.
The input/output module 110 allows the user to input data so that the user can interact with the electronic device 100. The input/output module 110 may be, but is not limited to, a mouse, a keyboard, and the like.
The audio module 112 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The display module 114 provides an interactive interface (for example a user interface) between the electronic device 100 and the user, or displays image data for the user's reference. In this embodiment, the display module 114 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations, meaning that the display can sense touch operations produced simultaneously at one or more positions on it and hand the sensed operations to the processor 106 for calculation and processing.
The radio-frequency module 116 receives and sends electromagnetic waves and converts between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices.
It can be understood that the structure shown in Fig. 1 is only illustrative; the electronic device 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 can be implemented in hardware, software, or a combination of the two.
In the embodiments of the present invention, the electronic device 100 can serve as a user terminal or as a server. The user terminal may be a terminal device such as a PC (personal computer), tablet computer, mobile phone, notebook computer, smart television, set-top box, or vehicle-mounted terminal.
First embodiment
Referring to Fig. 2, an embodiment of the present invention provides a method for extracting main components in the medical field. The method may include step S200, step S210, and step S220.
Step S200: Obtain a corpus to be processed.
Step S210: Match the corpus to be processed against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each.
In one implementation, the pre-stored templates are sorted by length. In step S210, the corpus to be processed is then matched against the length-sorted templates one by one in that order to obtain the successfully matched templates and the matching result corresponding to each. In this embodiment, the corpus and the pre-stored templates can be matched using the longest common subsequence (Longest Common Subsequence, LCS) algorithm.
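As a rough illustration of LCS-based matching, the sketch below computes the LCS length of a template and a tokenized sentence and treats the template as matched when that length equals the template length, which is the criterion the description gives later for template scoring. The function names and the hyphenated token `a-little/nmm` are illustrative assumptions, not part of the patent text.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def template_matches(template, sentence):
    """Assumed criterion: every template unit occurs in the sentence, in order."""
    return lcs_length(template, sentence) == len(template)


sentence = ["I/x", "yesterday/t", "belly/nmbw", "a-little/nmm", "pain/nmz"]
template = ["belly/nmbw", "a-little/nmm", "pain/nmz"]
print(template_matches(template, sentence))
```

The dynamic program keeps only two rows, so memory grows with the sentence length rather than with the product of both lengths.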
In this embodiment, the pre-stored templates may be sorted by length either from long to short or from short to long. For example, the pre-stored templates can be grouped, and the corpus to be processed is then matched in turn against the templates sorted from long to short to obtain the successfully matched templates and the matching result corresponding to each.
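A minimal sketch of step S210 with length-sorted templates might look like the following. The dictionary keys `units` and `output`, the in-order subsequence test, and the sample tokens are assumptions chosen for illustration; the patent does not prescribe a data layout.

```python
def is_subsequence(template, sentence):
    """True if the template units occur in the sentence in the same order."""
    remaining = iter(sentence)
    return all(unit in remaining for unit in template)


def match_all(sentence, templates):
    """Try templates from longest to shortest; return (template, result) pairs."""
    ordered = sorted(templates, key=lambda t: len(t["units"]), reverse=True)
    hits = []
    for t in ordered:
        if is_subsequence(t["units"], sentence):
            # The matching result is the subset of units the template outputs.
            result = [t["units"][i] for i in t["output"]]
            hits.append((t, result))
    return hits


sentence = ["I/x", "yesterday/t", "belly/nmbw", "a-little/nmm", "pain/nmz"]
templates = [
    {"units": ["belly/nmbw", "pain/nmz"], "output": [0, 1]},
    {"units": ["belly/nmbw", "a-little/nmm", "pain/nmz"], "output": [0, 2]},
    {"units": ["head/nmbw", "pain/nmz"], "output": [0, 1]},
]
for t, result in match_all(sentence, templates):
    print(len(t["units"]), result)
```

Sorting before matching means the hit list is already ordered from longest to shortest template, which is convenient when a later step falls back to "the first matched template".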
Step S220: Judge whether the successfully matched templates satisfy a preset extraction condition and, if so, obtain the matching result of the successfully matched template that satisfies the condition as the main component of the corpus to be processed.
In one implementation, the preset extraction condition may be that exactly one template was successfully matched. Referring to Fig. 3, step S220 may include sub-step S221.
Sub-step S221: Judge whether the successfully matched templates consist of a single template; if so, output the matching result corresponding to that template as the main component of the corpus to be processed.
In another implementation, the preset extraction condition may instead be that at least two templates were successfully matched. Referring to Fig. 3, step S220 may also include sub-step S222.
Sub-step S222: Judge whether at least two templates were successfully matched; if so, judge whether, among the matching results corresponding to those templates, there is one whose output length is the longest. If such a matching result exists, obtain it as the main component of the corpus to be processed.
Further, building on sub-step S222, sub-step S223: if no matching result with the longest output length exists, judge whether a full-component template exists among the at least two successfully matched templates; if so, obtain the matching result corresponding to the full-component template as the main component of the corpus to be processed.
Further, building on sub-step S222, sub-step S224: if no full-component template exists, obtain the matching result corresponding to the most compact template among the at least two successfully matched templates as the main component of the corpus to be processed.
In this embodiment, the most compact template can be defined as the template whose keywords lie closest together in the sentence.
In addition, building on sub-step S224, if no most compact template exists among the at least two successfully matched templates, obtain the matching result corresponding to the template in the first position among the successfully matched templates as the main component of the corpus to be processed.
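The decision cascade of sub-steps S221 to S224 can be sketched as below. This is an interpretation under stated assumptions: "a longest result exists" and "a most compact template exists" are read as "the maximum (or minimum) is unique", and each hit is a hypothetical dictionary with keys `result`, `full` (full-component flag), and `span` (distance between the first and last matched positions in the sentence).

```python
def select_result(hits):
    """Pick the main component from the matched templates, in match order."""
    if not hits:
        return None
    if len(hits) == 1:                      # sub-step S221
        return hits[0]["result"]

    lengths = [len(h["result"]) for h in hits]
    longest = max(lengths)
    if lengths.count(longest) == 1:         # sub-step S222: unique longest output
        return hits[lengths.index(longest)]["result"]

    for h in hits:                          # sub-step S223: full-component template
        if h["full"]:
            return h["result"]

    spans = [h["span"] for h in hits]
    tightest = min(spans)
    if spans.count(tightest) == 1:          # sub-step S224: unique most compact
        return hits[spans.index(tightest)]["result"]

    return hits[0]["result"]                # fallback: first matched template


hits = [
    {"result": ["belly/nmbw", "pain/nmz"], "full": True, "span": 2},
    {"result": ["pain/nmz"], "full": False, "span": 0},
]
print(select_result(hits))
```

Keeping the cascade in one function makes the priority order explicit: output length first, then the full-component flag, then compactness, then position.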
Referring to Fig. 4, to obtain the plurality of pre-stored templates, the method may further include, before step S200, step S300, step S310, step S320, step S330, step S340, and step S350.
Step S300: Obtain a plurality of description corpora.
In this embodiment, a description corpus may be patient description information. Doctor-patient dialogues are crawled from professional medical information websites; the patient description information (sentence text) is then cleaned, segmented into words, and tagged with parts of speech. For example, after word segmentation and part-of-speech tagging, "I had a stomachache yesterday" becomes "I/x yesterday/t belly/nmbw a little/nmm pain/nmz". The corpus is organized in units of sentences.
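To make the tagged format concrete, the sketch below splits a segmented, tagged sentence into (word, tag) pairs. It assumes tokens are separated by whitespace and that each token is written `word/tag`; the hyphen in `a-little/nmm` is an assumption that stands in for a single token of the original sentence.

```python
def parse_tagged(line):
    """Split 'word/tag word/tag ...' into a list of (word, tag) pairs."""
    tokens = []
    for item in line.split():
        if "/" in item:
            word, tag = item.rsplit("/", 1)
        else:
            word, tag = item, ""   # untagged token: keep the word, empty tag
        tokens.append((word, tag))
    return tokens


print(parse_tagged("I/x yesterday/t belly/nmbw a-little/nmm pain/nmz"))
```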
Step S310: Annotate each of the description corpora to obtain the annotation result corresponding to each.
In this embodiment, each line of the processed description corpus is one sentence. The processed corpus is then annotated, marking (in units of words) the main component that should be extracted from each sentence. Taking "I/x yesterday/t belly/nmbw a little/nmm pain/nmz" as an example, its corresponding main component may be "belly/nmbw pain/nmz".
Step S320: Taking the description corpus as input, obtain the seed template corresponding to the description corpus according to its annotation result.
In this embodiment, a template comprises the following elements: matching unit, template content, matching result, and template flag. The matching unit is the kind of content each unit in the template holds, such as a word, a part of speech, or a combination of word and part of speech. The template content is the sequence that the template is to match, and the matching result is the component that the template outputs. For example, "[belly/nmbw, a little/nmm, pain/nmz]==>0 2;0;1" is a template with the following meaning: the matching unit is word plus part of speech; the content to be matched is "belly/nmbw a little/nmm pain/nmz"; the output component is "belly/nmbw pain/nmz" (positions 0 and 2); the main body is "belly/nmbw" (position 0); and the template flag is "1", indicating a full-component template. Seed templates obtained in this way, from the description corpus and its annotation result, are templates generated from manual annotation.
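The template notation in the example can be parsed into a small record, as sketched below. The field names (`units`, `output`, `subject`, `full`) are hypothetical, and the parser assumes the layout shown in the example: bracketed units separated by commas, then `==>`, then output positions, main-body positions, and the flag separated by semicolons. Only the flag value "1" (full-component template) is described in the text; other values are treated here simply as "not full".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    units: tuple      # sequence the template must match
    output: tuple     # positions of the units that form the matching result
    subject: tuple    # positions of the main-body units
    full: bool        # True when the flag is "1" (full-component template)


def parse_template(text):
    left, right = text.split("==>")
    units = tuple(u.strip() for u in left.strip()[1:-1].split(","))
    output, subject, flag = (part.strip() for part in right.split(";"))
    return Template(
        units=units,
        output=tuple(int(i) for i in output.split()),
        subject=tuple(int(i) for i in subject.split()),
        full=(flag == "1"),
    )


t = parse_template("[belly/nmbw, a little/nmm, pain/nmz]==>0 2;0;1")
print([t.units[i] for i in t.output])
```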
Step S330: Expand the seed template using a plurality of preset expansion modes to obtain a plurality of candidate templates.
The preset expansion modes may be the expansion mode, the contraction mode, the fully continuous mode, and the full-component mode.
Specifically, because the annotated corpus is small, its seed templates cannot cover the large volume of description information found in the real world, so the templates need to be expanded.
Template expansion uses the following four modes:
Expansion mode: for each seed template and the annotated sentence (original sentence) from which it was generated, denote the original sentence as X = [x1, x2, ..., xn] (n is the length of the original sentence) and the seed template as S = [s1, s2, ..., sj] (j is the length of the seed template). First anchor the positions (p1, p2) in the original sentence of the first and last units of the seed template, then select all [...] (k ∈ [max(p1 - 2, 0), min(p2 + 2, n)]) as new templates.
Contraction mode: for each seed template and each template produced by the expansion mode, filter out one or more non-main components (by part of speech) to generate a new template.
Fully continuous mode: for the seed template and each template generated by the expansion and contraction modes, form a template in which the positions of all components in the original sentence are fully contiguous.
Full-component mode: for the seed template and each template generated by the expansion and contraction modes, form a template that contains all main components of the original sentence.
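The expression selected in the expansion mode is incomplete in the text above, so the sketch below is one possible reading rather than the patent's definition. It assumes that expansion yields every contiguous span of the original sentence that still covers the seed and reaches at most two tokens further on either side, and that contraction drops every non-empty combination of units whose part-of-speech tag is not in an assumed set of main-component tags. Positions are 0-based here, and all names are illustrative.

```python
from itertools import combinations


def expand(sentence, p1, p2, window=2):
    """Contiguous spans covering [p1, p2] plus up to `window` extra tokens per side."""
    n = len(sentence)
    out = []
    for start in range(max(p1 - window, 0), p1 + 1):
        for end in range(p2, min(p2 + window, n - 1) + 1):
            if (start, end) != (p1, p2):      # skip the seed span itself
                out.append(sentence[start:end + 1])
    return out


def contract(template, subject_tags):
    """Drop one or more units whose tag is not a main-component tag."""
    removable = [i for i, unit in enumerate(template)
                 if unit.rsplit("/", 1)[-1] not in subject_tags]
    out = []
    for r in range(1, len(removable) + 1):
        for drop in combinations(removable, r):
            out.append([u for i, u in enumerate(template) if i not in drop])
    return out


sentence = ["I/x", "yesterday/t", "belly/nmbw", "a-little/nmm", "pain/nmz"]
seed = sentence[2:5]                           # seed anchored at positions 2..4
print(expand(sentence, 2, 4))
print(contract(seed, {"nmbw", "nmz"}))
```

With a window of two tokens the number of expanded spans per seed is at most eight, so the candidate set stays small before scoring.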
Step S340: Score the candidate templates according to the longest common subsequence algorithm to obtain the score corresponding to each candidate template.
Further, in step S340, for each candidate template, traverse the description corpora in turn using the longest common subsequence algorithm to obtain the longest matching sequence of that candidate template within the description corpora; compare the longest matching sequence of the candidate template with the reference result of the description corpus to which that sequence belongs, and score according to the comparison to obtain the score of the candidate template. Repeating these steps yields the score corresponding to each candidate template.
For example, building on the longest common subsequence (Longest Common Subsequence, LCS) algorithm, a compact longest common subsequence (Compact Longest Common Subsequence, CLCS) algorithm is implemented to suit the business requirements; when several LCSs exist, this algorithm returns the most compact one. The algorithm yields the longest matching sequence between a template and a sentence, which is then scored according to the match. For each template, all the data (sentences) in the annotated corpus are traversed in turn; if the template matches a sentence (the criterion being that the CLCS length of the template equals the template length), the content of the matching result is compared with the annotation result and scored. Normalization can also be applied at the end.
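The following sketch shows one way to realize the compact match and the scoring pass. Because a template counts as matched only when all of its units are found, the compact match is implemented here as the tightest in-order occurrence of the whole template in the sentence. The scoring rule, the fraction of matched sentences whose template output equals the annotation, is an assumption; the description only says that the match is compared with the annotation and may be normalized.

```python
def compact_match(template, sentence):
    """Positions of the tightest in-order occurrence of template, or None."""
    best = None
    for start, unit in enumerate(sentence):
        if unit != template[0]:
            continue
        positions, j = [], 0
        for i in range(start, len(sentence)):
            if sentence[i] == template[j]:
                positions.append(i)
                j += 1
                if j == len(template):
                    break
        if j == len(template):
            if best is None or positions[-1] - positions[0] < best[-1] - best[0]:
                best = positions
    return best


def score_template(template, output, corpus):
    """Assumed score: share of matched sentences where the output equals the annotation."""
    matched = correct = 0
    for sentence, gold in corpus:
        if compact_match(template, sentence) is None:
            continue
        matched += 1
        if [template[i] for i in output] == gold:
            correct += 1
    return correct / matched if matched else 0.0


corpus = [
    (["I/x", "yesterday/t", "belly/nmbw", "a-little/nmm", "pain/nmz"],
     ["belly/nmbw", "pain/nmz"]),
    (["head/nmbw", "pain/nmz"], ["head/nmbw", "pain/nmz"]),
]
print(score_template(["belly/nmbw", "a-little/nmm", "pain/nmz"], (0, 2), corpus))
```

Dividing by the number of matched sentences already keeps scores in the range 0 to 1, which is one simple form of the normalization the text mentions.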
Step S350: If the score of a candidate template exceeds a preset threshold, collect all candidate templates whose scores exceed the threshold to obtain the plurality of pre-stored templates.
The preset threshold can be set according to actual conditions.
Furthermore, the pre-stored templates can be sorted by length so that longer templates are matched first. In this way, working from words and parts of speech, template self-learning combined with template expansion can achieve good results even with small samples, avoiding the need for a large amount of manually annotated corpus.
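Step S350 and the subsequent length sort reduce to a filter followed by a sort. In the sketch below, the function name, the pairing of each candidate with its score, and the threshold value are illustrative assumptions.

```python
def select_templates(scored, threshold):
    """Keep candidates scoring strictly above the threshold, longest first."""
    kept = [units for units, score in scored if score > threshold]
    return sorted(kept, key=len, reverse=True)


scored = [
    (["pain/nmz"], 0.9),
    (["belly/nmbw", "pain/nmz"], 0.8),
    (["head/nmbw"], 0.2),
]
print(select_templates(scored, 0.5))
```

Python's sort is stable, so templates of equal length keep the order in which they were scored.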
The method for extracting main components in the medical field provided by the embodiment of the present invention obtains a corpus to be processed; matches the corpus against each of a plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each; and then judges whether the successfully matched templates satisfy a preset extraction condition and, if so, obtains the matching result of the successfully matched template that satisfies the condition as the main component of the corpus. Matching the corpus against templates in this way extracts its main component simply, quickly, and efficiently.
Referring to Fig. 5, an embodiment of the present invention provides a device 400 for extracting main components in the medical field. The device 400 may include: a description corpus acquisition unit 410, an annotation unit 420, a seed template obtaining unit 430, an expansion unit 440, a scoring unit 450, a template obtaining unit 460, a first acquisition unit 470, a matching unit 480, and a second acquisition unit 490.
The description corpus acquisition unit 410 is configured to obtain a plurality of description corpora.
The annotation unit 420 is configured to annotate each of the description corpora to obtain the annotation result corresponding to each.
The seed template obtaining unit 430 is configured to take the description corpus as input and obtain the seed template corresponding to the description corpus according to its annotation result.
The expansion unit 440 is configured to expand the seed template using a plurality of preset expansion modes to obtain a plurality of candidate templates.
The scoring unit 450 is configured to score the candidate templates according to the longest common subsequence algorithm to obtain the score corresponding to each candidate template.
The scoring unit 450 may include a scoring subunit 451.
The scoring subunit 451 is configured to, for each candidate template, traverse the description corpora in turn using the longest common subsequence algorithm to obtain the longest matching sequence of that candidate template within the description corpora; compare the longest matching sequence of the candidate template with the reference result of the description corpus to which that sequence belongs, and score according to the comparison to obtain the score of the candidate template; and, by repeating these steps, obtain the score corresponding to each candidate template.
The template obtaining unit 460 is configured to, if the score of a candidate template exceeds a preset threshold, collect all candidate templates whose scores exceed the threshold to obtain the plurality of pre-stored templates.
The first acquisition unit 470 is configured to obtain a corpus to be processed.
The matching unit 480 is configured to match the corpus to be processed against each of the plurality of pre-stored templates to obtain the successfully matched templates and the matching result corresponding to each.
The pre-stored templates are sorted by length, and the matching unit 480 may include a matching subunit 481.
The matching subunit 481 is configured to match the corpus to be processed against the length-sorted pre-stored templates one by one in that order to obtain the successfully matched templates and the matching result corresponding to each.
The second acquisition unit 490 is configured to judge whether the successfully matched templates satisfy a preset extraction condition and, if so, to obtain the matching result of the successfully matched template that satisfies the condition as the main component of the corpus to be processed.
The preset extraction conditions may include that there is exactly one successfully matched template. Second acquisition unit 490 may include first obtaining subunit 491.
First obtaining subunit 491 is configured to judge whether there is exactly one successfully matched template and, if so, to output the matching result corresponding to that template as the bulk composition corresponding to the language material to be extracted.
The preset extraction conditions may also include that there are at least two successfully matched templates. Second acquisition unit 490 may include second obtaining subunit 492.
Second obtaining subunit 492 is configured to judge whether there are at least two successfully matched templates and, if so, to judge whether a matching result with the longest output length exists among the matching results corresponding to the at least two successfully matched templates. If such a result exists, the matching result with the longest output length is obtained as the bulk composition corresponding to the language material to be extracted.
Second acquisition unit 490 may also include third obtaining subunit 493 and fourth obtaining subunit 494.
Third obtaining subunit 493 is configured to, if no matching result with the longest output length exists, judge whether a full composition template exists among the at least two successfully matched templates and, if one exists, to obtain the matching result corresponding to the full composition template as the bulk composition corresponding to the language material to be extracted.
Fourth obtaining subunit 494 is configured to, if no full composition template exists, obtain the matching result corresponding to the most compact template among the at least two successfully matched templates as the bulk composition corresponding to the language material to be extracted.
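The selection cascade performed by subunits 491 to 494 can be sketched as follows. Several details are assumptions made only for this example and are not defined in the text above: "a longest matching result exists" is read as "exactly one result has the maximum length", a full composition template is marked by a hypothetical boolean flag, and the "most compact" template is taken to be the one with the shortest template string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    template: str                      # the successfully matched template
    result: str                        # the matching result it produced
    is_full_composition: bool = False  # hypothetical marker for a full composition template


def select_bulk_composition(matches: List[Match]) -> Optional[str]:
    """Pick one matching result following the four-step cascade."""
    if not matches:
        return None
    # Subunit 491: exactly one successful template.
    if len(matches) == 1:
        return matches[0].result
    # Subunit 492: a unique longest matching result (assumed reading).
    longest = max(len(m.result) for m in matches)
    longest_matches = [m for m in matches if len(m.result) == longest]
    if len(longest_matches) == 1:
        return longest_matches[0].result
    # Subunit 493: fall back to a full composition template, if any.
    for m in matches:
        if m.is_full_composition:
            return m.result
    # Subunit 494: otherwise use the most compact template (assumed: shortest template string).
    return min(matches, key=lambda m: len(m.template)).result
```

Returning `None` when nothing matched is a design choice of the sketch; the text only describes the cases with at least one successful match.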
Each of the above units may be implemented in software code, in which case the units may be stored in memory 102. Each of the above units may equally be implemented in hardware, such as an integrated circuit chip.
The implementation principle and technical effects of the bulk composition extraction device 400 for the medical field provided by this embodiment of the present invention are the same as those of the foregoing method embodiments. For brevity, where the device embodiment does not mention something, refer to the corresponding content in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the architecture, functions, and operations that may be implemented by devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc. It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise", and any other variants, are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection. It should be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.
Claims (10)
1. A bulk composition extraction method for the medical field, characterized in that the method comprises:
obtaining language material to be extracted;
matching the language material to be extracted against each of multiple prestored templates, and obtaining the successfully matched templates and the matching result corresponding to each successfully matched template;
judging whether the successfully matched templates meet preset extraction conditions and, if so, obtaining the matching result corresponding to the successfully matched template that meets the preset extraction conditions as the bulk composition corresponding to the language material to be extracted.
2. The method according to claim 1, characterized in that the multiple prestored templates are prestored templates sorted by length, and matching the language material to be extracted against each of the multiple prestored templates and obtaining the successfully matched templates and the matching result corresponding to each successfully matched template comprises:
matching the language material to be extracted against the prestored, length-sorted templates one by one in that order, and obtaining the successfully matched templates and the matching result corresponding to each successfully matched template.
3. The method according to claim 1, characterized in that the preset extraction conditions include that there is exactly one successfully matched template, and judging whether the successfully matched templates meet the preset extraction conditions and, if so, obtaining the matching result corresponding to the successfully matched template that meets the preset extraction conditions as the bulk composition corresponding to the language material to be extracted comprises:
judging whether there is exactly one successfully matched template and, if so, outputting the matching result corresponding to that template as the bulk composition corresponding to the language material to be extracted.
4. The method according to claim 1, characterized in that the preset extraction conditions also include that there are at least two successfully matched templates, and judging whether the successfully matched templates meet the preset extraction conditions and, if so, obtaining the matching result corresponding to the successfully matched template that meets the preset extraction conditions as the bulk composition corresponding to the language material to be extracted further comprises:
judging whether there are at least two successfully matched templates and, if so, judging whether a matching result with the longest output length exists among the matching results corresponding to the at least two successfully matched templates; if such a result exists, obtaining the matching result with the longest output length as the bulk composition corresponding to the language material to be extracted.
5. The method according to claim 4, characterized in that, after judging whether a matching result with the longest output length exists among the matching results corresponding to the at least two successfully matched templates, the method further comprises:
if no matching result with the longest output length exists, judging whether a full composition template exists among the at least two successfully matched templates and, if one exists, obtaining the matching result corresponding to the full composition template as the bulk composition corresponding to the language material to be extracted.
6. The method according to claim 5, characterized in that, after judging whether a full composition template exists among the at least two successfully matched templates, the method further comprises:
if no full composition template exists, obtaining the matching result corresponding to the most compact template among the at least two successfully matched templates as the bulk composition corresponding to the language material to be extracted.
7. The method according to claim 1, characterized in that, before obtaining the language material to be extracted, the method further comprises:
obtaining multiple description language materials;
annotating each of the multiple description language materials, and obtaining the annotation result corresponding to each description language material;
taking each description language material as input, and obtaining the seed template corresponding to that description language material according to its annotation result;
extending the seed templates through multiple preset extension modes to obtain multiple pending templates;
scoring the multiple pending templates according to a longest common subsequence algorithm, and obtaining the score corresponding to each pending template;
if the score of a pending template is greater than a preset threshold, obtaining every pending template whose score exceeds the preset threshold, thereby obtaining the multiple prestored templates.
8. The method according to claim 7, characterized in that the multiple preset extension modes are an expansion mode, a contraction mode, a fully continuous mode, and a full composition mode.
9. The method according to claim 7, characterized in that scoring the multiple pending templates according to the longest common subsequence algorithm and obtaining the score corresponding to each pending template comprises:
for each pending template, traversing the multiple description language materials in turn with the longest common subsequence algorithm, and obtaining the longest matching sequence of that pending template in the multiple description language materials;
comparing the longest matching sequence of the pending template with the standard result of the description language material to which the longest matching sequence corresponds, and scoring according to the comparison result to obtain the score of the pending template;
obtaining, based on the above steps, the score corresponding to each of the multiple pending templates.
10. A bulk composition extraction device for the medical field, characterized in that the device comprises:
a first acquisition unit, configured to obtain language material to be extracted;
a matching unit, configured to match the language material to be extracted against each of multiple prestored templates, and to obtain the successfully matched templates and the matching result corresponding to each successfully matched template;
a second acquisition unit, configured to judge whether the successfully matched templates meet preset extraction conditions and, if so, to obtain the matching result corresponding to the successfully matched template that meets the preset extraction conditions as the bulk composition corresponding to the language material to be extracted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710705003.9A CN107480139A (en) | 2017-08-16 | 2017-08-16 | The bulk composition extracting method and device of medical field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710705003.9A CN107480139A (en) | 2017-08-16 | 2017-08-16 | The bulk composition extracting method and device of medical field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107480139A true CN107480139A (en) | 2017-12-15 |
Family
ID=60598928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710705003.9A Pending CN107480139A (en) | 2017-08-16 | 2017-08-16 | The bulk composition extracting method and device of medical field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480139A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101576910A (en) * | 2009-05-31 | 2009-11-11 | 北京学之途网络科技有限公司 | Method and device for identifying product naming entity automatically |
CN102368260A (en) * | 2011-10-12 | 2012-03-07 | 北京百度网讯科技有限公司 | Method and device of producing domain required template |
CN104134017A (en) * | 2014-07-18 | 2014-11-05 | 华南理工大学 | Protein interaction relationship pair extraction method based on compact character representation |
CN106910501A (en) * | 2017-02-27 | 2017-06-30 | 腾讯科技(深圳)有限公司 | Text entities extracting method and device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108628830A (en) * | 2018-04-24 | 2018-10-09 | 北京京东尚科信息技术有限公司 | A kind of method and apparatus of semantics recognition |
CN108628830B (en) * | 2018-04-24 | 2022-04-12 | 北京汇钧科技有限公司 | Semantic recognition method and device |
CN109800219A (en) * | 2019-01-18 | 2019-05-24 | 广东小天才科技有限公司 | A kind of method and apparatus of corpus cleaning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442841B (en) | Resume identification method and device, computer equipment and storage medium | |
CN108629043A (en) | Extracting method, device and the storage medium of webpage target information | |
CN112131393A (en) | Construction method of medical knowledge map question-answering system based on BERT and similarity algorithm | |
CN106874279A (en) | Generate the method and device of applicating category label | |
CN111191012A (en) | Knowledge graph generation apparatus, method and computer program product thereof | |
CN113722483A (en) | Topic classification method, device, equipment and storage medium | |
CN113111162A (en) | Department recommendation method and device, electronic equipment and storage medium | |
CN108573707A (en) | A kind of processing method of voice recognition result, device, equipment and medium | |
CN110600094A (en) | Intelligent writing method and system for electronic medical record | |
CN110209875A (en) | User content portrait determines method, access object recommendation method and relevant apparatus | |
CN107330009A (en) | Descriptor disaggregated model creation method, creating device and storage medium | |
CN107480139A (en) | The bulk composition extracting method and device of medical field | |
CN112035614A (en) | Test set generation method and device, computer equipment and storage medium | |
CN110874534A (en) | Data processing method and data processing device | |
CN111180077A (en) | Medical and American subject identification method, device, equipment and storage medium | |
CN112800758A (en) | Method, system, equipment and medium for distinguishing similar meaning words in text | |
CN111161861A (en) | Short text data processing method and device for hospital logistics operation and maintenance | |
US20220156611A1 (en) | Method and apparatus for entering information, electronic device, computer readable storage medium | |
CN111597296A (en) | Commodity data processing method, device and system | |
CN108228573A (en) | Text emotion analysis method, device and electronic equipment | |
CN112364131B (en) | Corpus processing method and related device thereof | |
CN114943306A (en) | Intention classification method, device, equipment and storage medium | |
CN114637831A (en) | Data query method based on semantic analysis and related equipment thereof | |
CN113569741A (en) | Answer generation method and device for image test questions, electronic equipment and readable medium | |
CN112347150A (en) | Method and device for labeling academic label of student and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171215 |