CN108549639A - Improved named entity recognition method and system for traditional Chinese medicine (TCM) medical records based on multi-feature templates - Google Patents
- Publication number
- CN108549639A (application CN201810359240.9A)
- Authority
- CN
- China
- Prior art keywords
- word
- chinese medicine
- feature
- sentence
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention discloses an improved named entity recognition method and system for traditional Chinese medicine (TCM) medical records based on multi-feature templates. The method comprises the steps of: extracting sentences from the TCM medical record text; classifying the extracted sentences; segmenting each class of sentences into characters; annotating each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator word feature, a right indicator word feature and a term feature; building a training corpus; formulating feature templates; inputting the resulting corpus and the feature templates into a conditional random field (CRF) model and training it to obtain a trained CRF model; building a corpus to be predicted from the TCM medical records to be predicted; and feeding that corpus into the trained CRF model, which outputs the TCM medical record class label and position of each character, from which the TCM four-diagnosis findings, syndrome types and treatment methods are finally identified.
Description
Technical field
The present invention relates to an improved named entity recognition method and system for traditional Chinese medicine (TCM) medical records based on multi-feature templates.
Background
A key step in processing TCM medical record text is to perform named entity recognition on massive amounts of such text and to establish the associations among four-diagnosis text, syndrome type text and treatment method text, so as to explain scientifically the principle of "diagnosis and treatment based on an overall analysis of the illness and the patient's condition". Named entity recognition is widely applied in many fields, such as finance and economics, product identification, microblog text and military text.
Named entity recognition methods fall into two main classes:
The first class comprises rule-based and dictionary-based methods. Their drawback is a heavy dependence on dictionaries and rule bases, which greatly reduces the ability to recognize out-of-vocabulary words;
The second class comprises machine-learning-based methods. These are fast and efficient and port well; the common ones are mainly the hidden Markov model (HMM), the maximum entropy Markov model (MEMM) and conditional random fields (CRFs). By comparison, CRFs perform best overall in terms of ease of use, stability and accuracy, and are better than HMM and MEMM with respect to the output independence assumption and the label bias problem, which those models find hard to avoid.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides an improved named entity recognition method and system for TCM medical records based on multi-feature templates;
As a first aspect of the present invention:
An improved named entity recognition method for TCM medical records based on multi-feature templates comprises the following steps:
Step (1): Extract sentences from the TCM medical record text;
Step (2): Classify the extracted sentences;
Step (3): Segment each class of sentences into characters;
Step (4): Annotate each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator word feature, a right indicator word feature and a term feature. The character, part-of-speech, left indicator word, right indicator word and term features of all characters form the corpus observation sequence. Each character is then labeled according to the TCM medical record class labels and the BIO labeling scheme, which generates the output feature of each character; the output features of all characters form the output feature sequence. The corpus observation sequence and the output feature sequence together constitute the corpus;
Step (5): Formulate feature templates;
Step (6): Input the corpus obtained in step (4) and the feature templates obtained in step (5) into a conditional random field (CRF) model and train it to obtain a trained CRF model;
Step (7): Process the TCM medical record to be predicted with the same method as in steps (1)-(4) to build the corpus to be predicted; feed the constructed corpus into the trained CRF model, which outputs the TCM medical record class label and position of each character; finally, identify the TCM four-diagnosis findings, syndrome types and treatment methods from those class labels and positions.
The BIO labeling scheme means: B denotes the first character of an entity, I denotes a non-first character of an entity, and O denotes a non-entity character.
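The final decoding in step (7) can be sketched in Python as follows. This is a minimal illustration, assuming the trained model returns one tag per character in the form `ZS-B`, `ZS-I`, `O` and so on; the function name `decode_bio` and the tuple layout are hypothetical and not taken from the patent.

```python
from typing import List, Tuple

def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Group per-character tags into (label, text, start, end) tuples.

    `end` is exclusive. A stray I tag with no preceding B tag is ignored.
    """
    entities = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        cls, _, pos = tag.partition("-")
        continues = pos == "I" and cls == label
        if start is not None and not continues:
            entities.append((label, "".join(chars[start:i]), start, i))
            start, label = None, None
        if pos == "B":
            start, label = i, cls
    return entities
```

Each returned tuple carries both the class label and the character positions, which is the information step (7) uses to identify the entities.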
As a further improvement of the present invention, the TCM medical record text refers to records of TCM experts' diagnosis and treatment activities.
Sentences are extracted using the punctuation marks in the TCM medical record text as separators; the punctuation marks are the comma, the semicolon and the period.
The sentences extracted in step (1) include four-diagnosis sentences, syndrome type sentences and treatment method sentences;
The TCM four diagnoses refer to the patient's abnormal symptoms and abnormal signs that the doctor obtains through the four diagnostic methods; the syndrome type refers to the syndrome confirmed by the doctor; the treatment method refers to the treatment method confirmed by the doctor.
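The punctuation-based sentence extraction of step (1) can be sketched as below. The delimiter set (full-width and ASCII comma, semicolon and period) and the function name `extract_sentences` are assumptions made for illustration; the patent only names the three punctuation types.

```python
import re
from typing import List

# Assumed delimiter set: full-width and ASCII comma, semicolon and period.
_DELIMS = re.compile(r"[,;。,;.]")

def extract_sentences(record: str) -> List[str]:
    """Split a medical record on the delimiters and drop empty fragments."""
    return [s.strip() for s in _DELIMS.split(record) if s.strip()]
```

Note that treating the ASCII period as a delimiter would also split decimal numbers; a production version would need to handle that case.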
As a further improvement of the present invention, sentences are classified by qualifier into current sentences, negated sentences and possible sentences;
Negated sentences and possible sentences are discarded, and current sentences are retained.
The qualifiers are: current, negated or possible.
A current sentence describes discomfort, symptoms or a disease that is definitely occurring at present;
A possible sentence describes symptoms that may arise, or a diagnosis made before the doctor reaches a confirmed diagnosis;
A negated sentence describes a disease or symptom that has definitely not occurred in the patient.
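A rough sketch of the qualifier-based classification in step (2) follows. The patent does not list its qualifier words, so the cue lists `NEGATED_CUES` and `POSSIBLE_CUES` are purely hypothetical, as are the function names; simple substring matching is used only to illustrate the filtering.

```python
from typing import List

# Hypothetical cue words; the patent does not enumerate its qualifiers.
NEGATED_CUES = ("无", "否认", "未见")
POSSIBLE_CUES = ("可能", "疑似", "待查")

def classify_sentence(sentence: str) -> str:
    """Return 'negated', 'possible' or 'current' for one sentence."""
    if any(cue in sentence for cue in NEGATED_CUES):
        return "negated"
    if any(cue in sentence for cue in POSSIBLE_CUES):
        return "possible"
    return "current"

def keep_current(sentences: List[str]) -> List[str]:
    """Discard negated and possible sentences, keep current ones."""
    return [s for s in sentences if classify_sentence(s) == "current"]
```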
As a further improvement of the present invention, meaningless characters are first removed and wrongly written characters are corrected; each class of sentences is then split with a character-based segmentation method into single characters. The meaningless characters include digits, units and punctuation marks.
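The cleaning and character-based segmentation of step (3) might look like the following sketch. The unit list in the regular expression (`毫升`, `克`, `ml`, `g`) is an assumption, since the patent does not enumerate its units, and correction of wrongly written characters is left out.

```python
import re
import unicodedata
from typing import List

# A number, optionally followed by one of a few assumed unit strings.
_NUMBER_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?\s*(?:毫升|克|ml|g)?")

def segment_chars(sentence: str) -> List[str]:
    """Remove numbers (with units), punctuation and spaces, then split into characters."""
    cleaned = _NUMBER_WITH_UNIT.sub("", sentence)
    return [
        ch for ch in cleaned
        # Unicode categories: P* punctuation, Z* separators, N* remaining numerals
        if not unicodedata.category(ch).startswith(("P", "Z", "N"))
    ]
```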
As a further improvement of the present invention, the character feature is the character itself;
The part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
The left indicator word feature concerns characters that appear to the left of a named entity: if the current character is one that appears to the left of a named entity, its left indicator word feature is T; otherwise it is F;
The right indicator word feature concerns characters that appear to the right of a named entity: if the current character is one that appears to the right of a named entity, its right indicator word feature is T; otherwise it is F;
The term feature concerns characters that describe human organs: if the current character denotes a human organ, its term feature is T; otherwise it is F;
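The five per-character features (character, part of speech, left and right indicator word, term) can be assembled as in the sketch below. The three lexicons are hypothetical placeholders, and part-of-speech tags are assumed to come from an external tagger; only the T/F encoding follows the text.

```python
from typing import Dict, List

# Hypothetical lexicons for illustration only; not taken from the patent.
LEFT_INDICATORS = {"伴", "仍"}
RIGHT_INDICATORS = {"者"}
ORGAN_TERMS = {"头", "眼", "舌", "心", "脾"}

def annotate(chars: List[str], pos_tags: List[str]) -> List[Dict[str, str]]:
    """Build one feature row (W, P, L, R, Y) per character."""
    rows = []
    for ch, pos in zip(chars, pos_tags):
        rows.append({
            "W": ch,                                        # character feature
            "P": pos,                                       # part-of-speech feature
            "L": "T" if ch in LEFT_INDICATORS else "F",     # left indicator word
            "R": "T" if ch in RIGHT_INDICATORS else "F",    # right indicator word
            "Y": "T" if ch in ORGAN_TERMS else "F",         # term feature
        })
    return rows
```

The list of rows for a sentence corresponds to the corpus observation sequence described in step (4).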
As a further improvement of the present invention, each character is labeled according to the TCM medical record class labels and the BIO labeling scheme. The TCM medical record class labels are four diagnoses (ZS), syndrome type (ZX) and treatment method (ZF);
If the current character belongs to a four-diagnosis entity, its class label is ZS; if it is the first character of the entity, its output feature is ZS-B; if it is a non-first character of the entity, its output feature is ZS-I;
If the current character belongs to a syndrome type entity, its class label is ZX; if it is the first character of the entity, its output feature is ZX-B; if it is a non-first character of the entity, its output feature is ZX-I;
If the current character belongs to a treatment method entity, its class label is ZF; if it is the first character of the entity, its output feature is ZF-B; if it is a non-first character of the entity, its output feature is ZF-I;
If the current character belongs to none of ZS, ZX and ZF, its label is O.
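The labeling rules above can be expressed compactly. The sketch assumes entity spans are already known as `(start, end_exclusive, label)` triples, which is a hypothetical input format chosen for illustration.

```python
from typing import List, Tuple

def encode_bio(n_chars: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Produce one output feature per character.

    spans: (start, end_exclusive, label) with label in {"ZS", "ZX", "ZF"}.
    Characters outside every span receive "O".
    """
    tags = ["O"] * n_chars
    for start, end, label in spans:
        tags[start] = f"{label}-B"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"{label}-I"          # non-first characters
    return tags
```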
The specific steps of step (5) are:
All characters obtained by segmentation are arranged into a sequence. For each character, contextual features are extracted within a context window of size 5, [-2, 2], and each feature is written in the format "letter + number", where the character is denoted by "W", the part of speech by "P", the left indicator word by "L", the right indicator word by "R" and the TCM term feature by "Y". Based on the context, 19 common feature identifiers are defined:
W_-2 denotes the second character before the current one; W_-1 denotes the first character before; W_0 denotes the current character; W_1 denotes the first character after; W_2 denotes the second character after;
P_-2 denotes the part of speech of the second character before; P_-1 denotes the part of speech of the first character before; P_0 denotes the part of speech of the current character; P_1 denotes the part of speech of the first character after; P_2 denotes the part of speech of the second character after;
L_-2 denotes the left indicator word feature of the second character before; L_-1 denotes the left indicator word feature of the first character before;
R_1 denotes the right indicator word feature of the first character after; R_2 denotes the right indicator word feature of the second character after;
Y_-2 denotes the term feature of the second character before; Y_-1 denotes the term feature of the first character before;
Y_0 denotes the term feature of the current character; Y_1 denotes the term feature of the first character after;
Y_2 denotes the term feature of the second character after;
From the common feature identifiers, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2; where "/" is the separator.
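To make the template notation concrete, the sketch below expands the listed templates at one position of a sentence. The feature rows are dictionaries keyed by the letters W, P, L, R and Y; the padding token `<PAD>` for positions outside the sentence and the function names are assumptions, since the text does not say how sentence boundaries are handled.

```python
from typing import Dict, List

TEMPLATES = [
    "W_-2", "W_-1", "W_0", "W_1", "W_2",
    "W_-1/W_0", "W_0/W_1", "W_-2/W_0", "W_0/W_2",
    "P_-2", "P_-1", "P_0", "P_1", "P_2",
    "P_-1/P_0", "P_0/P_1", "P_-2/P_0", "P_0/P_2",
    "L_-2/W_0", "L_-1/W_0", "W_0/R_1", "W_0/R_2",
    "Y_-2/W_0", "Y_-1/W_0", "W_0/Y_0", "W_0/Y_1", "W_0/Y_2",
]

PAD = "<PAD>"  # assumed placeholder for positions outside the sentence

def lookup(rows: List[Dict[str, str]], i: int, ident: str) -> str:
    """Resolve one identifier such as 'W_-2' relative to position i."""
    name, offset = ident.split("_")
    j = i + int(offset)
    return rows[j][name] if 0 <= j < len(rows) else PAD

def expand(rows: List[Dict[str, str]], i: int) -> Dict[str, str]:
    """Evaluate every template at position i; '/' joins combined features."""
    return {
        t: "/".join(lookup(rows, i, part) for part in t.split("/"))
        for t in TEMPLATES
    }
```

Single identifiers give unigram context features, while the "/"-joined templates combine two identifiers into one feature value.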
As a further improvement of the present invention, the specific steps of step (6) are:
Step (6.1): Use the corpus obtained in step (4) as the training corpus;
Step (6.2): Represent the character features in the training corpus as the observation sequence x and the output feature sequence as the output sequence y, and save each input-output pair (x, y) to the training sample set;
Step (6.3): Train the CRF model on the training sample set;
Step (6.4): Estimate the conditional probability of the pairs (x, y) iteratively until convergence, which yields the trained CRF model.
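Step (6.2) amounts to separating each annotated sentence into an observation sequence x and an output sequence y. The sketch below assumes each sentence is stored as a list of `(feature_row, output_tag)` pairs, a hypothetical layout; the resulting pairs could then be passed to any CRF trainer, which the sketch does not implement.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Sample = Tuple[List[Row], List[str]]

def build_training_set(sentences: List[List[Tuple[Row, str]]]) -> List[Sample]:
    """Turn annotated sentences into (x, y) pairs: x is the list of
    feature rows, y the list of output features such as 'ZS-B' or 'O'."""
    samples = []
    for sent in sentences:
        x = [row for row, _ in sent]
        y = [tag for _, tag in sent]
        samples.append((x, y))
    return samples
```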
As a second aspect of the present invention:
An improved named entity recognition system for TCM medical records based on multi-feature templates comprises: a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any of the methods described above are carried out.
As a third aspect of the present invention:
A computer-readable storage medium stores computer instructions which, when run by a processor, carry out the steps of any of the methods described above.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention applies conditional random fields and proposes an improved named entity recognition method for TCM medical records based on multi-feature templates. Taking the characteristics of TCM medical record text into account, it proposes an annotation method that uses character features, part-of-speech features, left and right indicator word features and term features, trains a CRFs model on the annotated data, and identifies four-diagnosis, syndrome type and treatment method entities. Experiments verify that, after the left and right indicator word features and the term feature are added, precision, recall and F-measure all improve considerably. With the continued accumulation of medical records, more reasonable parameter settings and further refinement of the feature values, the named entity recognition method can provide more valuable reference and evidence for building the "four diagnoses - syndrome type - treatment method" triple correspondence and for explaining diagnosis and treatment scientifically.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application; the illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it.
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the character frequency chart.
Detailed description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which the application belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well; furthermore, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
TCM medical records are records of TCM experts' diagnosis and treatment activities, and named entity recognition on them is of great significance for research on the standardization and informatization of TCM medical records. Because TCM medical record text is fuzzy in expression and ambiguous in reference, the present invention, based on conditional random fields, proposes an improved named entity recognition method based on multi-feature templates. Sentence extraction and automatic segmentation are first applied to the TCM medical record text; the segmented corpus is then annotated with character features, part-of-speech features, left and right indicator word features and term features; finally a CRFs model is trained on the annotated data to identify four-diagnosis, syndrome type and treatment method entities and to build the "four diagnoses - syndrome type - treatment method" triple correspondence, providing reference and evidence for explaining diagnosis and treatment scientifically. The data source is 12,000 TCM medical records selected from cardiovascular outpatient experts at the Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine between June 2014 and December 2016; recognition accuracy was further improved by adjusting the feature combinations and the context window size. The average precision, recall and F-measure reached 90.68%, 90.45% and 90.56% respectively.
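As a quick consistency check on the reported figures, the sketch below computes the F-measure as the harmonic mean of precision and recall. That this is the definition used by the patent is an assumption; the text only names the three metrics.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F-measure definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported averages: precision 90.68, recall 90.45 (in percent).
print(round(f_measure(90.68, 90.45), 2))  # → 90.56
```

Under that assumption, the reported averages of 90.68% and 90.45% give 90.56%, which agrees with the reported F-measure.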
Because TCM medical record text is fuzzy in expression and ambiguous in reference, the present invention, based on conditional random fields, proposes an improved named entity recognition method based on multi-feature templates. For 12,000 heart-system (cardiovascular) TCM medical records, the text features, parts of speech and label features are analyzed, templates are defined and a CRFs model is trained to establish the relationship between the text features and the named entity classes and character positions. Through extraction, the associations among four-diagnosis text, syndrome type text and treatment method text are mined, which explains the principle of "diagnosis and treatment based on an overall analysis of the illness and the patient's condition" and provides a scientific basis for the inheritance of experience and the acquisition of knowledge.
Table 1. Example medical record text
Sentences are split using the punctuation marks that occur in the medical record text as separators. The 12,000 records contain 180,635 sentences in total and, after digits and meaningless characters are removed, cover 1,267 distinct characters. The most frequent character is "heart", with 37,236 occurrences and a frequency of 310.30%, followed by "chest", with 20,701 occurrences and a frequency of 172.51%. These frequencies are consistent with the number of occurrences divided by the 12,000 records, so they can exceed 100%. There are 727 characters with a frequency below 1%, of which 50 characters, such as "boiling", "lance", "group", "curling up" and "section", were used only once. The character frequency chart is shown in Fig. 2; 28 characters have a frequency above 40%, and the character frequency table is given in Table 2.
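The character statistics can be reproduced with a simple counter. The sketch assumes that the reported frequency is the occurrence count divided by the number of records, which matches the two figures given (37,236 / 12,000 = 310.30% and 20,701 / 12,000 = 172.51%); the function name `char_frequency` is illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

def char_frequency(records: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each character to (count, count per record in percent)."""
    counts = Counter(ch for rec in records for ch in rec)
    n = len(records)
    return {ch: (c, 100.0 * c / n) for ch, c in counts.items()}

# Check of the reported figures under the assumed definition.
print(round(100 * 37236 / 12000, 2))  # → 310.3
print(round(100 * 20701 / 12000, 2))  # → 172.51
```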
Table 2. Character frequency table
TCM medical record text is natural language, but traditional Chinese medicine is a discipline in which many disciplines interpenetrate; compared with the language of other natural sciences, TCM medical record text is additionally characterized by fuzziness, metaphor, classical Chinese style and fixed patterns.
(1) Fuzziness. For example, a crimson tongue is explained as a tongue appearance that develops from a red tongue and is one degree deeper than red; however, the boundary between red and crimson cannot be quantified, and the finding is usually written as "tongue red" or "tongue crimson", or recorded loosely as "crimson tongue".
(2) Metaphor. For example, "wood overacting on earth", also called "liver qi invading the spleen", is a TCM syndrome caused by liver qi invading the spleen. Likewise, "harmonizing the liver and spleen" is a treatment method set up specifically for "wood overacting on earth", that is, treating the "liver qi invading the spleen" syndrome by soothing the liver and strengthening the spleen.
(3) Classical Chinese style. For example, "releasing the exterior pathogen" means using drugs to drive the pathogenic factor out through the body surface, and "reinforcing earth to generate metal" means using the theory of mutual generation of the five phases to tonify spleen qi in order to nourish lung qi.
(4) Fixed patterns. For the tongue body, its color and texture are generally described, in the form "tongue ××" or "tongue body ××", such as "tongue red", "pale red tongue", "tongue dark", "tongue body pale" and "tongue body pale and plump". For the tongue coating, its color, moistness and thickness are described.
One embodiment of the present invention:
As shown in Fig. 1, an improved named entity recognition method for TCM medical records based on multi-feature templates comprises the following steps:
Step (1): Extract sentences from the TCM medical record text;
Further, the TCM medical record text refers to records of TCM experts' diagnosis and treatment activities.
Sentences are extracted using the punctuation marks in the TCM medical record text as separators; the punctuation marks are the comma, the semicolon and the period.
The sentences extracted in step (1) include four-diagnosis sentences, syndrome type sentences and treatment method sentences;
The TCM four diagnoses refer to the patient's abnormal symptoms and abnormal signs that the doctor obtains through the four diagnostic methods; the syndrome type refers to the syndrome confirmed by the doctor; the treatment method refers to the treatment method confirmed by the doctor.
Step (2): Classify the extracted sentences;
Further, sentences are classified by qualifier into current sentences, negated sentences and possible sentences; negated sentences and possible sentences are discarded, and current sentences are retained.
The qualifiers are: current, negated or possible.
A current sentence describes discomfort, symptoms or a disease that is definitely occurring at present;
A possible sentence describes symptoms that may arise, or a diagnosis made before the doctor reaches a confirmed diagnosis;
A negated sentence describes a disease or symptom that has definitely not occurred in the patient.
Step (3): Segment each class of sentences into characters;
Further, meaningless characters are first removed and wrongly written characters are corrected; for example, a miswritten form of "blood stasis" is corrected to the standard written form. Each class of sentences is then split with a character-based segmentation method into single characters. The meaningless characters include digits, units and punctuation marks.
Step (4): Annotate each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator word feature, a right indicator word feature and a term feature. These features are distinctive and lend themselves to automatic annotation. The character, part-of-speech, left indicator word, right indicator word and term features of all characters form the corpus observation sequence. Each character is then labeled according to the TCM medical record class labels and the BIO labeling scheme, which generates the output feature of each character; the output features of all characters form the output feature sequence. The corpus observation sequence and the output feature sequence together constitute the corpus. The entities are four diagnoses (ZS), syndrome type (ZX) and treatment method (ZF);
As a further improvement of the present invention, the character feature is the character itself;
The part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
The left indicator word feature concerns characters that appear to the left of a named entity: if the current character is one that appears to the left of a named entity, its left indicator word feature is T; otherwise it is F;
The right indicator word feature concerns characters that appear to the right of a named entity: if the current character is one that appears to the right of a named entity, its right indicator word feature is T; otherwise it is F;
The term feature concerns characters that describe human organs: if the current character denotes a human organ, its term feature is T; otherwise it is F;
Further, each character is labeled according to the TCM medical record class labels and the BIO labeling scheme. The TCM medical record class labels are four diagnoses (ZS), syndrome type (ZX) and treatment method (ZF);
If the current character belongs to a four-diagnosis entity, its class label is ZS; if it is the first character of the entity, its output feature is ZS-B; if it is a non-first character of the entity, its output feature is ZS-I;
If the current character belongs to a syndrome type entity, its class label is ZX; if it is the first character of the entity, its output feature is ZX-B; if it is a non-first character of the entity, its output feature is ZX-I;
If the current character belongs to a treatment method entity, its class label is ZF; if it is the first character of the entity, its output feature is ZF-B; if it is a non-first character of the entity, its output feature is ZF-I;
If the current character belongs to none of ZS, ZX and ZF, its label is O.
(1) Character feature (W)
Character-based segmentation is applied to the medical record text; for example, the four-character phrase "cough, runny nose" is split into its single characters, glossed here as "cough/cough/flow/mucus/".
(2) Part-of-speech feature (P)
Parts of speech are divided into verbs, nouns, adjectives, adverbs, prepositions and so on. An example of the original medical record corpus is as follows: Half a month ago the patient caught wind-cold, with cough and runny nose and white, sticky phlegm, followed by edema that started at the instep and spread to the waist and abdomen. Chest tightness, abdominal distension, poor appetite. Scanty urine, undigested food in the stool. The condition has recently worsened, with sweating, aversion to cold, cold limbs, palpitations, shortness of breath and inability to lie flat. The result after part-of-speech tagging is as follows (one tagged token per Chinese character, glossed in English; tags missing in the source are left empty):
half/n month/n before/adv feel/v receive/v wind/n cold/n cough/v cough/v flow/v mucus/v phlegm/n white/adj and/con sticky/adj then/con and/con water/n swell/v from/con foot/n back/n start/con begin/con spread/v extend/v to/v waist/n abdomen/n chest/n stuffy/v abdomen/n distend/v intake/v poor/adj urine/n scanty/adj whole/n grain/n not/adv digest/v recent/n day/n illness/n condition/n add/v heavy/adv out/v sweat/ fear/ cold/n limb/n cold/ heart/n palpitate/v breath/n urgent/adv not/adv able/adv flat/n lie/v.
(3) Left and right indicator word features (L) and (R)
TCM named entities often occur together with particular words: words that often appear to the left of a named entity are called left indicator words, and words that appear to its right are called right indicator words. For four diagnoses, the indicator words that often appear near four-diagnosis entities include "with", "further" and "still". For syndrome types, the indicator words that often appear nearby include "leading to" and "gesture". For treatment methods, the indicator words that often appear nearby include "give", "should", "still give", "treat by giving" and "treatment should".
(4) Term feature (Y)
TCM medical record entities contain characters that describe human organs, such as "head", "eye", "tongue" and "fire". For four diagnoses, the characters describe human organs and pathological findings, such as "head", "pain", "eye", "dry", "sweat", "out", "urine" and "yellow". For syndrome types, the pathogenesis is described with the yin-yang and five-phase theory, represented by "metal, wood, water, fire, earth" and "wind, cold, dampness, dryness, fire", for example "spleen", "deficiency", "dampness", "excess", "heart", "yin", "not" and "sufficient". For treatment methods, the entities usually follow 4-character or 8-character patterns, such as "soothing liver-qi stagnation" and "supplementing qi and nourishing yin, promoting blood circulation and removing blood stasis".
Through segmentation and automatic feature annotation, the corpus observation sequence and the output feature sequence are generated. "T" indicates that the corpus item has the annotated feature and "F" that it does not; ZS, ZX and ZF denote four-diagnosis, syndrome type and treatment method entities respectively, as shown in the table below.
Table 3. TCM medical record class labels
Annotation uses the "BIO" scheme, in which "B", "I" and "O" denote the first character of an entity, a non-first character of an entity and a non-entity character respectively; the class labels of the example above after annotation are shown in Table 4.
Table 4. "BIO" class label annotation
Step (5): Formulate the feature templates;
Further, step (5) comprises:
All words obtained by segmentation are arranged into a sequence. For each word obtained by segmentation, contextual features are extracted within a context window [-2, 2] of size 5, and each feature is expressed in the format "letter + number", where the word is denoted by "W", the part of speech by "P", the left deictic word by "L", the right deictic word by "R", and the TCM-related term feature by "Y". Based on the context, 19 public feature marks are set:
W_-2 indicates the second preceding word; W_-1 indicates the first preceding word; W_0 indicates the current word; W_1 indicates the first following word; W_2 indicates the second following word;
P_-2 indicates the part of speech of the second preceding word; P_-1 indicates the part of speech of the first preceding word; P_0 indicates the part of speech of the current word; P_1 indicates the part of speech of the first following word; P_2 indicates the part of speech of the second following word;
L_-2 indicates the left deictic word feature of the second preceding word; L_-1 indicates the left deictic word feature of the first preceding word;
R_1 indicates the right deictic word feature of the first following word; R_2 indicates the right deictic word feature of the second following word;
Y_-2 indicates the term feature of the second preceding word; Y_-1 indicates the term feature of the first preceding word; Y_0 indicates the term feature of the current word; Y_1 indicates the term feature of the first following word; Y_2 indicates the term feature of the second following word;
Based on the public feature marks, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2; where "/" denotes a separator.
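As an illustration of how the templates listed above could be applied, the following minimal Python sketch expands them for one position in a word sequence. The row layout (one dict per word with keys W, P, L, R, Y), the padding token `<PAD>`, the function names and the toy annotation values are assumptions made for this example; the source does not specify a data format or a toolkit.

```python
# Hypothetical sketch: expand the "letter_offset" feature templates for one position.
# Each row holds the five annotated columns of one segmented word:
#   W = word, P = part of speech, L / R = left / right deictic word flag, Y = term flag.
PAD = "<PAD>"  # assumed placeholder for positions outside the sentence

# The 27 templates listed in step (5); "/" separates the parts of a combined template.
TEMPLATES = [
    "W_-2", "W_-1", "W_0", "W_1", "W_2",
    "W_-1/W_0", "W_0/W_1", "W_-2/W_0", "W_0/W_2",
    "P_-2", "P_-1", "P_0", "P_1", "P_2",
    "P_-1/P_0", "P_0/P_1", "P_-2/P_0", "P_0/P_2",
    "L_-2/W_0", "L_-1/W_0", "W_0/R_1", "W_0/R_2",
    "Y_-2/W_0", "Y_-1/W_0", "W_0/Y_0", "W_0/Y_1", "W_0/Y_2",
]


def lookup(rows, i, mark):
    """Return the value of a single mark such as 'W_-1' relative to position i."""
    column, offset = mark.split("_")
    j = i + int(offset)
    if 0 <= j < len(rows):
        return rows[j][column]
    return PAD


def expand_templates(rows, i, templates=TEMPLATES):
    """Build one feature string per template for the word at position i."""
    features = []
    for template in templates:
        values = [lookup(rows, i, mark) for mark in template.split("/")]
        features.append(template + "=" + "/".join(values))
    return features


# Toy rows with made-up annotation values, for illustration only.
rows = [
    {"W": "脾", "P": "n", "L": "F", "R": "F", "Y": "T"},
    {"W": "虚", "P": "a", "L": "F", "R": "F", "Y": "F"},
    {"W": "湿", "P": "n", "L": "F", "R": "T", "Y": "F"},
]
features = expand_templates(rows, 1)
```

Each resulting string pairs a template name with the values found in the window, so a combined template such as W_-1/W_0 yields a single joint feature rather than two separate ones.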
Step (6): The corpus obtained in step (4) and the feature templates obtained in step (5) are input into the conditional random field model, and the model is trained to obtain a trained conditional random field model;
Further, step (6) comprises:
Step (6.1): The corpus obtained in step (4) is used as the training corpus;
Step (6.2): The word features in the training corpus are represented as an observation sequence x, the output feature sequence is represented as an output sequence y, and each input-output pair (x, y) is saved to the training sample set;
Step (6.3): The conditional random field model is trained with the training sample set;
The conditional random field is defined as follows: through word segmentation, data cleaning and feature annotation, the text input sequence x = (x1, x2, ..., xn) is obtained; the model parameters are obtained by training, and the conditional probability of the required corpus label combination y is predicted.
Assuming that the input variable is x and the output variable is y, the conditional probability P(y | x) is defined in the following form:
where λk is a weight, tk and sl are feature functions, and Z(x) is the normalization coefficient.
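The formula referred to in the preceding definition does not appear in this text. Assuming it is the standard linear-chain conditional random field form, which uses the same symbols λk, tk, sl and Z(x), it can be written as follows; the weight μl on the state feature functions sl is an assumed symbol that the source does not name.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \right),
\qquad
Z(x) = \sum_{y} \exp\left( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \right)
```

Here tk are transition feature functions over adjacent labels, sl are state feature functions over a single label, and Z(x) sums over all possible label sequences so that the probabilities add up to one.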
Step (6.4): The conditional probability of the combination (x, y) is predicted iteratively until convergence, yielding the trained conditional random field model.
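To make the role of the normalization coefficient concrete, the following Python sketch computes P(y | x) for a tiny linear-chain model by enumerating every label sequence. It is a toy under stated assumptions: the weights, the three-label subset and the function names are hypothetical, and a practical implementation would use the forward algorithm rather than brute-force enumeration.

```python
import itertools
import math

# A small subset of the label set, chosen only to keep the enumeration short.
LABELS = ["ZS-B", "ZS-I", "O"]


def score(x, y, state_w, trans_w):
    """Unnormalised log-score: weighted state features plus weighted transition features."""
    total = 0.0
    for i, (obs, label) in enumerate(zip(x, y)):
        total += state_w.get((obs, label), 0.0)            # state feature on (x_i, y_i)
        if i > 0:
            total += trans_w.get((y[i - 1], label), 0.0)   # transition feature on (y_{i-1}, y_i)
    return total


def normaliser(x, state_w, trans_w):
    """Z(x): sum of exp(score) over every possible label sequence for x."""
    return sum(
        math.exp(score(x, cand, state_w, trans_w))
        for cand in itertools.product(LABELS, repeat=len(x))
    )


def conditional_probability(x, y, state_w, trans_w):
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    return math.exp(score(x, y, state_w, trans_w)) / normaliser(x, state_w, trans_w)


# Hypothetical weights: in practice these are learned during training.
x = ["w1", "w2", "w3"]
state_w = {("w1", "ZS-B"): 2.0, ("w2", "ZS-I"): 2.0, ("w3", "O"): 1.5}
trans_w = {("ZS-B", "ZS-I"): 1.0, ("O", "ZS-I"): -2.0}
p_best = conditional_probability(x, ("ZS-B", "ZS-I", "O"), state_w, trans_w)
```

Training adjusts the weights so that the annotated sequences in the training sample set receive high conditional probability; prediction then selects the label sequence with the highest score.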
Step (7): The Chinese medicine case to be predicted is processed with the same method as in steps (1)-(4) to build the corpus to be predicted; the constructed recognition corpus is input into the trained conditional random field model, which outputs the Chinese medicine case class labels and the positions of the characters; finally, the Chinese medicine four methods of diagnosis, card type and therapy are identified according to the Chinese medicine case class labels and the positions of the characters.
The BIO labelling method means: B denotes the first character of an entity, I denotes a non-first character of an entity, and O denotes a non-entity character.
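The labelling scheme and the final identification step can be sketched in Python as follows. The label strings follow the ZS-B / ZS-I style used in the claims; the function names, the span representation and the example sentence are assumptions made for illustration and do not come from the source.

```python
def bio_tags(length, spans):
    """Tag a sequence of `length` words; spans are (start, end_exclusive, label) triples."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"{label}-B"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"{label}-I"          # non-first characters of the entity
    return tags


def decode_entities(words, tags):
    """Recover (label, start, text) entities from the class labels and positions."""
    entities = []
    current = None
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag.endswith("-B"):
            if current:
                entities.append(tuple(current))
            current = [tag[:-2], i, word]
        elif tag.endswith("-I") and current and current[0] == tag[:-2]:
            current[2] += word
        else:
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities


# Made-up example: a four-character therapy entity (ZF) after two non-entity characters.
words = ["治", "以", "疏", "肝", "解", "郁"]
tags = bio_tags(len(words), [(2, 6, "ZF")])
entities = decode_entities(words, tags)
```

A stray I tag that does not continue an open entity of the same class is treated as a non-entity character here; that handling is a design choice of this sketch, not something the source prescribes.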
As a second embodiment of the present invention:
A Chinese medicine case name recognition system based on multi-feature template modification, comprising: a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any one of the above methods are completed.
As a third embodiment of the present invention:
A computer readable storage medium on which computer instructions are stored; when the computer instructions are run by a processor, the steps of any one of the above methods are completed.
1 Experiments and analysis
1.1 Evaluation criteria
The indices used to evaluate information extraction are precision (P), recall (R) and F-measure (F), defined as follows:
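The definitions themselves are not reproduced in this text. Assuming the standard entity-level definitions (P = correctly recognized entities / recognized entities, R = correctly recognized entities / annotated entities, F = 2PR / (P + R)), they can be sketched in Python as below; the function name and the entity tuple layout are hypothetical.

```python
def precision_recall_f(gold, predicted):
    """Entity-level precision, recall and F-measure (balanced F, assumed beta = 1).

    `gold` and `predicted` are collections of hashable entities,
    for example (label, start, text) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Made-up entities, for illustration only.
gold = {("ZS", 0, "a"), ("ZX", 3, "b"), ("ZF", 5, "c"), ("ZS", 8, "d")}
predicted = {("ZS", 0, "a"), ("ZX", 3, "b"), ("ZF", 6, "x")}
p, r, f = precision_recall_f(gold, predicted)
```

Under this assumed definition, the average precision and recall reported for the best template in the result analysis (90.68% and 90.45%) give a balanced F of about 90.56%, which agrees with the F value reported alongside them.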
1.2 Experimental design and verification
(1) Feature marks
The present invention extracts contextual features within a context window [-2, 2] of size 5, and the feature space is referred to as the "5-word window". Each group of feature templates is expressed in the format "letter + number", where the word is denoted by "W", the part of speech by "P", the left and right deictic words by "L" and "R", and the TCM-related term feature by "Y". Based on the context, 19 public feature marks are set; Table 5 lists the public feature marks and their meanings.
Table 5: Feature marks and meanings
Serial number | Mark | Meaning | Serial number | Mark | Meaning |
1 | W_-2 | Second preceding word | 2 | W_-1 | First preceding word |
3 | W_0 | Current word | 4 | W_1 | First following word |
5 | W_2 | Second following word | 6 | P_-2 | Part of speech of the second preceding word |
7 | P_-1 | Part of speech of the first preceding word | 8 | P_0 | Part of speech of the current word |
9 | P_1 | Part of speech of the first following word | 10 | P_2 | Part of speech of the second following word |
11 | L_-2 | Left deictic word feature of the second preceding word | 12 | L_-1 | Left deictic word feature of the first preceding word |
13 | R_1 | Right deictic word feature of the first following word | 14 | R_2 | Right deictic word feature of the second following word |
15 | Y_-2 | Term feature of the second preceding word | 16 | Y_-1 | Term feature of the first preceding word |
17 | Y_0 | Term feature of the current word | 18 | Y_1 | Term feature of the first following word |
19 | Y_2 | Term feature of the second following word |
(2) Experimental design
Templates Tmpt_1, Tmpt_2, Tmpt_3 and Tmpt_4 are used to carry out three groups of comparative experiments, which test the effect of feature selection and of window size on the experimental results. The templates are defined in Table 6 and the experimental design is shown in Table 7.
Table 6: Templates
Table 7: Experimental design
1.3 Result analysis
(1) Analysis of experiment one
Experiments were carried out using Tmpt_1 and Tmpt_2. Table 8 gives the effect on the experimental results when the context window is set to 3 and 5 respectively.
Table 8: Effect of window size change on results
Category | P (%) | R (%) | F (%) |
The Chinese medicine four methods of diagnosis | +0.45 | +0.47 | +0.46 |
Card type | +0.04 | +0.38 | +0.14 |
Therapy | +1.83 | +1.12 | +1.28 |
The named entity recognition performance differs across the Chinese medicine four methods of diagnosis, card type and therapy, because the average character lengths of the three classes of entity differ: 3.17 characters for the four methods of diagnosis (inspection, listening and smelling, inquiry, palpation), 2.21 characters for card type, and 4.78 characters for therapy. In terms of improvement, the F value of the Chinese medicine four methods of diagnosis increases by 0.46%, that of card type by 0.14%, and that of therapy by 1.28%. The experiment shows that recognition is better when the entity length is close to the selected context window length.
(2) Analysis of experiment two
Comparative experiments were carried out using Tmpt_2 and Tmpt_3 as the experimental templates. After the left and right deictic word marks are added, the recognition performance changes markedly, with the improvement for therapy being the most pronounced; the effect on the results is shown in the table below.
Table 9: Effect of feature selection category marks on results
Category | P (%) | R (%) | F (%) |
The Chinese medicine four methods of diagnosis | +7.17 | +6.23 | +0.19 |
Card type | +5.37 | +5.48 | +0.42 |
Therapy | +5.86 | +4.76 | +0.84 |
(3) Analysis of experiment three
Experiment three adds a new experimental group whose template is Tmpt_4; compared with Tmpt_3, it examines the effect of adding the term feature marks on named entity recognition. The experimental results are shown in Table 10.
Table 10: Optimal recognition results
Comparing the F values of each class of named entity in Table 10 shows that the best template is Tmpt_4; the average precision, recall and F-measure reach 90.68%, 90.45% and 90.56% respectively, and recognition performance is improved. A rich feature set plays an important role in improving the precision of named entity recognition. For some special cases, correction by dictionaries and rules is still needed.
(4) Comparison with existing methods
According to the literature, Feng Lizhi proposed a Bootstrapping-based hybrid recognition method for TCM clinical medical record corpora, with an F value of 87%; Yuan Yuhu carried out symptom term named entity extraction experiments using CRFs models, and the best F value in the open test evaluation reached 87%; the present invention achieves an average F value for case named entity recognition of 90.51%, higher than both.
The present invention applies conditional random fields and proposes a Chinese medicine case named entity recognition method based on multi-feature template modification. In view of the characteristics of Chinese medicine case text, it proposes an annotation method for character features, part-of-speech features, left and right deictic word features and term features, trains a CRFs model on the annotated data, and identifies the Chinese medicine four methods of diagnosis, card type and therapy entities. Experiments verify that, after the left and right deictic word features and the term feature marks are added, precision, recall and F-measure all improve considerably. With the continued accumulation of cases, more reasonable parameter settings and further tuning of the feature values, this named entity recognition method can provide a more valuable reference and basis for building the "Chinese medicine four methods of diagnosis-card type-therapy" triple correspondence and for the scientific evaluation of diagnosis and treatment.
The foregoing describes merely preferred embodiments of the application and is not intended to limit the application; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall be included within the scope of protection of the application.
Claims (10)
1. A Chinese medicine case name recognition method based on multi-feature template modification, characterized by comprising the following steps:
Step (1): Extract sentences from the Chinese medicine case text;
Step (2): Classify the extracted sentences;
Step (3): Perform word segmentation on each class of sentence;
Step (4): Annotate each word obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left deictic word feature, a right deictic word feature and a term feature; the character features, part-of-speech features, left deictic word features, right deictic word features and term features of all words form the corpus observation sequence; each word is labelled according to the Chinese medicine case class labels and the BIO labelling method to generate the output feature of each word; the output features of all words form the output feature sequence; the corpus observation sequence and the output feature sequence together constitute the corpus;
Step (5): Formulate the feature templates;
Step (6): Input the corpus obtained in step (4) and the feature templates obtained in step (5) into the conditional random field model, and train the model to obtain a trained conditional random field model;
Step (7): Process the Chinese medicine case to be predicted with the same method as in steps (1)-(4) to build the corpus to be predicted; input the constructed recognition corpus into the trained conditional random field model, which outputs the Chinese medicine case class labels and the positions of the characters; finally, identify the Chinese medicine four methods of diagnosis, card type and therapy according to the Chinese medicine case class labels and the positions of the characters.
2. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that sentences are extracted using the punctuation marks in the Chinese medicine case text as separators, the punctuation marks being the comma, semicolon or full stop; the sentences extracted in step (1) include Chinese medicine four methods of diagnosis sentences, card type sentences and therapy sentences; the Chinese medicine four methods of diagnosis refers to the patient's abnormal symptoms and abnormal signs obtained by the doctor through the four methods of diagnosis; card type refers to the syndrome confirmed by the doctor; therapy refers to the treatment method confirmed by the doctor.
3. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that sentences are classified according to qualifiers into current sentences, negated sentences and possible sentences; negated sentences and possible sentences are rejected, and current sentences are retained; the qualifiers include current, negated or possible; a current sentence indicates a discomfort symptom or disease that has definitely occurred at present; a possible sentence indicates a symptom that may arise or a diagnosis made by the doctor before confirmation; a negated sentence indicates a disease or symptom that has definitely not occurred in the patient.
4. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that meaningless words are first removed and wrong words are corrected; each class of sentence is then cut into individual words using a character-based segmentation method; the meaningless words include numbers, units and punctuation marks.
5. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that the character feature refers to each word itself;
the part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
the left deictic word feature refers to a word appearing to the left of a named entity; if the current word is a word appearing to the left of a named entity, the left deictic word feature of the current word is T; otherwise, the left deictic word feature of the current word is F;
the right deictic word feature refers to a word appearing to the right of a named entity; if the current word is a word appearing to the right of a named entity, the right deictic word feature of the current word is T; otherwise, the right deictic word feature of the current word is F;
the term feature refers to a word describing a human organ; if the current word is a human organ, the term feature of the current word is T; otherwise, the term feature of the current word is F.
6. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that each word is labelled according to the Chinese medicine case class labels and the BIO labelling method as follows: the Chinese medicine case class labels include Chinese medicine four methods of diagnosis ZS, card type ZX and therapy ZF;
if the current word belongs to the Chinese medicine four methods of diagnosis, the Chinese medicine case class label of the current word is ZS; if the current word labelled ZS is the first character, the output feature of the current word is ZS-B; if the current word labelled ZS is a non-first character, the output feature of the current word is ZS-I;
if the current word belongs to card type, the Chinese medicine case class label of the current word is ZX; if the current word labelled ZX is the first character, the output feature of the current word is ZX-B; if the current word labelled ZX is a non-first character, the output feature of the current word is ZX-I;
if the current word belongs to therapy, the Chinese medicine case class label of the current word is ZF; if the current word labelled ZF is the first character, the output feature of the current word is ZF-B; if the current word labelled ZF is a non-first character, the output feature of the current word is ZF-I;
if the current word belongs to none of Chinese medicine four methods of diagnosis ZS, card type ZX and therapy ZF, the label of the current word is O.
7. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that step (5) comprises:
arranging all words obtained by segmentation into a sequence; for each word obtained by segmentation, extracting contextual features within a context window [-2, 2] of size 5, and expressing each feature in the format "letter + number", where the word is denoted by "W", the part of speech by "P", the left deictic word by "L", the right deictic word by "R", and the TCM-related term feature by "Y"; based on the context, 19 public feature marks are set:
W_-2 indicates the second preceding word; W_-1 indicates the first preceding word; W_0 indicates the current word; W_1 indicates the first following word; W_2 indicates the second following word;
P_-2 indicates the part of speech of the second preceding word; P_-1 indicates the part of speech of the first preceding word; P_0 indicates the part of speech of the current word; P_1 indicates the part of speech of the first following word; P_2 indicates the part of speech of the second following word;
L_-2 indicates the left deictic word feature of the second preceding word; L_-1 indicates the left deictic word feature of the first preceding word;
R_1 indicates the right deictic word feature of the first following word; R_2 indicates the right deictic word feature of the second following word;
Y_-2 indicates the term feature of the second preceding word; Y_-1 indicates the term feature of the first preceding word; Y_0 indicates the term feature of the current word; Y_1 indicates the term feature of the first following word; Y_2 indicates the term feature of the second following word;
based on the public feature marks, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2; where "/" denotes a separator.
8. The Chinese medicine case name recognition method based on multi-feature template modification as described in claim 1, characterized in that step (6) comprises:
Step (6.1): Using the corpus obtained in step (4) as the training corpus;
Step (6.2): Representing the word features in the training corpus as an observation sequence x and the output feature sequence as an output sequence y, and saving each input-output pair (x, y) to the training sample set;
Step (6.3): Training the conditional random field model with the training sample set;
Step (6.4): Predicting the conditional probability of the combination (x, y) iteratively until convergence, to obtain the trained conditional random field model.
9. A Chinese medicine case name recognition system based on multi-feature template modification, characterized by comprising: a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of the method of any one of claims 1-8 are completed.
10. A computer readable storage medium, characterized in that computer instructions are stored thereon; when the computer instructions are run by a processor, the steps of the method of any one of claims 1-8 are completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810359240.9A CN108549639A (en) | 2018-04-20 | 2018-04-20 | Based on the modified Chinese medicine case name recognition methods of multiple features template and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108549639A true CN108549639A (en) | 2018-09-18 |
Family
ID=63511841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810359240.9A Pending CN108549639A (en) | 2018-04-20 | 2018-04-20 | Based on the modified Chinese medicine case name recognition methods of multiple features template and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108549639A (en) |
Non-Patent Citations (1)
Title |
---|
袁锋: "中医医案文本挖掘的若干关键技术研究", 《中国博士学位论文全文数据库-信息科技辑》 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109166608A (en) * | 2018-09-17 | 2019-01-08 | 新华三大数据技术有限公司 | Electronic health record information extracting method, device and equipment |
CN109215798A (en) * | 2018-10-09 | 2019-01-15 | 北京科技大学 | A kind of construction of knowledge base method towards Chinese medicine ancient Chinese prose |
CN109635123A (en) * | 2018-11-28 | 2019-04-16 | 北京工业大学 | A kind of Chinese medicine text concept recognition methods of increment type |
CN110175246A (en) * | 2019-04-09 | 2019-08-27 | 山东科技大学 | A method of extracting notional word from video caption |
CN110502750B (en) * | 2019-08-06 | 2023-08-11 | 山东师范大学 | Disambiguation method, disambiguation system, disambiguation equipment and disambiguation medium in Chinese medicine text word segmentation process |
CN110502750A (en) * | 2019-08-06 | 2019-11-26 | 山东师范大学 | Disambiguation method, system, equipment and medium during Chinese medicine text participle |
CN110516241A (en) * | 2019-08-26 | 2019-11-29 | 北京三快在线科技有限公司 | Geographical address analytic method, device, readable storage medium storing program for executing and electronic equipment |
CN110879831A (en) * | 2019-10-12 | 2020-03-13 | 杭州师范大学 | Chinese medicine sentence word segmentation method based on entity recognition technology |
CN111079377A (en) * | 2019-12-03 | 2020-04-28 | 哈尔滨工程大学 | Method for recognizing named entities oriented to Chinese medical texts |
CN111274391A (en) * | 2020-01-15 | 2020-06-12 | 北京百度网讯科技有限公司 | SPO extraction method and device, electronic equipment and storage medium |
CN111274391B (en) * | 2020-01-15 | 2023-09-01 | 北京百度网讯科技有限公司 | SPO extraction method and device, electronic equipment and storage medium |
CN111259626A (en) * | 2020-01-16 | 2020-06-09 | 上海国民集团健康科技有限公司 | Traditional Chinese medicine entity recognition algorithm |
WO2021146831A1 (en) * | 2020-01-20 | 2021-07-29 | 京东方科技集团股份有限公司 | Entity recognition method and apparatus, dictionary creation method, device, and medium |
CN113488035A (en) * | 2020-04-28 | 2021-10-08 | 海信集团有限公司 | Voice information processing method, device, equipment and medium |
CN111832306A (en) * | 2020-07-09 | 2020-10-27 | 昆明理工大学 | Image diagnosis report named entity identification method based on multi-feature fusion |
CN112131862A (en) * | 2020-07-20 | 2020-12-25 | 中国中医科学院中医药信息研究所 | Traditional Chinese medicine medical record data processing method and device and electronic equipment |
CN112017773A (en) * | 2020-08-31 | 2020-12-01 | 吾征智能技术(北京)有限公司 | Disease cognition model construction method based on nightmare and disease cognition system |
CN112017773B (en) * | 2020-08-31 | 2024-03-26 | 吾征智能技术(北京)有限公司 | Disease cognitive model construction method and disease cognitive system based on nightmare |
CN112380856A (en) * | 2020-10-20 | 2021-02-19 | 湖南大学 | Method, system, terminal and readable storage medium for automatically extracting component names in patent text |
CN112380856B (en) * | 2020-10-20 | 2023-09-29 | 湖南大学 | Automatic extraction method, system, terminal and readable storage medium for component naming in patent text |
CN113807097A (en) * | 2020-10-30 | 2021-12-17 | 北京中科凡语科技有限公司 | Named entity recognition model establishing method and named entity recognition method |
CN117708338A (en) * | 2024-02-05 | 2024-03-15 | 成都中医药大学 | Extraction method and model for Chinese electronic medical record entity identification and four-diagnosis classification |
CN117708338B (en) * | 2024-02-05 | 2024-04-26 | 成都中医药大学 | Extraction method and model for Chinese electronic medical record entity identification and four-diagnosis classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||