CN108549639A - Named entity recognition method and system for traditional Chinese medicine case records based on multi-feature template modification - Google Patents

Named entity recognition method and system for traditional Chinese medicine case records based on multi-feature template modification

Info

Publication number
CN108549639A
Authority
CN
China
Prior art keywords
word
chinese medicine
feature
sentence
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810359240.9A
Other languages
Chinese (zh)
Inventor
袁锋
陈阳
陈守强
赵丽丽
梁科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Management University
Original Assignee
Shandong Management University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Management University
Priority to CN201810359240.9A
Publication of CN108549639A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a named entity recognition method and system for traditional Chinese medicine (TCM) case records based on multi-feature template modification. The method comprises the steps of: extracting sentences from TCM case text; classifying the extracted sentences; segmenting each class of sentences; annotating each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator-word feature, a right indicator-word feature and a term feature; building a training corpus; formulating feature templates; inputting the corpus and the feature templates into a conditional random field (CRF) model and training it to obtain a trained CRF model; building a corpus to be predicted from the TCM case records to be predicted; and inputting that corpus into the trained CRF model, which outputs TCM case category labels and character positions, from which the TCM four diagnostic methods, syndrome type and treatment method entities are finally identified.

Description

Named entity recognition method and system for traditional Chinese medicine case records based on multi-feature template modification
Technical field
The present invention relates to a named entity recognition method and system for traditional Chinese medicine (TCM) case records based on multi-feature template modification.
Background
A key step in the informatization of TCM case text is to perform named entity recognition on massive volumes of TCM case text and to establish the associations among four-diagnostic-methods text, syndrome type text and treatment method text, so as to scientifically interpret the idea of "diagnosis and treatment based on an overall analysis of the illness and the patient's condition". Named entity recognition is widely applied in many fields, such as finance and economics, product identification, microblog text and military text.
Named entity recognition methods fall into two main classes:
The first class is rule-based and dictionary-based methods. Their drawback is a strong dependence on the dictionary and the rule base, which greatly reduces the ability to recognize out-of-vocabulary words;
The second class is machine-learning-based methods. These are fast and efficient and port well to new domains. Common methods include the hidden Markov model (HMM), the maximum entropy Markov model (MEMM) and conditional random fields (CRFs). In comparison, CRFs perform best overall in ease of use, stability and accuracy: they do not rely on the output independence assumption of HMM and they avoid the label bias problem that MEMM can hardly escape.
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a named entity recognition method and system for TCM case records based on multi-feature template modification.
As the first aspect of the present invention:
A named entity recognition method for TCM case records based on multi-feature template modification includes the following steps:
Step (1): Extract sentences from the TCM case text;
Step (2): Classify the extracted sentences;
Step (3): Segment each class of sentences;
Step (4): Annotate each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator-word feature, a right indicator-word feature and a term feature. These five features of all characters form the corpus observation sequence. Label each character according to the TCM case category labels and the BIO labeling scheme to generate the output feature of each character; the output features of all characters form the output feature sequence. The observation sequence and the output feature sequence together constitute the corpus;
Step (5): Formulate feature templates;
Step (6): Input the corpus obtained in step (4) and the feature templates obtained in step (5) into a conditional random field (CRF) model and train it to obtain a trained CRF model;
Step (7): Process the TCM case records to be predicted with the same method as steps (1)-(4) to build the corpus to be predicted; input this corpus into the trained CRF model, which outputs the TCM case category labels and character positions; finally, identify the four diagnostic methods, syndrome type and treatment method entities from those labels and positions.
In the BIO labeling scheme, B marks the first character of an entity, I marks a non-first character of an entity, and O marks a character that does not belong to any entity.
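The final part of step (7), turning per-character labels and positions back into entities, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `decode_entities` and the sample characters are assumptions, and the label strings follow the `ZS-B` / `ZS-I` / `O` form used in this document.

```python
def decode_entities(chars, labels):
    """Group BIO-labeled characters into (category, start_index, text) entities."""
    entities = []
    start, current = None, None
    # A trailing "O" acts as a sentinel so the last open entity is flushed.
    for i, label in enumerate(list(labels) + ["O"]):
        kind, _, position = label.partition("-")
        continues = position == "I" and kind == current
        if current is not None and not continues:
            entities.append((current, start, "".join(chars[start:i])))
            start, current = None, None
        if position == "B":
            start, current = i, kind
    return entities


# Illustrative input only: five characters with predicted labels.
chars = ["心", "悸", "伴", "脾", "虚"]
labels = ["ZS-B", "ZS-I", "O", "ZX-B", "ZX-I"]
print(decode_entities(chars, labels))
```

An `I` label that does not continue an open entity of the same category is ignored here; a stricter decoder could instead treat it as the start of a new entity.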
As a further improvement of the present invention, the TCM case text refers to records of the diagnosis and treatment activities of TCM experts.
Sentences are extracted using the punctuation marks in the TCM case text as separators; the punctuation marks are the comma, the semicolon and the full stop.
The sentences extracted in step (1) include four-diagnostic-methods sentences, syndrome type sentences and treatment method sentences;
The TCM four diagnostic methods refer to the abnormal symptoms and abnormal signs of the patient that the doctor obtains through the four diagnostic methods; the syndrome type refers to the syndrome confirmed by the doctor; the treatment method refers to the treatment method confirmed by the doctor.
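The sentence extraction of step (1) can be sketched as a split on the three delimiters named above. The function name and the choice to accept both full-width and ASCII punctuation are assumptions made for illustration; the patent does not specify an implementation.

```python
import re

# Comma, semicolon and full stop, in ASCII and full-width (Chinese) forms.
# Accepting both forms is an assumption, not something the source states.
DELIMITERS = ",;.，；。"
_SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]")


def extract_sentences(case_text):
    """Split one case record into clauses at commas, semicolons and full stops."""
    return [part.strip() for part in _SPLIT.split(case_text) if part.strip()]


# Illustrative record text, not taken from the patent's data set.
print(extract_sentences("咳嗽流涕，痰白而粘。胸闷；腹胀"))
```

Splitting on the ASCII full stop would also break decimal numbers such as "3.5"; a production version would need to guard against that.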
As a further improvement of the present invention, sentences are classified according to qualifiers into current sentences, negated sentences and possible sentences;
Negated sentences and possible sentences are discarded, and current sentences are retained.
The qualifiers are: current, negated or possible.
A current sentence describes a discomfort, symptom or disease that is definitely present now;
A possible sentence describes a symptom that may arise, or a provisional diagnosis made before the doctor reaches a definite diagnosis;
A negated sentence describes a disease or symptom that the patient definitely does not have.
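The patent does not say how the qualifier of a sentence is detected. One simple possibility, shown here purely as an assumption, is cue-word matching; the cue lists below are hypothetical and would have to be built from real case records.

```python
# Hypothetical cue words; the source does not list any.
NEGATION_CUES = ("否认", "无", "未见")
POSSIBLE_CUES = ("可能", "疑似", "待排")


def classify_sentence(sentence):
    """Return 'negated', 'possible' or 'current' by naive substring matching."""
    if any(cue in sentence for cue in NEGATION_CUES):
        return "negated"
    if any(cue in sentence for cue in POSSIBLE_CUES):
        return "possible"
    return "current"


def keep_current(sentences):
    """Discard negated and possible sentences, as step (2) requires."""
    return [s for s in sentences if classify_sentence(s) == "current"]


print(keep_current(["胸闷", "无腹胀", "可能感冒"]))
```

Substring matching will misfire when a cue character is part of an ordinary word, so this is only a starting point.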
As a further improvement of the present invention, meaningless characters are first removed and wrongly written characters are corrected; each class of sentences is then cut with a character-based segmentation method into single characters. The meaningless characters include numbers, units and punctuation marks.
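A minimal sketch of the cleaning and character-based segmentation in step (3). The unit characters and the rule that a unit is dropped only when it directly follows a number are assumptions; correction of wrongly written characters is left out because the source gives no correction table.

```python
import re
import unicodedata

# A number, optionally followed by unit characters. The unit set is hypothetical.
NUMBER_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?[克毫升]*")


def segment_characters(sentence):
    """Remove numbers, attached units and punctuation, then split into characters."""
    without_numbers = NUMBER_WITH_UNIT.sub("", sentence)
    return [
        ch
        for ch in without_numbers
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    ]


# Illustrative inputs only.
print(segment_characters("咳嗽流涕"))
print(segment_characters("黄芪30克，水煎服"))
```

Removing a unit only after a number avoids deleting the same character where it is part of an ordinary word.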
As a further improvement of the present invention, the character feature is the character itself;
The part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
The left indicator-word feature concerns words that appear to the left of a named entity: if the current character is such a left indicator word, its left indicator-word feature is T; otherwise it is F;
The right indicator-word feature concerns words that appear to the right of a named entity: if the current character is such a right indicator word, its right indicator-word feature is T; otherwise it is F;
The term feature concerns words that describe human organs: if the current character denotes a human organ, its term feature is T; otherwise it is F;
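The five features of step (4) can be pictured as one row per character. In the sketch below the three lexicons are small hypothetical samples, the part-of-speech tags are assumed to come from an external tagger, and the function name is illustrative.

```python
# Hypothetical lexicons; real ones would be compiled from annotated cases.
LEFT_INDICATORS = {"伴", "予"}
RIGHT_INDICATORS = {"者"}
ORGAN_TERMS = {"头", "目", "舌", "心"}


def annotate(chars, pos_tags):
    """Build (character, POS, left flag, right flag, term flag) rows."""
    rows = []
    for ch, pos in zip(chars, pos_tags):
        rows.append((
            ch,
            pos,
            "T" if ch in LEFT_INDICATORS else "F",
            "T" if ch in RIGHT_INDICATORS else "F",
            "T" if ch in ORGAN_TERMS else "F",
        ))
    return rows


for row in annotate(["伴", "心", "悸"], ["v", "n", "v"]):
    print("\t".join(row))
```

The rows form the observation sequence; the output label for each character is added separately.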
As a further improvement of the present invention, each character is labeled according to the TCM case category labels and the BIO labeling scheme. The category labels are four diagnostic methods (ZS), syndrome type (ZX) and treatment method (ZF);
If the current character belongs to a four-diagnostic-methods entity, its category label is ZS: if it is the first character of the entity, its output feature is ZS-B; if it is a non-first character, its output feature is ZS-I;
If the current character belongs to a syndrome type entity, its category label is ZX: if it is the first character of the entity, its output feature is ZX-B; if it is a non-first character, its output feature is ZX-I;
If the current character belongs to a treatment method entity, its category label is ZF: if it is the first character of the entity, its output feature is ZF-B; if it is a non-first character, its output feature is ZF-I;
If the current character belongs to none of ZS, ZX and ZF, its label is O.
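The labeling rules above can be sketched as a function that converts annotated entity spans into per-character output features. The span format `(start, end, category)` with an exclusive end index is an assumption made for this example.

```python
def bio_labels(n_chars, entities):
    """Produce one output feature per character from (start, end, category) spans."""
    labels = ["O"] * n_chars
    for start, end, category in entities:  # end index is exclusive
        labels[start] = category + "-B"
        for i in range(start + 1, end):
            labels[i] = category + "-I"
    return labels


# Five characters: a two-character ZS entity, one O character, a two-character ZX entity.
print(bio_labels(5, [(0, 2, "ZS"), (3, 5, "ZX")]))
```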
Step (5) proceeds as follows:
All characters obtained by segmentation are arranged into a sequence. For each character, contextual features are extracted within a context window [-2, 2] of size 5, and each feature is written in the format "letter + number": the character is denoted by "W", the part of speech by "P", the left indicator word by "L", the right indicator word by "R" and the TCM term feature by "Y". Based on the context, 19 common feature identifiers are defined:
W_-2 is the second character before the current one; W_-1 is the first character before; W_0 is the current character; W_1 is the first character after; W_2 is the second character after;
P_-2, P_-1, P_0, P_1 and P_2 are the parts of speech of the characters at those same five positions;
L_-2 and L_-1 are the left indicator-word features of the second and first characters before the current one;
R_1 and R_2 are the right indicator-word features of the first and second characters after the current one;
Y_-2, Y_-1, Y_0, Y_1 and Y_2 are the term features of the characters at the five positions from the second before to the second after the current one;
From these common feature identifiers, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2; where "/" denotes the separator between the features combined in one template.
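To make the template list concrete, the sketch below expands all 27 templates at one position of an annotated sequence. The row layout `(W, P, L, R, Y)`, the padding token `<PAD>` for positions outside the sentence and the `name=value` output form are assumptions; the patent does not state how templates are instantiated.

```python
COLUMNS = {"W": 0, "P": 1, "L": 2, "R": 3, "Y": 4}

TEMPLATES = [
    "W_-2", "W_-1", "W_0", "W_1", "W_2",
    "W_-1/W_0", "W_0/W_1", "W_-2/W_0", "W_0/W_2",
    "P_-2", "P_-1", "P_0", "P_1", "P_2",
    "P_-1/P_0", "P_0/P_1", "P_-2/P_0", "P_0/P_2",
    "L_-2/W_0", "L_-1/W_0", "W_0/R_1", "W_0/R_2",
    "Y_-2/W_0", "Y_-1/W_0", "W_0/Y_0", "W_0/Y_1", "W_0/Y_2",
]


def lookup(rows, i, atom):
    """Resolve one identifier such as 'W_-2' relative to position i."""
    name, offset = atom.split("_")
    j = i + int(offset)
    if j < 0 or j >= len(rows):
        return "<PAD>"  # assumed padding token for out-of-range positions
    return rows[j][COLUMNS[name]]


def expand(rows, i):
    """Instantiate every template at position i as a 'template=value' string."""
    features = []
    for template in TEMPLATES:
        values = [lookup(rows, i, atom) for atom in template.split("/")]
        features.append(template + "=" + "/".join(values))
    return features


rows = [("伴", "v", "T", "F", "F"), ("心", "n", "F", "F", "T"), ("悸", "v", "F", "F", "F")]
for feature in expand(rows, 1):
    print(feature)
```

The single-identifier templates capture one observation at a time, while the combined templates let the model weigh the current character together with its context or with an indicator or term flag.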
As a further improvement of the present invention, step (6) proceeds as follows:
Step (6.1): Use the corpus obtained in step (4) as the training corpus;
Step (6.2): Represent the character features in the training corpus as the observation sequence x and the output feature sequence as the output sequence y, and save each input-output pair (x, y) in the training sample set;
Step (6.3): Train the CRF model with the training sample set;
Step (6.4): Compute the conditional probability of the pairs (x, y) iteratively until convergence to obtain the trained CRF model.
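For reference, the conditional probability mentioned in step (6.4) has the following standard form for a linear-chain CRF. The patent does not write out the formula; the symbols $\lambda_k$ (weights), $f_k$ (feature functions, here instantiated from the feature templates) and $Z(x)$ (normalizer) are the usual textbook notation rather than the patent's own.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Training typically adjusts the weights $\lambda_k$ to maximize the log-likelihood $\sum_{(x, y)} \log p(y \mid x)$ over the training sample set, iterating until the objective converges.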
As the second aspect of the present invention:
A named entity recognition system for TCM case records based on multi-feature template modification includes: a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any of the above methods are completed.
As the third aspect of the present invention:
A computer-readable storage medium stores computer instructions which, when run by a processor, complete the steps of any of the above methods.
Compared with the prior art, the beneficial effects of the present invention are:
The present invention applies conditional random fields and proposes a named entity recognition method for TCM case records based on multi-feature template modification. In view of the characteristics of TCM case text, it proposes an annotation method using character, part-of-speech, left and right indicator-word and term features, trains a CRF model with the annotated data, and identifies four diagnostic methods, syndrome type and treatment method entities. Experiments verify that precision, recall and F-measure all improve considerably after the left and right indicator-word features and the term feature are added. With the continued accumulation of cases, more reasonable parameter settings and better-chosen feature values, the method can provide more valuable reference and evidence for building the "four diagnostic methods - syndrome type - treatment method" triple correspondence and for scientifically interpreting diagnosis and treatment.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application; the illustrative embodiments and their descriptions explain the application and do not unduly limit it.
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the word frequency chart.
Detailed description of the embodiments
It should be noted that the following detailed description is illustrative and is intended to further explain the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the art to which this application belongs.
It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well; in addition, the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
TCM case records document the diagnosis and treatment activities of TCM experts, and named entity recognition on them is of great significance for the standardization and informatization of TCM case records. Because TCM case text is vaguely worded and its references are often unclear, the present invention, based on conditional random fields, proposes a named entity recognition method based on multi-feature template modification. First, sentence extraction and automatic segmentation are performed on the TCM case text; the segmented corpus is then annotated with character, part-of-speech, left and right indicator-word and term features; finally, a CRF model is trained with the annotated data to identify four diagnostic methods, syndrome type and treatment method entities and to build the "four diagnostic methods - syndrome type - treatment method" triple correspondence, providing reference and evidence for scientifically interpreting diagnosis and treatment. The data source is 12,000 TCM case records of cardiovascular outpatient experts at the Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, selected from June 2014 to December 2016. Recognition accuracy is further improved by adjusting the feature combinations and the context window size. The average precision, recall and F-measure reach 90.68%, 90.45% and 90.56% respectively.
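As a quick consistency check on the reported figures, the sketch below assumes that the "F-measure" is the balanced F1 score, the harmonic mean of precision and recall. The source does not define it, and it reports averages, so exact agreement is not guaranteed in general.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported averages: precision 90.68%, recall 90.45%.
print(round(f_measure(90.68, 90.45), 2))
```

Under that assumption the result agrees with the reported 90.56%.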
Because TCM case text is vaguely worded and its references are often unclear, the present invention proposes, based on conditional random fields, a named entity recognition method based on multi-feature template modification. For the 12,000 cardiovascular TCM case records, their text features, part-of-speech features and label features are analyzed, templates are defined and a CRF model is trained to establish the associations between text features and named entity categories and character positions. By extracting and mining the associations among four-diagnostic-methods text, syndrome type text and treatment method text, the principle of "diagnosis and treatment based on an overall analysis of the illness and the patient's condition" is interpreted, providing a scientific basis for passing on experience and acquiring knowledge.
Table 1. Case text examples
Sentences were split using the punctuation marks in the case text as separators. The 12,000 case records contain 180,635 sentences in total and, after numbers and meaningless characters are removed, 1,267 distinct characters. The most frequent is "heart", with 37,236 occurrences and a frequency of 310.30%, followed by "chest", with 20,701 occurrences and a frequency of 172.51%. 727 characters have a frequency below 1%, of which 50, such as "boiling", "lance", "group", "curling up" and "section", are used only once. The word frequency chart is shown in Fig. 2; 28 characters have a frequency above 40%, and the word frequency table is given in Table 2.
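The reported percentages exceed 100%, which suggests that frequency is measured as occurrences per case record rather than as a share of all characters: 37,236 / 12,000 = 310.30% and 20,701 / 12,000 = 172.51%. This reading is an inference from the numbers, not something the source states. A minimal sketch under that assumption, with an illustrative function name and sample strings:

```python
from collections import Counter


def character_frequencies(cases):
    """Map each character to (count, count per case expressed as a percentage)."""
    counts = Counter(ch for case in cases for ch in case)
    n_cases = len(cases)
    return {ch: (c, 100.0 * c / n_cases) for ch, c in counts.items()}


# Two toy case strings, assumed to be already cleaned of numbers and punctuation.
frequencies = character_frequencies(["心悸心慌", "胸闷"])
print(frequencies["心"])
print(frequencies["胸"])
```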
Table 2. Word frequency table
TCM case text is natural language, but TCM is a discipline in which many fields interpenetrate. Compared with the language of other natural sciences, TCM case text is additionally characterized by ambiguity, metaphor, classical Chinese phrasing and fixed patterns.
(1) Ambiguity. For example, a deep red (crimson) tongue is explained as a tongue appearance that develops from a red tongue and is one degree deeper than red; however, the boundary between red and deep red cannot be quantified, and the finding is usually written as "tongue red" or "tongue deep red", or recorded broadly as "deep red tongue".
(2) Metaphor. For example, "exuberant wood restricting earth", also called "liver qi invading the spleen", is a TCM syndrome caused by liver qi invading the spleen. Likewise, "harmonizing the liver and spleen" is the treatment method established specifically for "exuberant wood restricting earth", that is, treating the "liver qi invading the spleen" syndrome by soothing the liver and strengthening the spleen.
(3) Classical Chinese. For example, "relieving the exterior to dispel pathogens" means using medicinals to drive the pathogen out through the body surface, and "reinforcing earth to generate metal" means using the five-phase theory of mutual generation to tonify spleen qi and thereby nourish lung qi.
(4) Fixed patterns. Descriptions of the tongue body generally cover its color and texture and take the form "tongue XX" or "tongue body XX", such as "tongue red", "pink tongue", "tongue dark", "tongue body pale" and "tongue body pale and plump". Descriptions of the tongue coating cover its color, moistness and thickness.
As an embodiment of the present invention:
As shown in Fig. 1, the named entity recognition method for TCM case records based on multi-feature template modification includes the following steps:
Step (1): Extract sentences from the TCM case text;
Further, the TCM case text refers to records of the diagnosis and treatment activities of TCM experts.
Sentences are extracted using the punctuation marks in the TCM case text as separators; the punctuation marks are the comma, the semicolon and the full stop.
The sentences extracted in step (1) include four-diagnostic-methods sentences, syndrome type sentences and treatment method sentences;
The TCM four diagnostic methods refer to the abnormal symptoms and abnormal signs of the patient that the doctor obtains through the four diagnostic methods; the syndrome type refers to the syndrome confirmed by the doctor; the treatment method refers to the treatment method confirmed by the doctor.
Step (2): Classify the extracted sentences;
Further, sentences are classified according to qualifiers into current sentences, negated sentences and possible sentences; negated sentences and possible sentences are discarded, and current sentences are retained.
The qualifiers are: current, negated or possible.
A current sentence describes a discomfort, symptom or disease that is definitely present now;
A possible sentence describes a symptom that may arise, or a provisional diagnosis made before the doctor reaches a definite diagnosis;
A negated sentence describes a disease or symptom that the patient definitely does not have.
Step (3): Segment each class of sentences;
Further, meaningless characters are first removed and wrongly written characters are corrected; for example, a miswritten form of "blood stasis" is revised to the standard form. Each class of sentences is then cut with a character-based segmentation method into single characters. The meaningless characters include numbers, units and punctuation marks.
Step (4): Annotate each character obtained by segmentation, in turn, with a character feature, a part-of-speech feature, a left indicator-word feature, a right indicator-word feature and a term feature. These features are distinctive, which makes automatic annotation easy to implement. The five features of all characters form the corpus observation sequence. Label each character according to the TCM case category labels and the BIO labeling scheme to generate the output feature of each character; the output features of all characters form the output feature sequence. The observation sequence and the output feature sequence together constitute the corpus. The entities are four diagnostic methods (ZS), syndrome type (ZX) and treatment method (ZF);
As a further improvement of the present invention, the character feature is the character itself;
The part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
The left indicator-word feature concerns words that appear to the left of a named entity: if the current character is such a left indicator word, its left indicator-word feature is T; otherwise it is F;
The right indicator-word feature concerns words that appear to the right of a named entity: if the current character is such a right indicator word, its right indicator-word feature is T; otherwise it is F;
The term feature concerns words that describe human organs: if the current character denotes a human organ, its term feature is T; otherwise it is F;
Further, each character is labeled according to the TCM case category labels and the BIO labeling scheme. The category labels are four diagnostic methods (ZS), syndrome type (ZX) and treatment method (ZF);
If the current character belongs to a four-diagnostic-methods entity, its category label is ZS: if it is the first character of the entity, its output feature is ZS-B; if it is a non-first character, its output feature is ZS-I;
If the current character belongs to a syndrome type entity, its category label is ZX: if it is the first character of the entity, its output feature is ZX-B; if it is a non-first character, its output feature is ZX-I;
If the current character belongs to a treatment method entity, its category label is ZF: if it is the first character of the entity, its output feature is ZF-B; if it is a non-first character, its output feature is ZF-I;
If the current character belongs to none of ZS, ZX and ZF, its label is O.
(1) character feature (W)
Word segmentation processing based on word is carried out to case text, such as:" cough runny nose " is divided into " cough/coughing/flows/tears/".
(2) part of speech feature (P)
Part of speech is divided into verb, noun, adjective, adverbial word and preposition etc..Original case language material example is as follows:Experienced before half a month Chill, runny nose of coughing, phlegm is white and sticks, then oedema, by instep, spread to waist abdomen.Uncomfortable in chest, abdominal distension receives difference.Oliguria, it is complete Indigested food.Recently aggravation, perspiration, chilly, limb be cold, palpitaition, it is out of breath, cannot lie down.Result after part-of-speech tagging is as follows: Before half/n months/n/adv senses/v by/v wind/n it is cold/n coughs/v coughs/v streams/v tears/v phlegm/n is white/adj and/con stick/adj after/con and/ Con water/n is swollen/and v opens by/con foot/n the back ofs the body/n/con beginnings/con is climing/v prolongs/v to/v waists/n abdomens/n chests/n is bored/v abdomens/n is swollen/and v receives/v Difference/adj urine/n is few/adj is complete/n paddy/n not /advization/v is close/n day/n diseases/n feelings/n adds/v weight/adv go out/v sweat/fear/tremble with fear/n limbs/ N is cold/heart/n throbs with fear/v gas/n urgency/adv not /adv energy/adv are flat/n is sleeping/v.
(3) left and right deictic words feature (L) and (R)
Chinese medicine name entity often occurs with together with specific word, and certain words for often appearing in the name entity left side are designated as Zuo Zhi circle words, the word for appearing in the right are designated as You Zhi circle words.In terms of the Chinese medicine four methods of diagnosis:Place near Chinese medicine four methods of diagnosis entity Often will appear deictic words has " with ", " further ", " still " etc..In terms of card type:Near card type place often have deictic words " with Cause ", " gesture " etc..In terms of therapy:Often there are deictic words " giving ", " suitable ", " still giving ", " control and give ", " controlling preferably " in place near therapy Deng.
(4) term characteristics (Y)
Chinese medicine case entity includes the word of description human organ, such as:The terms such as " head ", " eye ", " tongue ", " fire ".In Cure the four methods of diagnosis in terms of description human organ and pathologic substances word, such as " head ", " pain ", " eye ", " dry ", " sweat ", " going out ", " urine ", " Huang ";In terms of card type:It is said with the yin-yang and five elements that " gold, wood, water, fire, soil " is representative and " wind, cold, wet, dry, fiery " is to patient Mechanistic description, such as " spleen ", " void ", " wet ", " Sheng ", " heart ", " the moon ", " no ", " foot " etc..In terms of therapy:Usually 4 word Or 8 word patterns, such as " soothing liver-qi stagnation ", " supplementing qi and nourishing yin, promoting blood circulation and removing blood stasis " etc..
Through word segmentation and automatic feature annotation, the corpus observation sequence and the output feature sequence are generated. "T" indicates that a token matches the annotated feature and "F" indicates that it does not. ZS, ZX and ZF denote TCM four-diagnosis, syndrome type and therapy entities respectively, as shown in the table below.
Table 3 TCM case category labels
Annotation uses the "BIO" method, in which "B", "I" and "O" denote the first character of an entity, a non-first character of an entity, and a non-entity character respectively. The category labels of the example above are shown in Table 4.
Table 4 "BIO" category label annotation
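The BIO scheme above can be sketched as a small labelling function. The entity spans and the example characters are assumptions made for illustration; the label strings follow the `ZS-B` / `ZS-I` format given in claim 6.

```python
def bio_labels(chars, entities):
    """Label each character with CATEGORY-B, CATEGORY-I or O.

    entities: list of (start, end_exclusive, category) tuples, where
    category is one of "ZS", "ZX", "ZF".
    """
    labels = ["O"] * len(chars)
    for start, end, category in entities:
        labels[start] = f"{category}-B"      # first character of the entity
        for i in range(start + 1, end):
            labels[i] = f"{category}-I"      # remaining entity characters
    return labels

# Illustrative input (assumed): one four-diagnosis entity, one syndrome type.
chars = list("见头痛脾虚")
labels = bio_labels(chars, [(1, 3, "ZS"), (3, 5, "ZX")])
print(list(zip(chars, labels)))
```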
Step (5): Formulate feature templates.
Further, step (5) comprises:
All characters obtained by segmentation are arranged into a sequence. For each character, contextual features are extracted within a context window [-2, 2] of size 5, and each feature is written in "letter + number" format, where the character is denoted by "W", the part of speech by "P", the left indicator word by "L", the right indicator word by "R", and the TCM term feature by "Y". Nineteen common feature identifiers are set according to the context:
W_-2 denotes the second preceding character; W_-1 denotes the first preceding character; W_0 denotes the current character; W_1 denotes the first following character;
W_2 denotes the second following character;
P_-2 denotes the part of speech of the second preceding character; P_-1 denotes the part of speech of the first preceding character; P_0 denotes the part of speech of the current character;
P_1 denotes the part of speech of the first following character; P_2 denotes the part of speech of the second following character;
L_-2 denotes the left indicator of the second preceding character; L_-1 denotes the left indicator of the first preceding character;
R_1 denotes the right indicator of the first following character; R_2 denotes the right indicator of the second following character;
Y_-2 denotes the term feature of the second preceding character; Y_-1 denotes the term feature of the first preceding character;
Y_0 denotes the term feature of the current character; Y_1 denotes the term feature of the first following character;
Y_2 denotes the term feature of the second following character.
Based on these common feature identifiers, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2, where "/" is the separator between the features combined in one template.
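As a rough illustration of how the 27 templates above turn into concrete features, the sketch below expands a template at a given position of an annotated character sequence. The padding token for positions outside the sentence, the output string format, and the sample rows are assumptions; the patent does not state how boundaries are handled.

```python
TEMPLATES = (
    "W_-2 W_-1 W_0 W_1 W_2 W_-1/W_0 W_0/W_1 W_-2/W_0 W_0/W_2 "
    "P_-2 P_-1 P_0 P_1 P_2 P_-1/P_0 P_0/P_1 P_-2/P_0 P_0/P_2 "
    "L_-2/W_0 L_-1/W_0 W_0/R_1 W_0/R_2 "
    "Y_-2/W_0 Y_-1/W_0 W_0/Y_0 W_0/Y_1 W_0/Y_2"
).split()

def expand(template, rows, i, pad="_PAD_"):
    """Expand one template at position i.

    rows: list of dicts keyed by feature letter (W, P, L, R, Y).
    pad:  assumed placeholder for offsets that fall outside the sentence.
    """
    values = []
    for atom in template.split("/"):
        name, offset = atom.split("_")       # e.g. "W_-2" -> ("W", "-2")
        j = i + int(offset)
        values.append(rows[j][name] if 0 <= j < len(rows) else pad)
    return f"{template}=" + "/".join(values)

# Illustrative rows (assumed feature values).
rows = [
    {"W": "头", "P": "n", "L": "F", "R": "F", "Y": "T"},
    {"W": "痛", "P": "v", "L": "F", "R": "F", "Y": "T"},
]
features = [expand(t, rows, 1) for t in TEMPLATES]
print(features[:9])
```

Combined templates such as `W_-1/W_0` produce one feature from two context positions, which is what lets the model condition on character pairs rather than single characters.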
Step (6): The corpus obtained in step (4) and the feature templates obtained in step (5) are input into a conditional random field (CRF) model, and the model is trained to obtain a trained CRF model.
Further, step (6) comprises:
Step (6.1): Take the corpus obtained in step (4) as the training corpus.
Step (6.2): Represent the character features in the training corpus as the observation sequence x and the output feature sequence as the output sequence y, and save each input-output pair (x, y) in the training sample set.
Step (6.3): Train the CRF model with the training sample set.
The conditional random field is defined as follows: through word segmentation, data cleaning and feature annotation, the text input sequence x (x = (x1, x2, ..., xn)) is obtained; the model parameters are obtained by training, and the conditional probability of the required label sequence y is predicted.
Given an input variable x and an output variable y, the conditional probability P(y | x) is defined in the following form:
where λ_k is a weight, t_k and s_l are feature functions, and Z(x) is the normalization factor.
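The formula itself is not present in this text. For reference, the standard linear-chain CRF form that matches the symbols named above is given below; it is a reconstruction from the usual textbook definition, not a copy of the patent's formula, and the symbol μ_l for the weights of the state feature functions s_l is an assumed name.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\Bigg( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i)
              + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \Bigg),
\qquad
Z(x) = \sum_{y'} \exp\!\Bigg( \sum_{i,k} \lambda_k \, t_k(y'_{i-1}, y'_i, x, i)
              + \sum_{i,l} \mu_l \, s_l(y'_i, x, i) \Bigg)
```

Here t_k are transition feature functions over adjacent labels, s_l are state feature functions over a single label, and i runs over positions in the sequence.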
Step (6.4): The conditional probability of each pair (x, y) is computed iteratively until training converges, yielding the trained CRF model.
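To make the conditional probability concrete, the toy sketch below scores a label sequence with hand-set weights and normalizes by enumerating every candidate sequence. The weights, label set and example characters are invented for illustration; a real CRF learns the weights from data and computes Z(x) with the forward algorithm rather than brute force.

```python
import itertools
import math

LABELS = ["O", "ZS-B", "ZS-I"]

def score(x, y, trans_w, state_w):
    """Sum of weighted state and transition features for one labelling."""
    total = 0.0
    for i, label in enumerate(y):
        total += state_w.get((x[i], label), 0.0)          # state feature s_l
        if i > 0:
            total += trans_w.get((y[i - 1], label), 0.0)  # transition feature t_k
    return total

def conditional_probability(x, y, trans_w, state_w):
    """P(y | x) with Z(x) computed by enumerating all label sequences."""
    z = sum(
        math.exp(score(x, cand, trans_w, state_w))
        for cand in itertools.product(LABELS, repeat=len(x))
    )
    return math.exp(score(x, y, trans_w, state_w)) / z

# Hand-set toy weights (assumptions, not trained values).
x = ["头", "痛"]
state_w = {("头", "ZS-B"): 2.0, ("痛", "ZS-I"): 2.0}
trans_w = {("ZS-B", "ZS-I"): 1.0, ("O", "ZS-I"): -2.0}

candidates = list(itertools.product(LABELS, repeat=len(x)))
best = max(candidates, key=lambda y: conditional_probability(x, y, trans_w, state_w))
total = sum(conditional_probability(x, y, trans_w, state_w) for y in candidates)
print(best, round(total, 6))
```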
Step (7): The TCM case to be predicted is processed with the same methods as in steps (1)-(4) to build the corpus to be predicted. This corpus is input into the trained CRF model, which outputs the TCM case category label and position of each character. Finally, the TCM four-diagnosis, syndrome type and therapy entities are identified from the category labels and character positions.
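The last part of step (7), recovering entities from per-character labels and positions, can be sketched as follows. The predicted label sequence is assumed to be given (for example by a trained CRF); the handling of a stray "I" label without a preceding "B" is a design choice of this sketch, not something the patent specifies.

```python
def extract_entities(chars, labels):
    """Group CATEGORY-B / CATEGORY-I labels into entities with start positions."""
    entities, current = [], None
    for i, (ch, label) in enumerate(zip(chars, labels)):
        category, _, position = label.partition("-")
        if position == "B":
            if current:
                entities.append(current)
            current = {"category": category, "start": i, "text": ch}
        elif position == "I" and current and current["category"] == category:
            current["text"] += ch
        else:
            # "O", or an "I" that does not continue the open entity (dropped here).
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

# Illustrative prediction (assumed labels).
chars = list("见头痛脾虚")
labels = ["O", "ZS-B", "ZS-I", "ZX-B", "ZX-I"]
entities = extract_entities(chars, labels)
print(entities)
```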
In the BIO labelling method, B denotes the first character of an entity, I denotes a non-first character of an entity, and O denotes a non-entity character.
As a second embodiment of the present invention:
A TCM case named entity recognition system based on improved multi-feature templates, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are run by the processor, the steps of any of the above methods are completed.
As a third embodiment of the present invention:
A computer-readable storage medium storing computer instructions; when the computer instructions are run by a processor, the steps of any of the above methods are completed.
1 Experiments and analysis
1.1 Evaluation criteria
The indicators used to evaluate information extraction are precision (P), recall (R) and F-measure (F), defined as follows:
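The definitions themselves are not present in this text. The sketch below uses the standard entity-level definitions, which are assumed rather than taken from the patent: precision is the share of predicted entities that are correct, recall is the share of gold entities that are found, and F is their harmonic mean. The example entity tuples are invented.

```python
def prf(gold, predicted):
    """Entity-level precision, recall and F-measure.

    gold, predicted: iterables of hashable entity descriptions,
    here (start, end_exclusive, category) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Illustrative data (assumed): one of three predictions matches one of two gold entities.
gold = [(1, 3, "ZS"), (3, 5, "ZX")]
predicted = [(1, 3, "ZS"), (3, 4, "ZX"), (5, 6, "ZF")]
p, r, f = prf(gold, predicted)
print(p, r, f)
```

An entity counts as correct here only when both its span and its category match exactly; whether the patent's evaluation uses exact or partial matching is not stated.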
1.2 Experimental design and verification
(1) Feature identifiers
The present invention extracts contextual features within a context window [-2, 2] of size 5; this feature space is referred to as the "5-character window". Each group of feature templates is written in "letter + number" format, where the character is denoted by "W", the part of speech by "P", the left and right indicator words by "L" and "R", and the TCM term feature by "Y". Nineteen common feature identifiers are set according to the context; Table 5 lists the identifiers and their meanings.
Table 5 Feature identifiers and meanings
No.  Identifier  Meaning                                               No.  Identifier  Meaning
1    W_-2        Second preceding character                            2    W_-1        First preceding character
3    W_0         Current character                                     4    W_1         First following character
5    W_2         Second following character                            6    P_-2        Part of speech of the second preceding character
7    P_-1        Part of speech of the first preceding character       8    P_0         Part of speech of the current character
9    P_1         Part of speech of the first following character       10   P_2         Part of speech of the second following character
11   L_-2        Left indicator of the second preceding character      12   L_-1        Left indicator of the first preceding character
13   R_1         Right indicator of the first following character      14   R_2         Right indicator of the second following character
15   Y_-2        Term feature of the second preceding character        16   Y_-1        Term feature of the first preceding character
17   Y_0         Term feature of the current character                 18   Y_1         Term feature of the first following character
19   Y_2         Term feature of the second following character
(2) Experimental design
Templates Tmpt_1, Tmpt_2, Tmpt_3 and Tmpt_4 are used to carry out three groups of comparative experiments that test how feature selection and window size affect the results. The template definitions are shown in Table 6 and the experimental design in Table 7.
Table 6 Templates
Table 7 Experimental design
1.3 Result analysis
(1) Analysis of experiment 1
Experiments were run with Tmpt_1 and Tmpt_2, with the context window set to 3 and 5 respectively. Table 8 shows the effect on the results.
Table 8 Effect of the window change on the results
Category                       P (%)   R (%)   F (%)
TCM four diagnostic methods    +0.45   +0.47   +0.46
Syndrome type                  +0.04   +0.38   +0.14
Therapy                        +1.83   +1.12   +1.28
The effect differs across four-diagnosis, syndrome type and therapy entities because the average character lengths of the three entity classes differ: 3.17 characters for four-diagnosis entities, 2.21 characters for syndrome types, and 4.78 characters for therapies. In terms of improvement, the F value increases by 0.46 percentage points for the four diagnostic methods, 0.14 for syndrome types, and 1.28 for therapies. The experiment shows that recognition is better when the chosen context window length is close to the entity length.
(2) Analysis of experiment 2
A comparative experiment was run with Tmpt_2 and Tmpt_3 as the experimental templates. After the left and right indicator identifiers are added, recognition performance changes markedly, and the improvement for therapies is the most evident. The effect on the results is shown in the table below.
Table 9 Effect of the selected feature identifiers on the results
Category                       P (%)   R (%)   F (%)
TCM four diagnostic methods    +7.17   +6.23   +0.19
Syndrome type                  +5.37   +5.48   +0.42
Therapy                        +5.86   +4.76   +0.84
(3) Analysis of experiment 3
Experiment 3 adds a new experimental group that uses template Tmpt_4. It is compared with Tmpt_3 to measure the effect of adding the term feature identifiers on named entity recognition. The experimental results are shown in Table 10.
Table 10 Optimal recognition results
Comparing the F values of each class of named entity in Table 10 shows that the best template is Tmpt_4, whose average precision, recall and F-measure reach 90.68%, 90.45% and 90.56% respectively, so recognition performance is improved. A rich feature set plays an important role in improving the accuracy of named entity recognition. Some special cases still need to be corrected with dictionaries and rules.
(4) Comparison with existing methods
According to the literature, Feng Lizhi proposed a Bootstrapping-based hybrid recognition method for TCM clinical medical record corpora, with an F value of 87%; Yuan Yuhu used CRFs models for named entity extraction of symptom terms, with a best F value of 87% in the open test. The present invention exceeds both: its average F value for named entity recognition in medical cases reaches 90.51%.
The present invention applies conditional random fields and proposes a TCM case named entity recognition method based on improved multi-feature templates. Drawing on the characteristics of TCM case text, it proposes an annotation method covering character features, part-of-speech features, left and right indicator word features and term features, trains CRFs models on the annotated data, and identifies four-diagnosis, syndrome type and therapy entities. Experiments verify that precision, recall and F-measure all improve considerably after the left and right indicator word features and the term feature identifiers are added. With continued accumulation of cases, more reasonable parameter settings and further tuning of feature values, the method can provide a more valuable reference and basis for building the "four diagnostic methods - syndrome type - therapy" triple correspondence and for the scientific evaluation of diagnosis and treatment.
The foregoing is merely the preferred embodiments of the present application and is not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A TCM case named entity recognition method based on improved multi-feature templates, characterized by comprising the following steps:
Step (1): Extract sentences from the TCM case text;
Step (2): Classify the extracted sentences;
Step (3): Perform word segmentation on each class of sentence;
Step (4): Annotate each character obtained by segmentation with, in order, the character feature, part-of-speech feature, left indicator word feature, right indicator word feature and term feature; the character features, part-of-speech features, left indicator word features, right indicator word features and term features of all characters form the corpus observation sequence; each character is labelled according to the TCM case category labels and the BIO labelling method to generate its output feature; the output features of all characters form the output feature sequence; the corpus observation sequence and the output feature sequence together constitute the corpus;
Step (5): Formulate feature templates;
Step (6): Input the corpus obtained in step (4) and the feature templates obtained in step (5) into a conditional random field model, and train the model to obtain a trained conditional random field model;
Step (7): Process the TCM case to be predicted with the same methods as in steps (1)-(4) to build the corpus to be predicted; input this corpus into the trained conditional random field model, which outputs the TCM case category label and position of each character; finally, identify the TCM four-diagnosis, syndrome type and therapy entities according to the category labels and character positions.
2. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that
sentences are extracted using the punctuation marks in the TCM case text as separators, the punctuation marks being the comma, semicolon or full stop; the sentences extracted in step (1) include four-diagnosis sentences, syndrome type sentences and therapy sentences; the TCM four diagnostic methods refer to the abnormal symptoms and abnormal signs of the patient that the doctor obtains through the four examinations; the syndrome type refers to the syndrome confirmed by the doctor; the therapy refers to the treatment method confirmed by the doctor.
3. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that sentences are classified according to qualifiers into current sentences, negated sentences and possible sentences; negated sentences and possible sentences are rejected and current sentences are retained; the qualifiers include current, negated or possible; a current sentence indicates a discomfort symptom or disease that is definitely present now; a possible sentence indicates a symptom that may arise or a diagnosis made by the doctor before confirmation; a negated sentence indicates a disease or symptom that definitely has not occurred in the patient.
4. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that meaningless tokens are first removed and wrongly written characters are corrected; each class of sentence is then split with a character-based segmentation method into single characters; the meaningless tokens include numbers, units and punctuation marks.
5. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that the character feature refers to each character itself;
the part-of-speech feature includes: verb, noun, adjective, adverb and preposition;
the left indicator word feature refers to a word that appears to the left of a named entity; if the current character belongs to a word that appears to the left of a named entity, its left indicator word feature is T, otherwise it is F;
the right indicator word feature refers to a word that appears to the right of a named entity; if the current character belongs to a word that appears to the right of a named entity, its right indicator word feature is T, otherwise it is F;
the term feature refers to a word describing a human organ; if the current character denotes a human organ, its term feature is T, otherwise it is F.
6. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that each character is labelled according to the TCM case category labels and the BIO labelling method as follows: the TCM case category labels include TCM four diagnostic methods ZS, syndrome type ZX and therapy ZF;
if the current character belongs to a four-diagnosis entity, its TCM case category label is ZS; if a character labelled ZS is the first character of the entity, its output feature is ZS-B; if a character labelled ZS is a non-first character of the entity, its output feature is ZS-I;
if the current character belongs to a syndrome type entity, its TCM case category label is ZX; if a character labelled ZX is the first character of the entity, its output feature is ZX-B; if a character labelled ZX is a non-first character of the entity, its output feature is ZX-I;
if the current character belongs to a therapy entity, its TCM case category label is ZF; if a character labelled ZF is the first character of the entity, its output feature is ZF-B; if a character labelled ZF is a non-first character of the entity, its output feature is ZF-I;
if the current character belongs to none of ZS, ZX and ZF, its label is O.
7. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that step (5) comprises:
arranging all characters obtained by segmentation into a sequence; for each character, extracting contextual features within a context window [-2, 2] of size 5, and writing each feature in "letter + number" format, where the character is denoted by "W", the part of speech by "P", the left indicator word by "L", the right indicator word by "R", and the TCM term feature by "Y"; setting nineteen common feature identifiers according to the context:
W_-2 denotes the second preceding character; W_-1 denotes the first preceding character; W_0 denotes the current character; W_1 denotes the first following character;
W_2 denotes the second following character;
P_-2 denotes the part of speech of the second preceding character; P_-1 denotes the part of speech of the first preceding character; P_0 denotes the part of speech of the current character;
P_1 denotes the part of speech of the first following character; P_2 denotes the part of speech of the second following character;
L_-2 denotes the left indicator of the second preceding character; L_-1 denotes the left indicator of the first preceding character;
R_1 denotes the right indicator of the first following character; R_2 denotes the right indicator of the second following character;
Y_-2 denotes the term feature of the second preceding character; Y_-1 denotes the term feature of the first preceding character;
Y_0 denotes the term feature of the current character; Y_1 denotes the term feature of the first following character;
Y_2 denotes the term feature of the second following character;
based on these common feature identifiers, the feature templates are formulated as follows:
W_-2, W_-1, W_0, W_1, W_2, W_-1/W_0, W_0/W_1, W_-2/W_0, W_0/W_2, P_-2, P_-1, P_0, P_1, P_2, P_-1/P_0, P_0/P_1, P_-2/P_0, P_0/P_2, L_-2/W_0, L_-1/W_0, W_0/R_1, W_0/R_2, Y_-2/W_0, Y_-1/W_0, W_0/Y_0, W_0/Y_1, W_0/Y_2, where "/" is the separator between the features combined in one template.
8. The TCM case named entity recognition method based on improved multi-feature templates according to claim 1, characterized in that step (6) comprises:
Step (6.1): Take the corpus obtained in step (4) as the training corpus;
Step (6.2): Represent the character features in the training corpus as the observation sequence x and the output feature sequence as the output sequence y, and save each input-output pair (x, y) in the training sample set;
Step (6.3): Train the conditional random field model with the training sample set;
Step (6.4): Compute the conditional probability of each pair (x, y) iteratively until training converges, yielding the trained conditional random field model.
9. A TCM case named entity recognition system based on improved multi-feature templates, characterized by comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein, when the computer instructions are run by the processor, the steps of the method of any one of claims 1-8 are completed.
10. A computer-readable storage medium, characterized in that computer instructions are stored thereon, and when the computer instructions are run by a processor, the steps of the method of any one of claims 1-8 are completed.
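The preprocessing in steps (1)-(3), as detailed in claims 2-4, can be sketched as follows. This is a minimal illustration under stated assumptions: the qualifier cue lists, the unit list and the sample text are hypothetical, and the correction of wrongly written characters mentioned in claim 4 is not attempted.

```python
import re

# Sentence separators named in claim 2: comma, semicolon, full stop
# (both full-width and ASCII comma/semicolon are accepted here).
SENTENCE_DELIMS = r"[，,；;。]"

# Hypothetical qualifier cues; the patent does not list its cue words.
NEGATED_CUES = ("无", "否认")
POSSIBLE_CUES = ("可能", "疑")

# Numbers with an optional trailing unit (unit list is an assumption).
NUMBER_WITH_UNIT = re.compile(r"\d+(?:\.\d+)?\s*(?:克|g|ml)?")
NON_WORD = re.compile(r"[^\w]")  # punctuation and whitespace

def split_sentences(text):
    """Step (1): split the case text at the separator punctuation."""
    return [s for s in re.split(SENTENCE_DELIMS, text) if s.strip()]

def classify(sentence):
    """Step (2): label a sentence as negated, possible or current."""
    if any(cue in sentence for cue in NEGATED_CUES):
        return "negated"
    if any(cue in sentence for cue in POSSIBLE_CUES):
        return "possible"
    return "current"

def to_chars(sentence):
    """Step (3): drop numbers, units and punctuation, then split into characters."""
    cleaned = NON_WORD.sub("", NUMBER_WITH_UNIT.sub("", sentence))
    return list(cleaned)

def preprocess(text):
    """Keep only current sentences and return them as character lists."""
    return [to_chars(s) for s in split_sentences(text) if classify(s) == "current"]

result = preprocess("头痛3天，无发热；可能感冒。舌红")
print(result)
```

Substring matching on cue words is the simplest possible classifier and will misfire on cues that occur inside other words; it is used here only to show where the filtering sits in the pipeline.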

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810359240.9A CN108549639A (en) 2018-04-20 2018-04-20 Based on the modified Chinese medicine case name recognition methods of multiple features template and system

Publications (1)

Publication Number Publication Date
CN108549639A true CN108549639A (en) 2018-09-18

Family

ID=63511841

Country Status (1)

Country Link
CN (1) CN108549639A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
袁锋: "中医医案文本挖掘的若干关键技术研究", 《中国博士学位论文全文数据库-信息科技辑》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166608A (en) * 2018-09-17 2019-01-08 新华三大数据技术有限公司 Electronic health record information extracting method, device and equipment
CN109215798A (en) * 2018-10-09 2019-01-15 北京科技大学 A kind of construction of knowledge base method towards Chinese medicine ancient Chinese prose
CN109635123A (en) * 2018-11-28 2019-04-16 北京工业大学 A kind of Chinese medicine text concept recognition methods of increment type
CN110175246A (en) * 2019-04-09 2019-08-27 山东科技大学 A method of extracting notional word from video caption
CN110502750B (en) * 2019-08-06 2023-08-11 山东师范大学 Disambiguation method, disambiguation system, disambiguation equipment and disambiguation medium in Chinese medicine text word segmentation process
CN110502750A (en) * 2019-08-06 2019-11-26 山东师范大学 Disambiguation method, system, equipment and medium during Chinese medicine text participle
CN110516241A (en) * 2019-08-26 2019-11-29 北京三快在线科技有限公司 Geographical address analytic method, device, readable storage medium storing program for executing and electronic equipment
CN110879831A (en) * 2019-10-12 2020-03-13 杭州师范大学 Chinese medicine sentence word segmentation method based on entity recognition technology
CN111079377A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Method for recognizing named entities oriented to Chinese medical texts
CN111274391A (en) * 2020-01-15 2020-06-12 北京百度网讯科技有限公司 SPO extraction method and device, electronic equipment and storage medium
CN111274391B (en) * 2020-01-15 2023-09-01 北京百度网讯科技有限公司 SPO extraction method and device, electronic equipment and storage medium
CN111259626A (en) * 2020-01-16 2020-06-09 上海国民集团健康科技有限公司 Traditional Chinese medicine entity recognition algorithm
WO2021146831A1 (en) * 2020-01-20 2021-07-29 京东方科技集团股份有限公司 Entity recognition method and apparatus, dictionary creation method, device, and medium
CN113488035A (en) * 2020-04-28 2021-10-08 海信集团有限公司 Voice information processing method, device, equipment and medium
CN111832306A (en) * 2020-07-09 2020-10-27 昆明理工大学 Image diagnosis report named entity identification method based on multi-feature fusion
CN112131862A (en) * 2020-07-20 2020-12-25 中国中医科学院中医药信息研究所 Traditional Chinese medicine medical record data processing method and device and electronic equipment
CN112017773A (en) * 2020-08-31 2020-12-01 吾征智能技术(北京)有限公司 Disease cognition model construction method based on nightmare and disease cognition system
CN112017773B (en) * 2020-08-31 2024-03-26 吾征智能技术(北京)有限公司 Disease cognitive model construction method and disease cognitive system based on nightmare
CN112380856A (en) * 2020-10-20 2021-02-19 湖南大学 Method, system, terminal and readable storage medium for automatically extracting component names in patent text
CN112380856B (en) * 2020-10-20 2023-09-29 湖南大学 Automatic extraction method, system, terminal and readable storage medium for component naming in patent text
CN113807097A (en) * 2020-10-30 2021-12-17 北京中科凡语科技有限公司 Named entity recognition model establishing method and named entity recognition method
CN117708338A (en) * 2024-02-05 2024-03-15 成都中医药大学 Extraction method and model for Chinese electronic medical record entity identification and four-diagnosis classification
CN117708338B (en) * 2024-02-05 2024-04-26 成都中医药大学 Extraction method and model for Chinese electronic medical record entity identification and four-diagnosis classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination