CN109635123A - A kind of Chinese medicine text concept recognition methods of increment type - Google Patents
A kind of Chinese medicine text concept recognition methods of increment type
- Publication number
- CN109635123A CN109635123A CN201811436594.5A CN201811436594A CN109635123A CN 109635123 A CN109635123 A CN 109635123A CN 201811436594 A CN201811436594 A CN 201811436594A CN 109635123 A CN109635123 A CN 109635123A
- Authority
- CN
- China
- Prior art keywords
- sample
- feature
- chinese medicine
- confidence level
- mark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 239000003814 drug Substances 0.000 title claims abstract description 43
- 238000000034 method Methods 0.000 title claims abstract description 25
- 238000012549 training Methods 0.000 claims abstract description 17
- 238000007670 refining Methods 0.000 claims abstract description 3
- 208000024891 symptom Diseases 0.000 claims description 11
- 238000013461 design Methods 0.000 claims description 5
- 230000001174 ascending effect Effects 0.000 claims description 2
- 238000000926 separation method Methods 0.000 claims description 2
- 238000012216 screening Methods 0.000 abstract description 4
- 238000007792 addition Methods 0.000 description 5
- 206010019233 Headaches Diseases 0.000 description 4
- 231100000869 headache Toxicity 0.000 description 4
- 210000001015 abdomen Anatomy 0.000 description 2
- 208000019790 abdominal distention Diseases 0.000 description 2
- 238000013480 data collection Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000003058 natural language processing Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 206010013954 Dysphoria Diseases 0.000 description 1
- 206010028372 Muscular weakness Diseases 0.000 description 1
- 206010037660 Pyrexia Diseases 0.000 description 1
- 206010041349 Somnolence Diseases 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000004596 appetite loss Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 208000002173 dizziness Diseases 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 235000021266 loss of appetite Nutrition 0.000 description 1
- 208000019017 loss of appetite Diseases 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000011430 maximum method Methods 0.000 description 1
- 230000036651 mood Effects 0.000 description 1
- 206010029410 night sweats Diseases 0.000 description 1
- 230000036565 night sweats Effects 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 230000003252 repetitive effect Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an incremental concept recognition method for traditional Chinese medicine (TCM) texts. Starting from a small manually annotated corpus, predicted samples whose own confidence and whose neighboring samples' confidence are both high are selected and added to the annotated corpus; predicted samples whose confidence is low but whose contribution is high are returned for expert annotation and then added, thereby increasing the number of annotated samples. The seed set obtained in each iteration is used to keep optimizing the model M, finally yielding a refined model M. The invention analyzes the characteristics of TCM text and the structure of its entities, and extracts multiple features to improve the accuracy of TCM text concept recognition. Starting from a small seed set and screening predicted samples whose own and neighboring confidence are both high, annotated samples are grown incrementally, reducing manual effort. By screening out predicted samples whose confidence is low but whose contribution is high and returning them for expert annotation, the scale of the classification model and the training time are reduced.
Description
Technical field
The invention belongs to the field of natural language processing, and in particular relates to an incremental method for recognizing concepts in traditional Chinese medicine (TCM) texts.
Background technique
Traditional Chinese medicine (TCM) has been practiced for thousands of years, has formed a fairly complete knowledge system, and is an important part of traditional Chinese culture. In modern times, its prescriptions and drug data are widely used in clinical medicine. However, millennia of TCM clinical practice have accumulated a massive body of classical prescriptions and keep producing newly derived ones in clinical application. Relying purely on manual consultation, organization, and analysis inevitably introduces errors. If computer technology and state-of-the-art machine learning methods could instead be used to analyze and mine ancient TCM prescription datasets and extract the effective information they contain (for example, in TCM gynecology prescriptions), this would play an important role in inheriting and applying the TCM classics. With the continuous development of natural language processing technology and the growing interest in research on Chinese text, studies on recognizing Chinese medical terminology have also increased in recent years; yet research on term recognition specifically in the TCM domain remains scarce. Traditional medical texts are the main knowledge resource of TCM and are rich in clinical experience, mostly recorded and transmitted in document form. Entity recognition research on TCM medical data can further mine the textual knowledge of TCM and make a great contribution to the integration and innovation of TCM knowledge.
In TCM text concept recognition, training on a manually annotated corpus gives the best named entity recognition performance, but supervised methods depend heavily on the annotated corpus: large-scale manual annotation is labor-intensive and, because experts' subjective judgments differ, introduces considerable noise. Fully unsupervised classification, on the other hand, tends to accumulate errors and degrades performance. The proposed incremental TCM text concept recognition method therefore invests only a small seed set and trains iteratively to obtain more seeds, gradually growing the seed set to a certain scale; the model M trained in this way is progressively refined.
Summary of the invention
Aiming at the prior art's need for a large manually annotated corpus in TCM text concept recognition, the present invention provides an incremental TCM text recognition method. Specifically, starting from a small manually annotated corpus, predicted samples whose own and neighboring-sample confidence are high are selected and added to the annotated corpus; predicted samples whose confidence is low but whose contribution is high are added to the annotated corpus after being returned for expert annotation, thereby increasing the number of annotated samples.
To achieve the above object, the present invention adopts the following technical solution:
An incremental TCM text concept recognition method, comprising the following steps:
Step 1: preprocess the initial TCM text dataset;
Step 2: select multiple features and redefine the CRF template;
Step 3: prepare the annotated set and train a CRF on it with the custom feature templates to obtain an initial model M;
Step 4: select the predicted samples whose own and neighboring-sample confidence are high, and add the top k of them to the seed set;
Step 5: select the K predicted samples whose own and neighboring-sample confidence are lowest but whose contribution is high, and add them to the seed set after they are returned for expert annotation;
Step 6: use the seed set obtained in each iteration to keep optimizing model M, finally obtaining the refined model M.
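Steps 3 through 6 can be sketched as a self-training loop. The following is a minimal illustration only: `fit`, `confidence`, and `expert_label` are hypothetical stand-ins for CRF training, per-sample prediction confidence, and manual expert annotation, not the patent's actual implementation.

```python
def incremental_train(fit, confidence, expert_label, labeled, unlabeled, rounds, k, big_k):
    """Grow the seed set over several rounds (steps 3-6 of the method).

    fit, confidence, and expert_label are hypothetical callables standing in
    for CRF training, per-sample confidence scoring, and expert annotation.
    """
    model = fit(labeled)                                   # step 3: initial model M
    for _ in range(rounds):
        # rank unlabeled samples by the current model's confidence
        scored = sorted(unlabeled, key=lambda s: confidence(model, s), reverse=True)
        auto = scored[:k]                                  # step 4: most confident predictions
        hard = scored[-big_k:] if big_k else []            # step 5: least confident, sent to experts
        labeled = labeled + auto + [expert_label(s) for s in hard]
        unlabeled = scored[k:len(scored) - big_k]
        model = fit(labeled)                               # step 6: re-optimize M on the grown seed set
    return model, labeled
```

With toy stand-ins (a "model" that is just the set of labeled items, and numeric samples whose value doubles as their confidence), one round with k = 2 and K = 1 moves the two most confident and the one least confident sample into the labeled set.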
Features of the incremental TCM text concept recognition method of the present invention further include:
In step 1, TCM texts are in classical Chinese, mostly composed of single-character words, and contain many phonetic loan characters, so necessary data cleansing must be performed first: stop words (classical function words and modal particles that do not affect the medical meaning of the context, such as the particle rendered "person") are removed, and phonetic loan characters and misrecognized characters (e.g., a traditional-form character recognized as "?") are corrected manually.
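This cleansing step can be sketched as follows. The stop-character set and the correction map here are illustrative placeholders only; the patent does not give its full lists, and the particular characters chosen are assumptions.

```python
# Placeholder lists for illustration; the patent's actual stop-word list and
# correction table are not given in full, so these entries are assumed.
STOP_CHARS = {"者", "之"}   # classical particles carrying no medical meaning
FIX_MAP = {"?": "体"}       # restore a character the OCR misread as '?'

def clean(text):
    """Drop stop characters and repair misrecognized ones."""
    return "".join(FIX_MAP.get(c, c) for c in text if c not in STOP_CHARS)
```

For example, a sentence-final particle is stripped while the surrounding symptom text is preserved, and a misread character is restored from the correction map.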
In step 2, considering that TCM texts are written tersely with many omissions and a divergence between the spoken and the written language, that a single character often corresponds to a word, and considering the structural features of each part of an entity, three features are chosen. 1) Part-of-speech features: in the "indications" field the "n"+"a" and "n"+"v" structures occur most often, and adding POS features provides clues for identifying entity boundaries. 2) A body-part indicator feature, which marks whether the current character is a body-part-related word; such words frequently appear in symptom descriptions. 3) Context features: adjacent characters in a sequence are correlated, which corresponds to the edge features of the CRF model; different window lengths are selected to combine the various features into new ones.
In step 2, feature templates are designed according to the selected features, using templates of the fixed format %x[row,col], where row determines the position of the described character relative to the current character, and col determines the column occupied in the preprocessed corpus by the attribute this template refers to. In the preprocessed corpus file, column 0 is the TCM text itself, column 1 the POS feature, column 2 the body-part feature, and column 3 the context feature, i.e., the state label of the TCM text. For each input feature column t (0~3), templates of two forms are defined:
T1 = num:%x[index, t], (1)
T2 = num:%x[index, t]/%x[index+1, t], (2)
where num is the number of the template, index is an index within the window range (0~2), and T2 combines feature t at adjacent positions.
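Under these conventions, a feature-template file of forms (1) and (2) might look like the following. This is an illustrative fragment in CRF++ template syntax; the template numbering and window offsets are examples, not the patent's exact template.

```
# Unigram templates, form (1): feature column t at offsets within the window
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[0,1]          # POS feature of the current character
U06:%x[0,2]          # body-part feature of the current character
# Combined templates, form (2): feature t at adjacent positions
U07:%x[0,0]/%x[1,0]
U08:%x[-1,1]/%x[0,1]
# Bigram template over output labels (the edge features of the CRF)
B
```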
In step 3, the annotated set is prepared and a CRF is trained on it with the custom feature templates to obtain the initial model M. According to the selected features, a representation label is designed for each feature; the annotated set is formatted according to the input format required by the CRF; and the feature-template window size is chosen according to the designed annotated set.
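The preprocessed corpus fed to the CRF in step 3 would then have four columns per character, matching the column assignment of step 2. The fragment below is an invented example for illustration; the characters, POS tags, and labels are assumptions, not taken from the patent's corpus.

```
头	n	Y	B-symptom
痛	n	N	E-symptom
不	d	N	O
```

Each line gives the character (column 0), its POS feature (column 1), the body-part indicator Y/N (column 2), and the state label (column 3).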
In step 4, the unannotated data are predicted with the initial model M to obtain initial predicted samples, and the predicted samples whose own and neighboring-sample confidence are high are added to the seed set.
The selection method is as follows:
All characters of an entity E are weighted into that entity's confidence, and the confidence of the entity's neighboring samples is also taken into account when computing it; the entity predictions with the highest confidence are added to the annotated sample set for retraining. Let Ht be the classifier obtained in the t-th self-training iteration; then in round t, for an entity Ei in the unlabeled sample pool:
confidence(Ht, Ei) = Σn Ht(xn, tag) + θi Σxj∈Ei Ht(xj, tag),
where Ht(xj, tag) is the confidence the classifier assigns in iteration t to each character contained in the entity, Ht(xn, tag) is the confidence assigned in iteration t to the samples neighboring Ei, and θi is the weight of each label in the entity concept, determined by the number of characters Ei contains. The confidence(Ht, Ei) values are ranked and the K most confident predictions are added to the annotated set.
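The entity-confidence computation and top-K selection can be sketched as follows. This is a minimal reading of the selection rule, under the assumption (consistent with the worked example later in the description) that θ equals one over the number of characters in the entity and that neighbor confidences are summed unweighted.

```python
def entity_confidence(char_confs, neighbor_confs):
    """Confidence of one predicted entity: the sum of its neighboring
    samples' confidences plus a theta-weighted sum of its own characters'
    confidences, with theta = 1 / len(char_confs) (an assumption consistent
    with the worked example in the embodiment)."""
    theta = 1.0 / len(char_confs)
    return sum(neighbor_confs) + theta * sum(char_confs)

def top_k_confident(entities, k):
    """Step 4: keep the k most confident candidate entities.
    Each entity is a tuple (name, char_confs, neighbor_confs)."""
    ranked = sorted(entities, key=lambda e: entity_confidence(e[1], e[2]), reverse=True)
    return [e[0] for e in ranked[:k]]
```

Applied to the per-character confidences of the embodiment's example, this reproduces the stated confidence of 2.236368 for the single-character entity "headache".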
In step 5, with each iteration of the algorithm the unlabeled sample pool shrinks, and the remaining samples are those with low classification confidence, which would normally be discarded. Consider the samples whose classification confidence is low and differs greatly from that of their neighboring samples: if such a sample and its neighbor are assigned the same label, the probability of a mislabel is high; if they are assigned different labels, the sample very likely lies on the decision boundary. In both cases these samples contribute the most to the classifier, so the labels with large contribution are obtained and returned for expert annotation. The selection method is as follows:
in the unlabeled pool, choose the predicted samples whose classification confidence is low and differs greatly from that of their neighboring samples, and return them for expert annotation.
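One way to sketch this selection is the following. The contribution score used here (the absolute gap between a sample's confidence and its neighbor's) is a hypothetical concretization: the patent states the criterion in words but does not give an explicit contribution formula.

```python
def select_for_expert(samples, n_lowest, n_expert):
    """Step 5 sketch: from the n_lowest least-confident samples, pick the
    n_expert whose confidence differs most from their neighbors' (a
    hypothetical contribution score). Each sample is a tuple
    (sample_id, own_confidence, neighbor_confidence)."""
    least = sorted(samples, key=lambda s: s[1])[:n_lowest]          # lowest confidence first
    by_gap = sorted(least, key=lambda s: abs(s[1] - s[2]), reverse=True)
    return [s[0] for s in by_gap[:n_expert]]
```

For example, among three low-confidence candidates, the two whose confidence diverges most from their neighbors' are routed to the expert.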
Compared with the prior art, the beneficial effects of the invention are:
1. The invention analyzes the characteristics of TCM text and the structure of its entities, and extracts multiple features to improve the accuracy of TCM text concept recognition.
2. The invention starts from a small seed set and, by screening predicted samples whose own confidence and whose neighboring samples' confidence are both high and adding them to the seed set, achieves incremental growth of the annotated samples and reduces manual effort.
3. The invention screens out predicted samples whose confidence is low but whose contribution is high and returns them for expert annotation, thereby reducing the scale of the classification model and the training time.
Brief description of the drawings
The present invention will be better understood from the following detailed description of embodiments of the invention taken in conjunction with the accompanying drawing, in which like reference numbers denote like parts:
Fig. 1 is a flowchart of the incremental concept recognition method of the present invention.
Specific embodiment
The specific implementation of the incremental TCM text recognition method of the present invention uses a conditional random field (CRF) model as the base classifier, with 11,000 TCM texts as model training data, of which 4,000 form the labeled dataset and 7,000 the unlabeled dataset. A CRF is trained on the training data with the selected features and custom templates and used to predict the unlabeled data, yielding an initial set of predicted samples; confidence then determines whether a predicted sample is added to the annotated set directly or added after being returned for manual annotation.
In the present embodiment:
Step 1: data cleansing is performed on the TCM texts. For example, in a passage such as "A man, or a married woman with hectic fever from yin-deficiency taxation, emaciated flesh, weak limbs, a flushed yellow complexion, vexing heat in the palms and soles, profuse night sweats, drowsiness with palpitations, or abdominal masses and loss of appetite...", the variant character for "body" that was misrecognized as "?" is corrected to its standard form, and the sentence-final modal particle rendered "person" is removed.
Step 2: in the entity recognition stage a character-based recognition strategy is adopted. The jieba segmentation tool, extended with a user-defined dictionary, is used for word segmentation, and jieba's built-in lexicon performs part-of-speech tagging for the selected POS feature; for example, the system tags "weakness of the limbs" as "limbs/n weakness/n", where n denotes a noun. The body-part feature indicates whether the current character is a body part (Y/N), since the appearance of a body part regularly accompanies a symptom entity. On top of the recognized classes, the "BIESO" scheme is used for labeling, where B (beginning) marks the first character of a term, I (intermediate) a middle character, E (end) the final character, S a single-character term, and O (other) a non-term character. The descriptive labels are chosen as {B-symptom, I-symptom, E-symptom, S-symptom, B-pattern, I-pattern, E-pattern, S-pattern, O}, i.e., the first, middle, final, and single characters of symptom and pattern (syndrome-type) terms, plus non-term characters. The segmentation results are then labeled by dictionary-based matching with the forward maximum matching algorithm.
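The "BIESO" labeling described above can be sketched as a small per-term encoder. This is a minimal illustration: the term/type pairs stand in for the output of the dictionary-matching step, and are assumptions rather than the patent's code.

```python
def bieso_tags(term, etype=None):
    """Per-character BIESO labels for one dictionary-matched term.

    etype is the entity type ('symptom' or 'pattern' in the embodiment);
    etype=None marks a non-term, whose characters are all labeled O.
    """
    if etype is None:
        return ["O"] * len(term)
    if len(term) == 1:
        return ["S-" + etype]                       # single-character term
    return (["B-" + etype]                          # first character
            + ["I-" + etype] * (len(term) - 2)      # middle characters
            + ["E-" + etype])                       # final character
```

A two-character symptom term thus receives B-symptom/E-symptom, while a single character receives S-symptom.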
Step 4: TCM concept prediction is performed on the unlabeled set with the initial model M; that is, for each observation sequence Xi, i = 1, 2, ..., n, an optimal label sequence Yi is found over the state tags such that the conditional probability P(Y|X) is maximal, i.e., Max(P(Y|X)) = P(Yi|Xi, M), and P(Yi|Xi, M) is then model M's confidence in its prediction for sequence Xi. The larger P(Yi|Xi, M), the more certain model M is about the prediction Yi for the observation sequence Xi; the smaller it is, the less certain. CRF training is implemented with the CRF++ toolkit, which can output the confidence of each prediction alongside each predicted sample. The confidence of each predicted entity is computed according to the formula of step 4. For example, part of the prediction output of the first initial training of M is as follows:
Word | State tag | Confidence
---|---|---
, | O | 0.999867
abdomen | B-symptom | 0.959049
distension | E-symptom | 0.958962
, | O | 0.999978
headache | S-symptom | 0.804793
dizzy | S-symptom | 0.431597
Predict the confidence level of entity tripe abdominal distention are as follows: tripe abdominal distention entity contains 2, so
Predict the confidence level of entity headache are as follows: headache entity contains 1 word, so θ=1,
confidence(H1, headache) and=0.999978+0.431597+1 × 0,804793=2.236368
The confidence level of prediction entity is ranked up, preceding 50 additions mark collection is chosen.
Step 5: all the entities of step 4 are ranked and the 100 samples with the lowest confidence are chosen; from these, according to the contribution criterion of step 5, the 50 samples with the highest contribution are selected and returned for expert annotation.
It should be understood that the above is an example intended to illustrate the principle and practicability of the present invention, and the present invention is not limited thereto. Those skilled in the art can make all kinds of variations and modifications without departing from the spirit and substance of the present invention, and such variations and modifications are also considered to fall within the protection scope of the present invention.
Claims (7)
1. An incremental TCM text concept recognition method, characterized by comprising the following steps:
Step 1: preprocess the initial TCM text dataset;
Step 2: select multiple features and redefine the CRF template;
Step 3: prepare the annotated set and train a CRF on it with the custom feature templates to obtain an initial model M;
Step 4: select the predicted samples whose own and neighboring-sample confidence are high, and add the top k of them to the seed set;
Step 5: select the K predicted samples whose own and neighboring-sample confidence are lowest but whose contribution is high, and add them to the seed set after they are returned for expert annotation;
Step 6: use the seed set obtained in each iteration to keep optimizing model M, finally obtaining the refined model M.
2. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 1, TCM texts are in classical Chinese, mostly composed of single-character words, and contain many phonetic loan characters, so necessary data cleansing must be performed first: stop words (classical function words and modal particles that do not affect the medical meaning of the context) are removed, and phonetic loan characters and misrecognized characters are corrected manually.
3. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 2, considering that TCM texts are written tersely with many omissions and a divergence between the spoken and the written language, that a single character often corresponds to a word, and considering the structural features of each part of an entity, three features are chosen: 1) part-of-speech features, where in the "indications" field the "n"+"a" and "n"+"v" structures occur most often, and adding POS features provides clues for identifying entity boundaries; 2) a body-part indicator feature, which marks whether the current character is a body-part-related word, such words frequently appearing in symptom descriptions; 3) context features, where adjacent characters in a sequence are correlated, corresponding to the edge features of the CRF model, and different window lengths are selected to combine the various features into new ones.
4. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 2, feature templates are designed according to the selected features, using templates of the fixed format %x[row,col], where row determines the position of the described character relative to the current character, and col determines the column occupied in the preprocessed corpus by the attribute this template refers to; in the preprocessed corpus file, column 0 is the TCM text itself, column 1 the POS feature, column 2 the body-part feature, and column 3 the context feature, i.e., the state label of the TCM text; for each input feature column t (0~3), templates of two forms are defined:
T1 = num:%x[index, t], (1)
T2 = num:%x[index, t]/%x[index+1, t], (2)
where num is the number of the template, index is an index within the window range (0~2), and T2 combines feature t at adjacent positions.
5. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 3, the annotated set is prepared and a CRF is trained on it with the custom feature templates to obtain the initial model M; according to the selected features, a representation label is designed for each feature; the annotated set is designed according to the input format required by the CRF; and the feature-template window size is chosen according to the designed annotated set.
6. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 4, the unannotated data are predicted with the initial model M to obtain initial predicted samples, and the predicted samples whose own and neighboring-sample confidence are high are added to the seed set;
the selection method is as follows:
all characters of an entity E are weighted into that entity's confidence, the confidence of the entity's neighboring samples also being considered when computing it, and the entity predictions with the highest confidence are added to the annotated sample set for retraining; let Ht be the classifier obtained in the t-th self-training iteration; then in round t, for an entity Ei in the unlabeled sample pool:
confidence(Ht, Ei) = Σn Ht(xn, tag) + θi Σxj∈Ei Ht(xj, tag),
where Ht(xj, tag) is the confidence the classifier assigns in iteration t to each character contained in the entity, Ht(xn, tag) is the confidence assigned in iteration t to the samples neighboring Ei, and θi is the weight of each label in the entity concept, determined by the number of characters Ei contains; the confidence(Ht, Ei) values are ranked and the K most confident predictions are added to the annotated set.
7. The incremental TCM text concept recognition method according to claim 1, characterized in that: in step 5, the labels with large contribution are obtained and returned for expert annotation; the selection method is as follows: in the unlabeled pool, choose the predicted samples whose classification confidence is low and differs greatly from that of their neighboring samples, and return them for expert annotation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811436594.5A CN109635123A (en) | 2018-11-28 | 2018-11-28 | A kind of Chinese medicine text concept recognition methods of increment type |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811436594.5A CN109635123A (en) | 2018-11-28 | 2018-11-28 | A kind of Chinese medicine text concept recognition methods of increment type |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635123A true CN109635123A (en) | 2019-04-16 |
Family
ID=66070065
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811436594.5A Pending CN109635123A (en) | 2018-11-28 | 2018-11-28 | A kind of Chinese medicine text concept recognition methods of increment type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635123A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516241A (en) * | 2019-08-26 | 2019-11-29 | 北京三快在线科技有限公司 | Geographical address analytic method, device, readable storage medium storing program for executing and electronic equipment |
CN111259626A (en) * | 2020-01-16 | 2020-06-09 | 上海国民集团健康科技有限公司 | Traditional Chinese medicine entity recognition algorithm |
CN112733869A (en) * | 2019-10-28 | 2021-04-30 | 中移信息技术有限公司 | Method, device and equipment for training text recognition model and storage medium |
CN112733869B (en) * | 2019-10-28 | 2024-05-28 | 中移信息技术有限公司 | Method, device, equipment and storage medium for training text recognition model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104318242A (en) * | 2014-10-08 | 2015-01-28 | 中国人民解放军空军工程大学 | High-efficiency SVM active half-supervision learning algorithm |
CN108549639A (en) * | 2018-04-20 | 2018-09-18 | 山东管理学院 | Based on the modified Chinese medicine case name recognition methods of multiple features template and system |
-
2018
- 2018-11-28 CN CN201811436594.5A patent/CN109635123A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104318242A (en) * | 2014-10-08 | 2015-01-28 | 中国人民解放军空军工程大学 | High-efficiency SVM active half-supervision learning algorithm |
CN108549639A (en) * | 2018-04-20 | 2018-09-18 | 山东管理学院 | Based on the modified Chinese medicine case name recognition methods of multiple features template and system |
Non-Patent Citations (4)
Title |
---|
JIANQIANG LI et al.: "Large Scale Sequential Learning from Partially Labeled Data", 2013 IEEE Seventh International Conference on Semantic Computing *
LIU Fangchi: "Research on text-based entity-relation extraction techniques", China Masters' Theses Full-text Database, Information Science and Technology *
ZHANG Lei: "Research on general methods for domain-specific named entity recognition", China Masters' Theses Full-text Database, Information Science and Technology *
CHEN Haihong et al., University of Electronic Science and Technology of China Press *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109960800A (en) | Weakly supervised file classification method and device based on Active Learning | |
CN105808524A (en) | Patent document abstract-based automatic patent classification method | |
CN106980609A (en) | A kind of name entity recognition method of the condition random field of word-based vector representation | |
CN108829801A (en) | A kind of event trigger word abstracting method based on documentation level attention mechanism | |
CN109697285A (en) | Enhance the hierarchical B iLSTM Chinese electronic health record disease code mask method of semantic expressiveness | |
CN109710925A (en) | Name entity recognition method and device | |
CN108897989A (en) | A kind of biological event abstracting method based on candidate events element attention mechanism | |
CN110059185A (en) | A kind of medical files specialized vocabulary automation mask method | |
CN110188197B (en) | Active learning method and device for labeling platform | |
CN107463607A (en) | The domain entities hyponymy of bluebeard compound vector sum bootstrapping study obtains and method for organizing | |
CN110879831A (en) | Chinese medicine sentence word segmentation method based on entity recognition technology | |
CN110134946A (en) | A kind of machine reading understanding method for complex data | |
CN108664474A (en) | A kind of resume analytic method based on deep learning | |
Wang et al. | A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records | |
CN108717413A (en) | It is a kind of based on the assumption that property semi-supervised learning Opening field answering method | |
CN110348017B (en) | Text entity detection method, system and related components | |
CN108446334A (en) | A kind of content-based image retrieval method of unsupervised dual training | |
CN108829823A (en) | A kind of file classification method | |
CN108038099A (en) | Low frequency keyword recognition method based on term clustering | |
CN111858896A (en) | Knowledge base question-answering method based on deep learning | |
CN103608805B (en) | Dictionary generation and method | |
CN109635123A (en) | A kind of Chinese medicine text concept recognition methods of increment type | |
Alqahtani et al. | A multitask learning approach for diacritic restoration | |
CN113160917B (en) | Electronic medical record entity relation extraction method | |
CN114579695A (en) | Event extraction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190416 |