CN108511036A - Method and system for Chinese symptom annotation - Google Patents
Method and system for Chinese symptom annotation
- Publication number
- CN108511036A (application number CN201810112718.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- symptom
- chinese
- chinese symptom
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Abstract
The present invention provides a method and system for Chinese symptom annotation. The method comprises: determining the 16 constituent elements used to build a Chinese symptom annotation scheme, namely atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier; obtaining Chinese symptom data; and annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements. Compared with the prior art, the invention identifies the constituent elements of Chinese symptoms well and substantially improves annotation accuracy at both the symptom granularity and the constituent-element granularity.
Description
Technical field
The present invention relates to the technical field of Chinese text annotation, and more particularly to a method and system for Chinese symptom annotation.
Background
Electronic medical record systems are now widely used in China and abroad. To digitize medical information and then mine it, structuring medical text is particularly important. Annotating Chinese symptoms helps capture the meaning a symptom expresses more accurately.
Common methods for Chinese annotation include maximum entropy models, conditional random field models, and deep neural networks.
However, Chinese word segmentation still performs unsatisfactorily in practice. One reason is that different natural language processing fields place different requirements on segmentation. Chinese symptoms call for a dedicated compositional analysis; cutting a symptom with a conventional segmentation method alone cannot fully meet this need.
In the prior art, the paper "Automatic Symptom Recognition Method for Electronic Medical Record Text," published in the Journal of Medical Informatics, No. 7, 2016, studied Chinese symptom recognition and pointed out that a Chinese symptom can be composed of negation words, modifiers, body-part words, and symptom words through corresponding combination rules. However, the authors gave no exact definitions of the 4 constituent elements they proposed. They also divided modifiers into pre-modifiers and post-modifiers only according to the relative position of the modifier and the body-part word, without further considering the semantic features of modifiers. The paper "Construction of a Named Entity and Entity Relationship Corpus for Chinese Electronic Medical Records," published in the Journal of Software, No. 11, 2016, studied a named entity and entity relationship annotation scheme for electronic medical records and defined seven modifications of symptoms from the perspective of the relationship between the entity and the patient: denied, non-patient, current, conditional, possible, to be confirmed, and occasional. It did not, however, consider modifiers that describe the intrinsic characteristics of a symptom.
Summary of the invention
In view of the defects in the prior art, the present invention provides a method and system for Chinese symptom annotation.
According to one aspect of the invention, a method for Chinese symptom annotation is provided, comprising the following steps:
determining the 16 constituent elements used to build a Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier;
obtaining Chinese symptom data; and
annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements.
In one embodiment, between the step of determining the constituent elements and the step of obtaining Chinese symptom data, the method further comprises the step of building dictionary tables of Chinese symptom constituent elements according to the 16 constituent elements of the annotation scheme. The dictionary tables comprise: atomic symptom table, conjunction table, negation word table, existence word table, degree word table, progression word table, ability word table, inability word table, action word table, situational qualifier table, locative word table, body-part word table, head word table, sensation word table, feature word table, and modifier table.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements, or according to the Chinese symptom constituent elements, to obtain a sequence composed of one or more constituent elements.
In one embodiment, after the step of annotating the words or phrases of the Chinese symptom data, the constituent elements of the annotation result sequence are stored in the dictionary tables of Chinese symptom constituent elements.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using a semi-automatic annotation method.
Annotating the words or phrases of the Chinese symptom data with the semi-automatic annotation method comprises: partially annotating the acquired symptom data using rules to obtain partial annotation results; manually checking the partial annotation results and correcting incorrect annotations; iteratively executing the rule-based partial annotation process and the manual checking process; and annotating the words that remain unannotated in the partial annotation results to obtain the final annotation result sequence.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using a CRF-based annotation method.
According to another aspect of the invention, a system for Chinese symptom annotation is provided, comprising:
a constituent element analysis module, for determining the 16 constituent elements used to build the Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module, connected to the constituent element analysis module, for obtaining Chinese symptom data; and
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data so that the Chinese symptom data is converted from text into a sequence composed of one or more constituent elements.
In one embodiment, the system further comprises a Chinese symptom constituent element storage module, connected to the constituent element analysis module and the data annotation module, for storing the dictionary tables of Chinese symptom constituent elements.
The method and system identify the constituent elements of Chinese symptoms well; annotation accuracy at the symptom granularity and the constituent-element granularity reaches 90.53% and 93.91%, respectively.
Other features and aspects of the invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The various aspects of the invention will be understood more clearly after reading the specific embodiments with reference to the accompanying drawings, in which:
Fig. 1 shows a flow diagram of the Chinese symptom annotation method according to one embodiment of the invention;
Fig. 2A shows a preferred embodiment of the Chinese symptom annotation method of Fig. 1;
Fig. 2B shows a preferred embodiment of the Chinese symptom annotation method of Fig. 2A;
Fig. 3 shows a preferred embodiment of the Chinese symptom annotation method;
Fig. 4 shows a flow chart of the semi-automatic annotation method of the invention;
Fig. 5 shows a block diagram of the Chinese symptom annotation system according to another embodiment of the invention;
Fig. 6 shows a preferred embodiment of the Chinese symptom annotation system of Fig. 5.
Detailed description
To make the technical content disclosed in this application more detailed and complete, reference may be made to the accompanying drawings and the following specific embodiments of the invention; identical reference numerals in the drawings denote identical or similar components. Those skilled in the art should understand, however, that the embodiments provided below are not intended to limit the scope of the invention. In addition, the drawings are schematic only and are not drawn to scale.
Fig. 1 shows a flow diagram of the Chinese symptom annotation method according to one embodiment of the invention. Referring to Fig. 1, in this embodiment the method is implemented through steps S110 to S130. Specifically, step S110 is executed first to determine the 16 constituent elements used to build the Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier. The constituent elements are shown in Table 1.
Table 1. Symptom constituent elements
Next, step S120 is executed to obtain Chinese symptom data. Chinese symptoms are crawled from multiple health websites; to ensure accuracy, the traditional Chinese medicine (TCM) symptoms and disease names mixed in with them are then removed manually.
Finally, step S130 is executed: the words or phrases of the Chinese symptom data are annotated to obtain a sequence composed of one or more constituent elements. Let S = {e | e is a symptom constituent element}. Then any symptom X can be expressed in the following form:
X = <x1, …, xn>, where xi ∈ S (i = 1, …, n).
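The representation above can be sketched as a small data structure. This is only an illustration: the English member names are informal translations of the 16 elements, and the example annotation is hypothetical rather than taken from Table 1.

```python
from enum import Enum


class Element(Enum):
    """The 16 constituent elements (English names are informal translations)."""
    ATOMIC_SYMPTOM = "atomic symptom"
    CONJUNCTION = "conjunction"
    NEGATION = "negation word"
    EXISTENCE = "existence word"
    DEGREE = "degree word"
    PROGRESSION = "progression word"
    ABILITY = "ability word"
    INABILITY = "inability word"
    ACTION = "action word"
    SITUATION = "situational qualifier"
    LOCATIVE = "locative word"
    BODY_PART = "body-part word"
    HEAD = "head word"
    SENSATION = "sensation word"
    FEATURE = "feature word"
    MODIFIER = "modifier"


def is_valid_annotation(sequence):
    """Check that X = <x1, ..., xn> is non-empty and every xi belongs to S."""
    return len(sequence) > 0 and all(isinstance(x, Element) for x in sequence)


# Hypothetical annotation of a four-part symptom, for illustration only.
example = [Element.SITUATION, Element.SENSATION, Element.DEGREE, Element.PROGRESSION]
```

Using an enumeration keeps the set S closed, so a mistyped element name fails early instead of silently creating a 17th category.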
Fig. 2A shows a preferred embodiment of the Chinese symptom annotation method of Fig. 1. Compared with Fig. 1, the main difference in this embodiment is that step S140 is added between step S110 and step S120: dictionary tables of Chinese symptom constituent elements are built according to the 16 constituent elements of the annotation scheme. The dictionary tables comprise: atomic symptom table, conjunction table, negation word table, existence word table, degree word table, progression word table, ability word table, inability word table, action word table, situational qualifier table, locative word table, body-part word table, head word table, sensation word table, feature word table, and modifier table. The dictionary tables may be built by manual definition.
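One possible in-memory layout for the dictionary tables of step S140 is a mapping from element name to a set of words. The sketch below assumes that layout; the function names and the seed entry are hypothetical and are not specified by the patent.

```python
ELEMENT_NAMES = [
    "atomic symptom", "conjunction", "negation word", "existence word",
    "degree word", "progression word", "ability word", "inability word",
    "action word", "situational qualifier", "locative word", "body-part word",
    "head word", "sensation word", "feature word", "modifier",
]


def build_dictionary_tables(seed=None):
    """Create one (initially empty) word table per constituent element.

    `seed` is an optional mapping of element name to manually defined words.
    """
    tables = {name: set() for name in ELEMENT_NAMES}
    for name, words in (seed or {}).items():
        if name not in tables:
            raise KeyError(f"unknown constituent element: {name}")
        tables[name].update(words)
    return tables


def lookup(tables, word):
    """Return the names of all tables that contain `word`, sorted."""
    return sorted(name for name, words in tables.items() if word in words)


# Hypothetical manual seed: a single negation word.
tables = build_dictionary_tables({"negation word": {"无"}})
```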
Fig. 2B shows a preferred embodiment of the Chinese symptom annotation method of Fig. 2A. In this embodiment, in step S150, the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements, or according to the Chinese symptom constituent elements, to obtain a sequence composed of one or more constituent elements.
Fig. 3 shows a preferred embodiment of the Chinese symptom annotation method. The main difference in this embodiment is that step S160 is added after step S150: the constituent elements of the annotation result sequence are stored in the dictionary tables of Chinese symptom constituent elements.
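Step S160 amounts to feeding annotated words back into the tables. A minimal sketch, assuming annotations are (word, element name) pairs and the tables are sets keyed by element name; both representations and the sample words are assumptions.

```python
from collections import defaultdict


def store_annotations(tables, annotated):
    """Add each (word, element) pair to the matching table.

    Returns how many words were new, which can help decide whether another
    annotation round is still worthwhile.
    """
    added = 0
    for word, element in annotated:
        if word not in tables[element]:
            tables[element].add(word)
            added += 1
    return added


tables = defaultdict(set)
# Hypothetical annotation result for one symptom.
result = [("饱胀感", "sensation word"), ("加重", "progression word")]
first = store_annotations(tables, result)
second = store_annotations(tables, result)  # storing again adds nothing new
```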
In one embodiment, Fig. 4 shows the process of annotating the words or phrases of the Chinese symptom data with the semi-automatic annotation method. The specific steps are as follows. First, step S210 is executed: the acquired Chinese symptom data is partially annotated using rules to obtain partial annotation results. Second, step S220 is executed: the partial annotation results are checked manually and incorrect annotations are corrected. Third, steps S210 and S220 are executed iteratively. Finally, step S230 is executed: after all iterations are complete, the words that remain unannotated are annotated to obtain the final annotation result sequence.
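The control flow of steps S210 to S230 can be sketched as a loop over pluggable callbacks. Everything below is an assumption made for illustration: symptoms are pre-segmented tuples of words, the stopping criterion (no new lexicon entries) is not stated in the patent, and the three toy callbacks stand in for the real rules and the human reviewer.

```python
def semi_automatic_annotate(symptoms, rule_label, manual_review,
                            label_remaining, max_rounds=3):
    """Iterate rule labeling (S210) and review (S220), then fill gaps (S230)."""
    lexicon = {}   # word -> element, grows as reviewed labels accumulate
    results = {}
    for _ in range(max_rounds):
        size_before = len(lexicon)
        for symptom in symptoms:
            partial = rule_label(symptom, lexicon)       # S210
            checked = manual_review(symptom, partial)    # S220
            results[symptom] = checked
            lexicon.update({w: e for w, e in checked if e is not None})
        if len(lexicon) == size_before:                  # nothing new learned
            break
    return {s: label_remaining(labels) for s, labels in results.items()}  # S230


def toy_rule_label(symptom, lexicon):
    # Toy rule: reuse known words, else tag words ending in "感" as sensation.
    return [(w, lexicon.get(w) or ("sensation word" if w.endswith("感") else None))
            for w in symptom]


def toy_manual_review(symptom, partial):
    return partial  # a real reviewer would correct wrong labels here


def toy_label_remaining(labels):
    return [(w, e if e is not None else "modifier") for w, e in labels]


final = semi_automatic_annotate([("饱胀感", "明显")], toy_rule_label,
                                toy_manual_review, toy_label_remaining)
```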
The process of partially annotating Chinese symptom data using rules is as follows:
If the acquired symptom data contains a conjunction, the symptom data is split into two pieces of symptom data, and the conjunction is stored in the conjunction table;
If the length of the acquired symptom data is less than or equal to 2, the symptom data is added to the atomic symptom table;
If a word or phrase of the acquired symptom data ends with "when", "before", or "after" (时, 前, 后 in Chinese) and does not begin with a body-part word, the word is annotated as a situational qualifier and added to the situational qualifier table;
If a word or phrase of the acquired symptom data ends with a sensation suffix, where the sensation suffixes include "sense" (感), the word is annotated as a sensation word and added to the sensation word table;
If a word or phrase of the acquired symptom data ends with "property", "sample", or "shape" (性, 样, 状), the word is annotated as a feature word and added to the feature word table;
If the part of speech of a word or phrase of the acquired symptom data is noun and the word does not contain a body-part word, the word is annotated as a head word and added to the head word table;
If the part of speech of a word or phrase of the acquired symptom data is verb, the word is annotated as an action word and added to the action word table;
If the part of speech of a word or phrase of the acquired symptom data is adjective, the word is annotated as a modifier and added to the modifier table.
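The rules above can be sketched as two small functions. Several details are assumptions: the suffix characters are back-translated from the glosses in the text, the part-of-speech tags "n", "v", and "a" follow a common Chinese tagger convention, and the order in which the rules are tried is not specified by the patent.

```python
SITUATION_SUFFIXES = ("时", "前", "后")   # "when", "before", "after"
SENSATION_SUFFIXES = ("感",)              # "sense"
FEATURE_SUFFIXES = ("性", "样", "状")     # "property", "sample", "shape"


def split_symptom(symptom, conjunctions):
    """Split a symptom in two at the first conjunction it contains."""
    for conj in conjunctions:
        if conj in symptom:
            left, right = symptom.split(conj, 1)
            return [part for part in (left, right) if part]
    return [symptom]


def label_word(word, pos, body_part_words=frozenset()):
    """Apply the suffix and part-of-speech rules; return None if none fires."""
    starts_with_part = any(word.startswith(p) for p in body_part_words)
    contains_part = any(p in word for p in body_part_words)
    if word.endswith(SITUATION_SUFFIXES) and not starts_with_part:
        return "situational qualifier"
    if word.endswith(SENSATION_SUFFIXES):
        return "sensation word"
    if word.endswith(FEATURE_SUFFIXES):
        return "feature word"
    if pos == "n" and not contains_part:
        return "head word"
    if pos == "v":
        return "action word"
    if pos == "a":
        return "modifier"
    return None  # left for manual annotation in step S230
```

Words for which every rule fails return `None`, which matches the idea that rule-based annotation is only partial and the remainder is handled later.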
In one embodiment, a CRF-based annotation method may be used to annotate the words or phrases of the Chinese symptom data. Part of the symptom data is first used as training samples; a CRF model then automatically annotates the test set formed by the remaining symptoms, and the CRF output is corrected by post-processing.
The training samples are annotated using the semi-automatic annotation method described above.
Specifically, let X = {x1, x2, …, xn} denote the observation sequence (input sequence) and Y = {y1, y2, …, yn} denote the state sequence (output sequence). Given the observation sequence X, for a linear-chain CRF model with parameters λ = {λ1, λ2, …, λK}, the conditional probability of the corresponding state sequence Y is:
$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k=1}^{K} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)$$
Z(X) is the normalization factor, which sums the unnormalized scores of all possible state sequences:
$$Z(X) = \sum_{Y} \exp\left( \sum_{i=1}^{n} \sum_{k=1}^{K} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)$$
fk(yi-1, yi, X, i) is a feature function that expresses a possible contextual language feature. It is generally a binary indicator function:
$$f_k(y_{i-1}, y_i, X, i) = \begin{cases} 1, & \text{if the feature condition is satisfied} \\ 0, & \text{otherwise} \end{cases}$$
λk is the weight of fk(yi-1, yi, X, i) and is obtained by training.
For a given observation sequence X, the output target is to find its most probable state sequence:
$$Y^{*} = \arg\max_{Y} P(Y \mid X)$$
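To make the linear-chain CRF quantities concrete, the sketch below evaluates the score, the normalization factor Z(X), and the most probable state sequence by brute-force enumeration on a toy input. It is an illustration only: the two feature functions, their weights, and the label set are invented, and a real implementation would use dynamic programming (forward algorithm and Viterbi) rather than enumeration.

```python
import itertools
import math


def score(y, x, weights, features):
    """Sum of lambda_k * f_k(y_{i-1}, y_i, X, i) over all positions and features."""
    total = 0.0
    prev = None  # placeholder for y_0 before the first position
    for i, label in enumerate(y):
        for k, f in enumerate(features):
            total += weights[k] * f(prev, label, x, i)
        prev = label
    return total


def conditional_probability(y, x, labels, weights, features):
    """P(Y|X) = exp(score(Y)) / Z(X), with Z(X) summed over every candidate Y."""
    z = sum(math.exp(score(c, x, weights, features))
            for c in itertools.product(labels, repeat=len(x)))
    return math.exp(score(y, x, weights, features)) / z


def decode(x, labels, weights, features):
    """Return the state sequence with the highest score (argmax of P(Y|X))."""
    return max(itertools.product(labels, repeat=len(x)),
               key=lambda c: score(c, x, weights, features))


# Toy binary feature functions (hypothetical).
features = [
    lambda prev, cur, x, i: 1 if x[i] == "感" and cur == "S" else 0,
    lambda prev, cur, x, i: 1 if prev == cur else 0,
]
weights = [2.0, 1.0]
labels = ("S", "O")
x = ["胀", "感"]
best = decode(x, labels, weights, features)
```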
The symptom constituent elements of Table 1 are used to annotate symptom sequences. The following two tagging schemes are used:
Scheme one: identical symptom elements receive identical tags, and different symptom elements receive different tags. For example, for the symptom X = <evening, meal, after, full, distended, sense, bright, show, add, heavy> (given character by character; presumably the ten characters of 晚餐后饱胀感明显加重, a sensation of fullness after dinner that is markedly aggravated), the corresponding output sequence is <e7, e7, e7, e10, e10, e10, e13, e13, e14, e14>.
Scheme two: on the basis of scheme one, positions inside a symptom element are also distinguished, and the beginning of each symptom element is marked specially. For the same symptom X, the corresponding output sequence is <e7_B, e7_I, e7_I, e10_B, e10_I, e10_I, e13_B, e13_I, e14_B, e14_I>.
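Both tagging schemes can be derived mechanically from a segmented symptom. In the sketch below the element identifiers e7, e10, e13, and e14 come from the example in the text, while the Chinese segment strings are a back-translation of that example and should be treated as an assumption.

```python
def scheme_one(segments):
    """One tag per character; every character of a segment gets the same tag."""
    return [tag for text, tag in segments for _ in text]


def scheme_two(segments):
    """Like scheme one, but mark the first character _B and the rest _I."""
    tags = []
    for text, tag in segments:
        tags.extend(f"{tag}_B" if i == 0 else f"{tag}_I"
                    for i in range(len(text)))
    return tags


# Segmented form of the example symptom (segment strings are assumed).
segments = [("晚餐后", "e7"), ("饱胀感", "e10"), ("明显", "e13"), ("加重", "e14")]
```

Scheme two lets a sequence model tell two adjacent elements of the same type apart, which scheme one cannot do.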
To obtain a better annotation result, the invention designs 6 CRF feature templates that vary the window size up to 7 (the current character together with the 1, 2, or 3 characters before and after it) and whether part-of-speech features are included. Table 2 shows the feature template with a window size of 7 that includes part-of-speech features, where Ci denotes the i-th character of the current symptom and Pi denotes the part of speech of the word containing the i-th character of the current symptom. Unigram is the unary feature, i.e. the current character; Bigram is the binary feature, which combines two adjacent characters; Trigram is the ternary feature, which combines the current character with the two characters adjacent to it, one before and one after.
Table 2. Feature templates
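Since the body of Table 2 is not reproduced here, the following is only a rough sketch of how window-based character features of this kind might be extracted. The feature names, the padding token, and the exact set of n-grams are assumptions and may differ from the patent's actual templates.

```python
def char_features(chars, i, half_window=3, pos_tags=None):
    """Build unigram, bigram, and trigram features around position i.

    half_window=3 gives a window of 7 characters; pos_tags, if given, holds
    the part of speech of the word containing each character.
    """
    n = len(chars)

    def at(seq, j):
        return seq[j] if 0 <= j < n else "<PAD>"

    feats = {}
    for off in range(-half_window, half_window + 1):
        feats[f"C{off:+d}"] = at(chars, i + off)              # unigram
        if pos_tags is not None:
            feats[f"P{off:+d}"] = at(pos_tags, i + off)       # part of speech
    for off in range(-half_window, half_window):
        feats[f"C{off:+d}C{off + 1:+d}"] = (
            at(chars, i + off) + at(chars, i + off + 1))      # bigram
    feats["C-1C+0C+1"] = (
        at(chars, i - 1) + at(chars, i) + at(chars, i + 1))   # trigram
    return feats


chars = list("饱胀感")
narrow = char_features(chars, 1, half_window=1)
with_pos = char_features(chars, 1, half_window=1, pos_tags=["n", "n", "n"])
```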
To compensate for the shortcomings of the CRF model, the predetermined symptom constituent elements are used for post-processing: after the CRF results are obtained, if a constituent element annotated by the CRF appears in the element dictionary, it is re-annotated with the element category given by the dictionary, which improves annotation accuracy.
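The dictionary-based correction can be sketched in a few lines, assuming the CRF output has already been grouped into (word, element) pairs and the element dictionary maps each known word to a single category. Both assumptions, and the sample data, are illustrative.

```python
def post_process(crf_segments, element_dictionary):
    """Override the CRF label whenever the word is found in the dictionary."""
    return [(word, element_dictionary.get(word, label))
            for word, label in crf_segments]


# Hypothetical CRF output in which the second segment was mislabeled.
crf_output = [("晚餐后", "situational qualifier"), ("饱胀感", "feature word")]
dictionary = {"饱胀感": "sensation word"}
corrected = post_process(crf_output, dictionary)
```

Words absent from the dictionary keep their CRF label, so the correction can only change entries for which curated knowledge exists.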
Fig. 5 shows a system for Chinese symptom annotation according to another embodiment of the invention, comprising:
a constituent element analysis module, for determining the 16 constituent elements used to build the Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module, for obtaining Chinese symptom data;
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data so that the Chinese symptom data is converted from text into a sequence composed of one or more constituent elements.
Fig. 6 shows a preferred embodiment of the Chinese symptom annotation system of Fig. 5. Compared with Fig. 5, the main difference is that a Chinese symptom constituent element storage module is added, which is connected to the constituent element analysis module and the data annotation module and stores the dictionary tables of Chinese symptom constituent elements.
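The module structure of Fig. 5 and Fig. 6 could be wired together roughly as follows. The class and function names, the lookup-only annotation strategy, and the sample data are all hypothetical; the patent describes the modules only at block-diagram level.

```python
class ElementStore:
    """Stand-in for the constituent element storage module (dictionary tables)."""

    def __init__(self, elements):
        # The element list would come from the constituent element analysis module.
        self.tables = {element: set() for element in elements}

    def add(self, element, word):
        if element not in self.tables:
            raise KeyError(f"unknown constituent element: {element}")
        self.tables[element].add(word)

    def find(self, word):
        for element, words in self.tables.items():
            if word in words:
                return element
        return None


def acquire(source):
    """Stand-in for the data acquisition module: keep non-empty, stripped lines."""
    return [line.strip() for line in source if line.strip()]


class AnnotationModule:
    """Stand-in for the data annotation module, here a plain dictionary lookup."""

    def __init__(self, store):
        self.store = store

    def annotate(self, words):
        return [(w, self.store.find(w)) for w in words]


store = ElementStore(["negation word", "atomic symptom"])
store.add("negation word", "无")
annotator = AnnotationModule(store)
symptoms = acquire(["  无发热 ", "", "头痛\n"])
labeled = annotator.annotate(["无", "发热"])
```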
The specific embodiments of the invention have been described above with reference to the accompanying drawings. Those skilled in the art will understand, however, that various changes and substitutions may be made to the specific embodiments without departing from the spirit and scope of the invention. All such changes and substitutions fall within the scope defined by the claims of the invention.
Claims (9)
1. A method for Chinese symptom annotation, characterized by comprising the following steps:
determining the 16 constituent elements used to build a Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier;
obtaining Chinese symptom data;
annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements.
2. The method for Chinese symptom annotation according to claim 1, characterized in that, between the step of determining the constituent elements used to build the Chinese symptom annotation scheme and the step of obtaining Chinese symptom data, the method further comprises the step of:
building dictionary tables of Chinese symptom constituent elements according to the 16 constituent elements of the Chinese symptom annotation scheme, the dictionary tables comprising: atomic symptom table, conjunction table, negation word table, existence word table, degree word table, progression word table, ability word table, inability word table, action word table, situational qualifier table, locative word table, body-part word table, head word table, sensation word table, feature word table, and modifier table.
3. The method for Chinese symptom annotation according to claim 2, characterized in that the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements or according to the Chinese symptom constituent elements.
4. The method for Chinese symptom annotation according to claim 3, characterized in that, after the step of annotating the words or phrases of the Chinese symptom data, the method comprises the step of:
storing the constituent elements of the annotation result sequence in the dictionary tables of Chinese symptom constituent elements.
5. The method for Chinese symptom annotation according to claim 4, characterized in that the words or phrases of the Chinese symptom data are annotated using a semi-automatic annotation method.
6. The method for Chinese symptom annotation according to claim 5, characterized in that annotating the words or phrases of the Chinese symptom data using the semi-automatic annotation method comprises:
partially annotating the acquired symptom data using rules to obtain partial annotation results for the Chinese symptoms;
manually checking the partial annotation results and correcting incorrect annotation results;
iteratively executing the rule-based partial annotation process and the manual checking process;
annotating the words that remain unannotated in the partial annotation results to obtain the final annotation result sequence.
7. The method for Chinese symptom annotation according to claim 1, characterized in that the words or phrases of the Chinese symptom data are annotated using a CRF-based annotation method.
8. A system for Chinese symptom annotation, characterized by comprising:
a constituent element analysis module, for determining the 16 constituent elements used to build a Chinese symptom annotation scheme: atomic symptom, conjunction, negation word, existence word, degree word, progression word, ability word, inability word, action word, situational qualifier, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module, connected to the constituent element analysis module, for obtaining Chinese symptom data;
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data so that the Chinese symptom data is converted from text into a sequence composed of one or more constituent elements.
9. The system for Chinese symptom annotation according to claim 8, characterized in that, between the constituent element analysis module and the data acquisition module, the system further comprises:
a Chinese symptom constituent element storage module, connected to the constituent element analysis module and the data annotation module, for storing the dictionary tables of Chinese symptom constituent elements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810112718.8A CN108511036A (en) | 2018-02-05 | 2018-02-05 | A kind of method and system of Chinese symptom mark |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810112718.8A CN108511036A (en) | 2018-02-05 | 2018-02-05 | A kind of method and system of Chinese symptom mark |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108511036A true CN108511036A (en) | 2018-09-07 |
Family
ID=63374459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810112718.8A Pending CN108511036A (en) | 2018-02-05 | 2018-02-05 | A kind of method and system of Chinese symptom mark |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108511036A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095913A (en) * | 2016-06-08 | 2016-11-09 | 广州同构医疗科技有限公司 | A kind of electronic health record text structure method |
CN107527073A (en) * | 2017-09-05 | 2017-12-29 | 中南大学 | The recognition methods of entity is named in electronic health record |
Non-Patent Citations (2)
Title |
---|
王婷 等: "基于症状构成成分的上下位关系自动抽取方法", 《计算机应用》 * |
阮彤 等: "基于电子病历的临床医疗大数据挖掘流程与方法", 《大数据》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069779A (en) * | 2019-04-18 | 2019-07-30 | 腾讯科技(深圳)有限公司 | The symptom entity recognition method and relevant apparatus of medical text |
CN110069779B (en) * | 2019-04-18 | 2023-01-10 | 腾讯科技(深圳)有限公司 | Symptom entity identification method of medical text and related device |
CN110263168A (en) * | 2019-06-20 | 2019-09-20 | 北京百度网讯科技有限公司 | Symptom word classification method, device and terminal |
CN110931128A (en) * | 2019-12-05 | 2020-03-27 | 中国科学院自动化研究所 | Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts |
CN110931128B (en) * | 2019-12-05 | 2023-04-07 | 中国科学院自动化研究所 | Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180907 |