CN108511036A - Method and system for Chinese symptom annotation - Google Patents


Info

Publication number
CN108511036A
Authority
CN
China
Prior art keywords
word
symptom
chinese
chinese symptom
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810112718.8A
Other languages
Chinese (zh)
Inventor
叶琪
阮彤
王祺
曾露
翟洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN201810112718.8A
Publication of CN108511036A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records


Abstract

The present invention provides a method and system for annotating Chinese symptoms. The method includes: determining the 16 constituent elements on which the Chinese symptom annotation scheme is built, namely atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier; obtaining Chinese symptom data; and annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements. Compared with the prior art, the invention identifies the constituent elements of Chinese symptoms well and substantially improves annotation accuracy at both statistical granularities, the symptom level and the constituent-element level.

Description

A method and system for Chinese symptom annotation
Technical field
The present invention relates to the technical field of Chinese text annotation, and more particularly to a method and system for annotating Chinese symptoms.
Background
Electronic medical record systems are now widely used in China and abroad. To digitize medical information and then mine the resulting data, structuring medical text is particularly important. Annotating Chinese symptoms helps capture the meaning a symptom expresses more accurately.
Common approaches to Chinese annotation include maximum entropy models, conditional random field (CRF) models, and deep neural networks. However, Chinese word segmentation still performs unsatisfactorily in practice, one reason being that different natural language processing fields place different requirements on segmentation. Chinese symptoms call for a dedicated compositional analysis; simply splitting a symptom with a conventional segmentation method cannot fully meet this need.
In the prior art, the paper "Automatic Identification Method for Symptoms in Electronic Medical Record Text", published in the Journal of Medical Informatics, issue 7, 2016, studies Chinese symptom identification and points out that a Chinese symptom can be composed of negation words, modifiers, body-part words, and symptom words through corresponding combination rules. However, it gives no exact definitions for the four constituent elements it proposes; moreover, it divides modifiers into pre-modifiers and post-modifiers solely by the position of the modifier relative to the body-part word, without further considering the semantic features of the modifiers. The paper "Corpus Construction for Named Entities and Entity Relations in Chinese Electronic Medical Records", published in the Journal of Software, issue 11, 2016, studies an annotation scheme for named entities and entity relations in electronic medical records and defines seven modifications of a symptom from the perspective of the relationship between the entity and the patient: denied, not of the patient, current, conditional, possible, to be confirmed, and occasional. However, it does not consider modifiers that describe the properties of the symptom itself.
Summary of the invention
In view of the defects in the prior art, the present invention provides a method and system for Chinese symptom annotation.
According to one aspect of the invention, a method for Chinese symptom annotation is provided, comprising the following steps:
determining the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier;
obtaining Chinese symptom data; and
annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements.
In one embodiment, between the step of determining the constituent elements of the Chinese symptom annotation scheme and the step of obtaining Chinese symptom data, the method further comprises: building dictionary tables of the Chinese symptom constituent elements according to the 16 constituent elements of the annotation scheme, the dictionary tables comprising an atomic symptom table, a conjunction table, a negation word table, an existence word table, a degree word table, a development word table, an ability word table, an inability word table, an action word table, a scene-limiting word table, a locative word table, a body-part word table, a head word table, a sensation word table, a feature word table, and a modifier table.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements, or according to the Chinese symptom constituent elements, to obtain a sequence composed of one or more constituent elements.
In one embodiment, after the step of annotating the words or phrases of the Chinese symptom data, the constituent elements of the annotation result sequence are stored in the dictionary tables of the Chinese symptom constituent elements.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using a semi-automatic annotation method.
Annotating the words or phrases of the Chinese symptom data with the semi-automatic annotation method comprises: partially annotating the acquired symptom data using rules to obtain partial annotation results; manually checking the partial annotation results and correcting incorrect annotations; iterating the rule-based partial annotation and the manual check; and annotating the words that remain unannotated in the partial annotation results to obtain the final annotation result sequence.
In one embodiment, the words or phrases of the Chinese symptom data are annotated using a CRF-based annotation method.
According to another aspect of the invention, a system for Chinese symptom annotation is provided, comprising: a constituent element analysis module for determining the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module, connected to the constituent element analysis module, for obtaining Chinese symptom data; and
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data, converting the Chinese symptom data from text into a sequence composed of one or more constituent elements.
In one embodiment, the system further comprises a Chinese symptom constituent element storage module, connected to the constituent element analysis module and the data annotation module, for storing the dictionary tables of the Chinese symptom constituent elements.
The method and system identify the constituent elements of Chinese symptoms well; annotation accuracy at the two statistical granularities, symptom and constituent element, reaches 90.53% and 93.91% respectively.
Other features and aspects of the invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The various aspects of the invention will be understood more clearly after reading the specific embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow diagram of a Chinese symptom annotation method according to one embodiment of the invention;
Fig. 2A shows a preferred embodiment based on the Chinese symptom annotation method of Fig. 1;
Fig. 2B shows a preferred embodiment based on the Chinese symptom annotation method of Fig. 2A;
Fig. 3 shows a preferred embodiment of the Chinese symptom annotation method;
Fig. 4 is a flow chart of the semi-automatic annotation method of the invention;
Fig. 5 is a block diagram of a Chinese symptom annotation system according to another embodiment of the invention;
Fig. 6 shows a preferred embodiment based on the Chinese symptom annotation system of Fig. 5.
Detailed description
To make the technical content disclosed in this application more detailed and complete, reference may be made to the accompanying drawings and the following specific embodiments of the invention, in which identical reference numerals denote identical or similar components. Those skilled in the art should understand, however, that the embodiments provided below are not intended to limit the scope of the invention. In addition, the drawings are schematic only and are not drawn to scale.
Fig. 1 is a flow diagram of a Chinese symptom annotation method according to one embodiment of the invention.
Referring to Fig. 1, in this embodiment the Chinese symptom annotation method of the invention is implemented through steps S110 to S130. Specifically, step S110 is executed first to determine the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier. The constituent elements are shown in Table 1.
Table 1. Symptom constituent elements
Then step S120 is executed to obtain Chinese symptom data. Chinese symptoms are crawled from multiple health websites; to ensure accuracy, the traditional Chinese medicine symptoms and disease names mixed in among them are then removed manually.
Finally, step S130 is executed: the words or phrases of the Chinese symptom data are annotated to obtain a sequence composed of one or more constituent elements. Let S = {e | e is a symptom constituent element}; then any symptom X can be expressed in the following form:
X = <x1, …, xn>, where xi ∈ S (i = 1, …, n).
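The representation above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the English element names are this sketch's own renderings of the 16 elements, the helper `is_valid_annotation` is hypothetical, and the example labels are illustrative only.

```python
from enum import Enum


class Element(Enum):
    """The set S of 16 constituent elements (English names are renderings)."""
    ATOMIC_SYMPTOM = "atomic symptom"
    CONJUNCTION = "conjunction"
    NEGATION = "negation word"
    EXISTENCE = "existence word"
    DEGREE = "degree word"
    DEVELOPMENT = "development word"
    ABILITY = "ability word"
    INABILITY = "inability word"
    ACTION = "action word"
    SCENE = "scene-limiting word"
    LOCATIVE = "locative word"
    BODY_PART = "body-part word"
    HEAD = "head word"
    SENSATION = "sensation word"
    FEATURE = "feature word"
    MODIFIER = "modifier"


def is_valid_annotation(symptom, annotation):
    """Check that the spans rebuild the symptom and every label x_i is in S."""
    rebuilt = "".join(span for span, _ in annotation)
    return rebuilt == symptom and all(isinstance(e, Element) for _, e in annotation)


# Illustrative annotation X = <x1, x2> of a two-part symptom (labels assumed).
example = [("饭后", Element.SCENE), ("饱胀感", Element.SENSATION)]
```

Keeping the text span next to each element makes it easy to check that an annotation covers the whole symptom without gaps.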
Fig. 2A shows a preferred embodiment based on the Chinese symptom annotation method of Fig. 1. Compared with Fig. 1, the main difference in this embodiment is that a step S140 is added between step S110 and step S120: building the dictionary tables of the Chinese symptom constituent elements according to the 16 constituent elements of the annotation scheme. The dictionary tables comprise an atomic symptom table, a conjunction table, a negation word table, an existence word table, a degree word table, a development word table, an ability word table, an inability word table, an action word table, a scene-limiting word table, a locative word table, a body-part word table, a head word table, a sensation word table, a feature word table, and a modifier table. The dictionary tables may be built by manual definition.
Fig. 2B shows a preferred embodiment based on the Chinese symptom annotation method of Fig. 2A. In this embodiment, in step S150, the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements, or according to the Chinese symptom constituent elements, to obtain a sequence composed of one or more constituent elements.
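One plausible way to annotate with dictionary tables is greedy forward maximum matching. The text does not say which matching strategy is used, so the strategy, the function name `annotate_with_dictionary`, the `max_len` parameter, and the sample entries below are all assumptions of this sketch.

```python
def annotate_with_dictionary(symptom, dictionary, max_len=6):
    """Greedy forward maximum matching against a word -> element mapping.

    Characters that match no dictionary entry are emitted one at a time
    with the label None so that a later step can annotate them.
    """
    result = []
    i = 0
    while i < len(symptom):
        # Try the longest candidate span first, then shrink it.
        for size in range(min(max_len, len(symptom) - i), 0, -1):
            span = symptom[i:i + size]
            if span in dictionary:
                result.append((span, dictionary[span]))
                i += size
                break
        else:
            result.append((symptom[i], None))
            i += 1
    return result


# Hypothetical dictionary entries, merged from the per-element tables.
sample_dictionary = {
    "无": "negation word",
    "头": "body-part word",
    "头痛": "atomic symptom",
}
```

Preferring the longest match lets a multi-character entry such as an atomic symptom win over a shorter entry that is a prefix of it.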
Fig. 3 shows a preferred embodiment of the Chinese symptom annotation method. The main difference in this embodiment is that a step S160 is added after step S150: the constituent elements of the annotation result sequence are stored in the dictionary tables of the Chinese symptom constituent elements.
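Writing annotation results back into the dictionary tables might look like the sketch below. The table layout (one set of words per element name) and the function name `store_annotations` are assumptions made for illustration.

```python
from collections import defaultdict


def store_annotations(tables, annotation):
    """Add every labelled span to the table of its element; skip unlabelled spans."""
    for span, element in annotation:
        if element is not None:
            tables[element].add(span)
    return tables


tables = defaultdict(set)
store_annotations(
    tables,
    [("无", "negation word"), ("头痛", "atomic symptom"), ("X", None)],
)
```

Because each table is a set, storing the same word again after a later annotation pass leaves the table unchanged.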
In one embodiment, Fig. 4 shows the process of annotating the words or phrases of the Chinese symptom data using the semi-automatic annotation method. The specific steps are as follows. First, step S210 is executed: the acquired Chinese symptom data is partially annotated using rules, giving partial annotation results. Second, step S220 is executed: the partial annotation results are checked manually and incorrect annotations are corrected. Next, steps S210 and S220 are executed iteratively. Finally, once all iterations are complete, step S230 is executed: the words that remain unannotated are annotated, giving the final annotation result sequence.
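The control flow of steps S210 to S230 can be sketched as a loop over three pluggable callables. The callables `apply_rules`, `review` and `label_remaining`, the fixed number of `rounds`, and the shared `tables` dictionary are all assumptions; the text does not state how many iterations are run or how the manual check is recorded.

```python
def semi_automatic_annotation(symptoms, apply_rules, review, label_remaining, rounds=2):
    """Iterate rule-based partial annotation (S210) and manual checking (S220),
    then label whatever is still unannotated (S230)."""
    tables = {}    # element name -> set of words confirmed so far
    partial = {}   # symptom -> list of (span, element or None)
    for _ in range(rounds):
        for symptom in symptoms:
            draft = apply_rules(symptom, tables)      # S210: rules
            partial[symptom] = review(symptom, draft)  # S220: human correction
            for span, element in partial[symptom]:
                if element is not None:
                    tables.setdefault(element, set()).add(span)
    # S230: annotate the words that the iterations left unlabelled.
    return {s: label_remaining(partial[s]) for s in symptoms}
```

Feeding the confirmed words back through `tables` is what lets a later round benefit from corrections made in an earlier one.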
The rules used to partially annotate the Chinese symptom data are as follows:
If the acquired symptom data contains a conjunction, the symptom data is split into two symptom data items and the conjunction is stored in the conjunction table;
If the length of the acquired symptom data is less than or equal to 2, the symptom data is added to the atomic symptom table;
If a word or phrase of the acquired symptom data ends with "时" (when), "前" (before) or "后" (after) and does not begin with a body-part word, the word is annotated as a scene-limiting word and added to the scene-limiting word table;
If a word or phrase of the acquired symptom data ends with a sensation suffix, the sensation suffixes including "感" (sense), the word is annotated as a sensation word and added to the sensation word table;
If a word or phrase of the acquired symptom data ends with "性" (nature), "样" (like) or "状" (shape), the word is annotated as a feature word and added to the feature word table;
If the part of speech of a word or phrase of the acquired symptom data is noun and it does not contain a body-part word, the word is annotated as a head word and added to the head word table;
If the part of speech of a word or phrase of the acquired symptom data is verb, the word is annotated as an action word and added to the action word table;
If the part of speech of a word or phrase of the acquired symptom data is adjective, the word is annotated as a modifier and added to the modifier table.
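The rules above can be sketched as a few small functions. Several details are assumptions of this sketch rather than statements from the text: the Chinese suffix characters (时/前/后, 感, 性/样/状) are back-translated from the English rendering, the part-of-speech codes `"n"`, `"v"` and `"a"` stand in for whatever tagger is used, rules are tried in the order listed, and length is measured in characters.

```python
def split_symptom(symptom, conjunctions):
    """Split a symptom in two at the first conjunction found, if any."""
    for conj in conjunctions:
        left, sep, right = symptom.partition(conj)
        if sep and left and right:
            return [left, right]
    return [symptom]


def is_atomic(symptom):
    """Symptoms of length <= 2 go straight to the atomic symptom table."""
    return len(symptom) <= 2


def label_word(word, pos, body_parts):
    """Apply the word-level rules in order; return an element name or None."""
    if word.endswith(("时", "前", "后")) and not word.startswith(tuple(body_parts)):
        return "scene-limiting word"
    if word.endswith("感"):
        return "sensation word"
    if word.endswith(("性", "样", "状")):
        return "feature word"
    if pos == "n" and not any(part in word for part in body_parts):
        return "head word"
    if pos == "v":
        return "action word"
    if pos == "a":
        return "modifier"
    return None  # left for manual checking or the final annotation step
```

Returning `None` rather than guessing keeps the rule pass partial, which matches its role as the first stage of the semi-automatic procedure.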
In one embodiment, a CRF-based annotation method may be used to annotate the words or phrases of the Chinese symptom data. First, part of the symptom data is used as training samples; a CRF model then automatically annotates the test set formed by the remaining symptoms, and the CRF output is corrected by post-processing.
The training samples are annotated using the semi-automatic annotation method described above.
Specifically, let X = {x1, x2, …, xn} denote the observation sequence (input sequence) and Y = {y1, y2, …, yn} the state sequence (output sequence). Given the observation sequence X, for a linear-chain CRF model with parameters Λ = {λ1, λ2, …, λK}, the joint conditional probability of the corresponding state sequence Y is:
$$P_\Lambda(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{k=1}^{K} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)$$
Z(X) is the normalization factor, the sum of the unnormalized scores of all possible state sequences, expressed as:
$$Z(X) = \sum_{Y'} \exp\left( \sum_{i=1}^{n} \sum_{k=1}^{K} \lambda_k f_k(y'_{i-1}, y'_i, X, i) \right)$$
f_k(y_{i-1}, y_i, X, i) is a feature function that expresses possible contextual linguistic features. It is generally a binary indicator function, expressed as follows:
$$f_k(y_{i-1}, y_i, X, i) = \begin{cases} 1, & \text{if } y_{i-1},\ y_i,\ X \text{ and } i \text{ satisfy the feature condition} \\ 0, & \text{otherwise} \end{cases}$$
λ_k is the weight of f_k(y_{i-1}, y_i, X, i) and is obtained by training.
For a given observation sequence X, the output target is to find its most probable state sequence:
$$Y^{*} = \arg\max_{Y} P_\Lambda(Y \mid X)$$
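A minimal sketch of how a linear-chain CRF of this kind scores a label sequence, normalizes the score, and picks the most probable sequence. It enumerates every candidate sequence by brute force, which is only practical for toy inputs; a real system would use the forward algorithm and Viterbi decoding instead. The function names and the convention that `None` stands for the state before the first position are assumptions of this sketch.

```python
import itertools
import math


def score(y, x, features, weights):
    """Sum over positions i and features k of lambda_k * f_k(y_{i-1}, y_i, X, i)."""
    total = 0.0
    for i in range(len(x)):
        prev = y[i - 1] if i > 0 else None  # None marks the start of the sequence
        total += sum(w * f(prev, y[i], x, i) for f, w in zip(features, weights))
    return total


def crf_probability(y, x, labels, features, weights):
    """P(Y | X) = exp(score(Y)) / Z(X), with Z(X) summed over all label sequences."""
    z = sum(
        math.exp(score(candidate, x, features, weights))
        for candidate in itertools.product(labels, repeat=len(x))
    )
    return math.exp(score(y, x, features, weights)) / z


def decode(x, labels, features, weights):
    """Return the label sequence with the highest score (brute-force argmax)."""
    return max(
        itertools.product(labels, repeat=len(x)),
        key=lambda candidate: score(candidate, x, features, weights),
    )
```

Since Z(X) does not depend on Y, maximizing the unnormalized score gives the same sequence as maximizing the probability, which is why `decode` never computes Z(X).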
The symptom constituent elements of Table 1 are used to label symptom sequences, with the following two labelling schemes:
Scheme one: identical symptom elements receive identical labels, and different symptom elements receive different labels. For example, for the ten-character symptom X = <evening, meal, after, full, swollen, sense, bright, show, add, heavy> (character-by-character glosses of a symptom meaning roughly "feeling of fullness and distension markedly worse after dinner"), the corresponding output sequence is <e7, e7, e7, e10, e10, e10, e13, e13, e14, e14>.
Scheme two: on the basis of scheme one, positions inside a symptom element are also distinguished, and the beginning of each symptom element is specially marked. For the same symptom X, the corresponding output sequence is <e7_B, e7_I, e7_I, e10_B, e10_I, e10_I, e13_B, e13_I, e14_B, e14_I>.
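The two labelling schemes can be sketched as one function that expands element-level segments into per-character tags. The function name `to_tags` and the placeholder segment texts are assumptions; only the segment lengths (3, 3, 2, 2) and the element identifiers mirror the example in the text.

```python
def to_tags(segments, scheme=2):
    """Expand (text, element) segments into one tag per character.

    scheme=1: every character carries the element label.
    scheme=2: the first character gets _B, the following ones _I.
    """
    tags = []
    for text, element in segments:
        for offset in range(len(text)):
            if scheme == 1:
                tags.append(element)
            else:
                tags.append(f"{element}_{'B' if offset == 0 else 'I'}")
    return tags


# Placeholder texts with the same lengths as the four elements in the example.
segments = [("abc", "e7"), ("def", "e10"), ("gh", "e13"), ("ij", "e14")]
```

Scheme two keeps the boundary between two adjacent elements of the same type recoverable, which scheme one cannot do.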
To obtain a better annotation result, the invention designs 6 CRF feature templates, varying the window size up to 7 (the current character together with the 1, 2 and 3 characters before and after it) and whether part-of-speech features are included. Table 2 is the feature template with window size 7 that includes part-of-speech features, where Ci denotes the i-th character of the current symptom and Pi denotes the part of speech of the word containing the i-th character. Unigram is the unary feature, that is, a single character; Bigram is the binary feature, the combination of two adjacent characters; Trigram is the ternary feature, the combination of the current character with the adjacent characters before and after it.
Table 2. Feature templates
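A sketch of how window-based character features of this kind might be extracted for one position. The exact templates are not reproduced in the text, so the feature names, the `<PAD>` marker for positions outside the symptom, and the choice of a single trigram centred on the current character are assumptions; part-of-speech features are omitted.

```python
def char_features(chars, i, window=3):
    """Unigram, bigram and trigram features around position i.

    window=3 covers 3 characters on each side, i.e. a window of size 7.
    """
    def at(offset):
        j = i + offset
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    # Unigrams: each character in the window.
    feats = {f"U[{k}]": at(k) for k in range(-window, window + 1)}
    # Bigrams: each pair of adjacent characters in the window.
    feats.update(
        {f"B[{k},{k + 1}]": at(k) + "/" + at(k + 1) for k in range(-window, window)}
    )
    # Trigram: the current character with its two neighbours.
    feats["T[-1,0,1]"] = "/".join((at(-1), at(0), at(1)))
    return feats
```

Padding out-of-range positions with an explicit marker lets the model learn that a character sits at the start or end of a symptom.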
To compensate for the shortcomings of the CRF model, the predetermined symptom constituent elements are used for post-processing: after the CRF result is obtained, if a constituent element labelled by the CRF appears in the element dictionary, it is relabelled according to its element category in the dictionary, thereby improving annotation accuracy.
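This dictionary-based correction can be sketched in two steps: group the per-character CRF tags back into segments, then overwrite the label of any segment whose text is in the dictionary. The function names, the `element_B` / `element_I` tag format, and the element identifiers in the usage lines are assumptions of this sketch.

```python
def tags_to_segments(chars, tags):
    """Group characters into (text, element) segments from per-character tags."""
    segments = []
    for ch, tag in zip(chars, tags):
        element, _, position = tag.partition("_")
        starts_new = position == "B" or not segments or segments[-1][1] != element
        if starts_new:
            segments.append([ch, element])
        else:
            segments[-1][0] += ch
    return [(text, element) for text, element in segments]


def post_process(segments, dictionary):
    """Relabel every segment whose text has an entry in the element dictionary."""
    return [(text, dictionary.get(text, label)) for text, label in segments]


# Hypothetical CRF output for a three-character symptom and its correction.
crf_segments = tags_to_segments("无头痛", ["e9_B", "e1_B", "e1_I"])
corrected = post_process(crf_segments, {"无": "e3"})
```

Segments that are absent from the dictionary keep the CRF label, so the dictionary only overrides the model where a curated entry exists.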
Fig. 5 shows a system for Chinese symptom annotation according to another embodiment of the invention, comprising:
a constituent element analysis module for determining the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module for obtaining Chinese symptom data; and
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data, converting the Chinese symptom data from text into a sequence composed of one or more constituent elements.
Fig. 6 shows a preferred embodiment based on the Chinese symptom annotation system of Fig. 5. Compared with Fig. 5, the main difference is the addition of a Chinese symptom constituent element storage module, connected to the constituent element analysis module and the data annotation module, for storing the dictionary tables of the Chinese symptom constituent elements.
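The module structure might be wired together as in the sketch below. The class names, the callables passed in for acquisition and annotation, and the decision to write every annotated word back to the store are assumptions made for illustration, not details given in the text.

```python
class ElementStore:
    """Storage module: one word table per constituent element."""

    def __init__(self, elements):
        self.tables = {element: set() for element in elements}

    def add(self, element, word):
        self.tables[element].add(word)


class AnnotationSystem:
    def __init__(self, elements, acquire, annotate):
        self.elements = list(elements)             # constituent element analysis module
        self.acquire = acquire                     # data acquisition module
        self.annotate = annotate                   # data annotation module
        self.store = ElementStore(self.elements)   # storage module

    def run(self):
        """Acquire symptoms, annotate each one, and store the labelled words."""
        results = {}
        for symptom in self.acquire():
            sequence = self.annotate(symptom, self.elements, self.store)
            for word, element in sequence:
                self.store.add(element, word)
            results[symptom] = sequence
        return results
```

Passing the acquisition and annotation steps in as callables keeps the sketch independent of any particular crawler or annotation method.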
Specific embodiments of the invention have been described above with reference to the accompanying drawings. Those skilled in the art will understand, however, that various changes and substitutions may be made to these embodiments without departing from the spirit and scope of the invention. All such changes and substitutions fall within the scope defined by the claims of the invention.

Claims (9)

1. A method for Chinese symptom annotation, characterized by comprising the following steps:
determining the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier;
obtaining Chinese symptom data;
annotating the words or phrases of the Chinese symptom data to obtain a sequence composed of one or more constituent elements.
2. The method for Chinese symptom annotation according to claim 1, characterized in that, between the step of determining the constituent elements of the Chinese symptom annotation scheme and the step of obtaining Chinese symptom data, the method further comprises the step of:
building dictionary tables of the Chinese symptom constituent elements according to the 16 constituent elements of the Chinese symptom annotation scheme, the dictionary tables comprising: an atomic symptom table, a conjunction table, a negation word table, an existence word table, a degree word table, a development word table, an ability word table, an inability word table, an action word table, a scene-limiting word table, a locative word table, a body-part word table, a head word table, a sensation word table, a feature word table, and a modifier table.
3. The method for Chinese symptom annotation according to claim 2, characterized in that the words or phrases of the Chinese symptom data are annotated using the dictionary tables of the constituent elements or according to the Chinese symptom constituent elements.
4. The method for Chinese symptom annotation according to claim 3, characterized in that, after the step of annotating the words or phrases of the Chinese symptom data, the method comprises the step of:
storing the constituent elements of the annotation result sequence in the dictionary tables of the Chinese symptom constituent elements.
5. The method for Chinese symptom annotation according to claim 4, characterized in that the words or phrases of the Chinese symptom data are annotated using a semi-automatic annotation method.
6. The method for Chinese symptom annotation according to claim 5, characterized in that annotating the words or phrases of the Chinese symptom data using the semi-automatic annotation method comprises:
partially annotating the acquired symptom data using rules to obtain partial annotation results for the Chinese symptoms;
manually checking the partial annotation results and correcting incorrect annotation results;
iteratively executing the rule-based partial annotation and the manual check;
annotating the words that remain unannotated in the partial annotation results to obtain the final annotation result sequence.
7. The method for Chinese symptom annotation according to claim 1, characterized in that the words or phrases of the Chinese symptom data are annotated using a CRF-based annotation method.
8. A system for Chinese symptom annotation, characterized by comprising:
a constituent element analysis module for determining the 16 constituent elements on which the Chinese symptom annotation scheme is built: atomic symptom, conjunction, negation word, existence word, degree word, development word, ability word, inability word, action word, scene-limiting word, locative word, body-part word, head word, sensation word, feature word, and modifier;
a data acquisition module, connected to the constituent element analysis module, for obtaining Chinese symptom data;
a data annotation module, connected to the data acquisition module and the constituent element analysis module, for annotating the words or phrases of the Chinese symptom data, converting the Chinese symptom data from text into a sequence composed of one or more constituent elements.
9. The system for Chinese symptom annotation according to claim 8, characterized by further comprising, between the constituent element analysis module and the data acquisition module:
a Chinese symptom constituent element storage module, connected to the constituent element analysis module and the data annotation module, for storing the dictionary tables of the Chinese symptom constituent elements.
CN201810112718.8A 2018-02-05 2018-02-05 A kind of method and system of Chinese symptom mark Pending CN108511036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810112718.8A CN108511036A (en) 2018-02-05 2018-02-05 A kind of method and system of Chinese symptom mark

Publications (1)

Publication Number Publication Date
CN108511036A true CN108511036A (en) 2018-09-07

Family

ID=63374459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810112718.8A Pending CN108511036A (en) 2018-02-05 2018-02-05 A kind of method and system of Chinese symptom mark

Country Status (1)

Country Link
CN (1) CN108511036A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095913A (en) * 2016-06-08 2016-11-09 广州同构医疗科技有限公司 A kind of electronic health record text structure method
CN107527073A (en) * 2017-09-05 2017-12-29 中南大学 The recognition methods of entity is named in electronic health record

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王婷 等: "基于症状构成成分的上下位关系自动抽取方法", 《计算机应用》 *
阮彤 等: "基于电子病历的临床医疗大数据挖掘流程与方法", 《大数据》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069779A (en) * 2019-04-18 2019-07-30 腾讯科技(深圳)有限公司 The symptom entity recognition method and relevant apparatus of medical text
CN110069779B (en) * 2019-04-18 2023-01-10 腾讯科技(深圳)有限公司 Symptom entity identification method of medical text and related device
CN110263168A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 Symptom word classification method, device and terminal
CN110931128A (en) * 2019-12-05 2020-03-27 中国科学院自动化研究所 Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts
CN110931128B (en) * 2019-12-05 2023-04-07 中国科学院自动化研究所 Method, system and device for automatically identifying unsupervised symptoms of unstructured medical texts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180907
