CN110008473A - Iteration-based named entity recognition annotation method for medical text - Google Patents

Info

Publication number
CN110008473A
Authority
CN
China
Prior art keywords
text
medical
dictionary
word
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910257482.1A
Other languages
Chinese (zh)
Other versions
CN110008473B (en)
Inventor
陈储培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Shanghai Intelligent Technology Co Ltd
Priority to CN201910257482.1A
Publication of CN110008473A
Application granted
Publication of CN110008473B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

An embodiment of the present invention provides an iteration-based named entity recognition annotation method for medical text, relating to the technical field of medical information. Annotating a large-scale medical corpus with traditional annotation tools consumes a great deal of manpower and material resources. The method combines a model with automation tools, is suitable for large-scale medical text annotation, and shortens the annotation cycle, thereby helping to improve product research and development efficiency.

Description

Iteration-based named entity recognition annotation method for medical text
Technical field
The present invention relates to the technical field of medical information, and in particular to an iteration-based named entity recognition annotation method for medical text.
Background art
The medical field differs from general domains in that it is inherently specialized. Research in the medical field cannot do without the support of medical corpora, and sequence labeling is a basic and very important task in medical research. However, named entity recognition annotation requires a great deal of manpower and material resources, and mainstream sequence labeling currently relies on open-source annotation tools, so the annotation cycle is long. The task also involves highly specialized medical knowledge, which makes medical sequence labeling difficult. To improve annotation efficiency, an iteration-based automatic annotation method for medical named entities is proposed.
With the development of the Internet, the mobile Internet, and big data technology, the scale of text data resources has grown explosively. These resources mainly include unstructured data on social media (such as microblog accounts, official accounts, Facebook, and Twitter) and on news media websites (such as People's Daily, Phoenix News, and Sohu News), as well as semi-structured data on encyclopedia websites such as Baidu Baike and Wikipedia. Natural language processing (NLP) plays a very important role in extracting information from text. In text mining, extracting useful information from massive text data is of great value to both enterprises and users. Sequence labeling is one of the most basic and most commonly used NLP methods. Quickly and effectively predicting the corresponding label (for example noun, person name, place name, or time) for each word in a Chinese sequence plays an important role in artificial intelligence tasks such as relation mining and knowledge graph construction.
In the prior art, annotated medical corpora are scarce, which hinders basic research on medical text. Meanwhile, medical text annotation depends on annotation tools, the annotation cycle is long, and a great deal of manpower and material resources is consumed.
Summary of the invention
The purpose of the present invention is to provide an iteration-based named entity recognition annotation method for medical text that offers high annotation efficiency, accurate annotation, and a simple procedure.
To achieve the above purpose, the technical solution adopted in the embodiments of the present invention is as follows:
An iteration-based named entity recognition annotation method for medical text, the method comprising the following steps:
Step 1: according to the named entity categories, prepare initial seed words; the seed words serve as the basis for subsequent iterations;
Step 2: based on existing medical free text, tag the text with seed-word labels; in the named entity recognition task, the first and last characters of a seed word are labeled B and E respectively, the characters in between are labeled I, and all remaining characters are labeled O;
Step 3: train a model on the corpus annotated in the first round, use the resulting model to predict on the medical text corpus, and extract the predicted entity words;
Step 4: analyze the newly generated round of entity words against web pages using a search engine tool, and filter them according to whether an encyclopedia entry exists; the entity word resources may be further supplemented with related terms from Internet resources; add the processed entity words to the dictionary;
Step 5: repeat step 2, step 3, and step 4 to complete multiple rounds of iteration; stop iterating when the set number of iterations is reached or the number of newly added entries no longer increases; using automation tools, compare the entity words annotated from the dictionary with the entity words predicted by the model, and extract the entity words whose boundaries or categories are inconsistent; further correct the extracted inconsistent entity word corpus by rules, finally completing the annotation of medical named entities.
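The labeling rule of step 2 can be illustrated with a minimal sketch. It assumes character-level tags and a longest-match lookup of seed words, neither of which is specified above; the function name `tag_with_seed_words` is hypothetical.

```python
def tag_with_seed_words(text, seed_words):
    """Tag each character of text as B, I, E, or O using seed words.

    Assumptions (not stated in the source): tags are per character,
    longer seed words win over shorter ones, and a single-character
    seed word receives only the tag B.
    """
    tags = ["O"] * len(text)
    # Try longer seed words first so a long entity is not split by a shorter one.
    ordered = sorted(seed_words, key=len, reverse=True)
    i = 0
    while i < len(text):
        match = next((w for w in ordered if w and text.startswith(w, i)), None)
        if match is None:
            i += 1
            continue
        end = i + len(match) - 1
        tags[i] = "B"                 # first character of the seed word
        for j in range(i + 1, end):
            tags[j] = "I"             # characters in between
        if end > i:
            tags[end] = "E"           # last character of the seed word
        i = end + 1
    return tags
```

For example, with the seed word 糖尿病, the sentence 患者有糖尿病史 would receive the tags O O O B I E O.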
Further, the method of tagging the text with seed-word labels based on existing medical free text comprises the following steps:
Step S1: obtain the different keywords, generate the keyword list corresponding to each medical text, and store the lists in a database;
Step S2: read the keyword lists from the database, generate a unique identifier for each medical text from the medical text and its keywords, and construct a unique dictionary tree according to each unique identifier; all the unique dictionary trees constitute a basic dictionary tree object pool used for the Chinese word segmentation service;
Step S3: receive the data to be processed, and segment the data using the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data to be processed; perform keyword filtering according to the segmentation result.
Further, in step S1 the different keywords of the different medical texts are maintained in the database by users.
Further, step S2 further comprises:
The keyword lists are built into different dictionaries, with one dictionary per medical text; the dictionary file format is X.dic, where X is the dictionary title.
Further, step S3 comprises the following sub-steps:
S31: receive the data to be processed, determine the medical text corresponding to the data, and go to step S32;
S32: according to the medical text corresponding to the data to be processed, retrieve the dictionary tree corresponding to that medical text from the basic dictionary tree object pool; if it exists, go to step S33; otherwise go to step S34;
S33: segment the data to be processed with that dictionary tree, perform keyword filtering according to the segmentation result, and end;
S34: determine whether a dictionary exists for the medical text corresponding to the data to be processed; if it exists, go to step S35; otherwise go to step S36;
S35: dynamically construct a dictionary tree from the dictionary of the medical text corresponding to the data to be processed, segment the data with the constructed dictionary tree, perform keyword filtering according to the segmentation result, and end;
S36: call the preset general dictionary, construct a general dictionary tree from it, segment the data to be processed with the constructed general dictionary tree, perform keyword filtering according to the segmentation result, and end.
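The fallback order of sub-steps S32, S34, and S36 can be sketched as follows. This is only an illustration under stated assumptions: the dictionary tree is represented as a nested `dict`, the object pool and the dictionaries are plain in-memory mappings keyed by a hypothetical `text_id`, and the names `build_trie` and `get_dictionary_tree` are not from the source.

```python
END = "<end>"  # multi-character marker, so it cannot collide with a single-character key


def build_trie(words):
    """Build a nested-dict dictionary tree; END marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def get_dictionary_tree(text_id, pool, dictionaries, general_dictionary):
    """Return (tree, source) following the S32 -> S34 -> S36 order."""
    if text_id in pool:
        # S32/S33: a tree for this medical text already exists in the object pool.
        return pool[text_id], "pool"
    if text_id in dictionaries:
        # S34/S35: no pooled tree, but a dictionary exists; build the tree dynamically.
        return build_trie(dictionaries[text_id]), "dictionary"
    # S36: fall back to the preset general dictionary.
    return build_trie(general_dictionary), "general"
```

Whether a dynamically built tree should be added back to the pool is not stated in the text, so the sketch leaves the pool unchanged.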
Further, the method of training a model on the corpus annotated in the first round, using the resulting model to predict on the medical text corpus, and extracting the predicted entity words comprises the following steps:
Step A1: preprocess the acquired corpus;
Step A2: input the corpus preprocessed in step A1 into a preset learning model, adjust the parameters of the learning model, and save them;
Step A3: according to the sequence classification results output by the learning model, add the corresponding prediction label to each item of the acquired corpus, and use the manual labels to minimize the loss function of the learning model so that the prediction labels fit the manual labels; for an unknown corpus, segment it with a word segmentation algorithm and perform an initial annotation of the segmented unknown corpus with the adjusted learning model;
Step A4: tune the unknown corpus initially annotated in step A3, and perform the final annotation on the tuned corpus.
Further, the preprocessing in step A1 includes merging coarse-grained word segments and unifying the format.
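Once the model has predicted a tag sequence, the entity words have to be read back out of it. The following sketch shows one way to do so, assuming character-level B/I/E/O tags as in step 2; the function name `extract_entities` and the handling of malformed sequences are assumptions.

```python
def extract_entities(text, tags):
    """Collect spans tagged B (I ...) E from a character-level tag sequence.

    Returns (start, end, entity) tuples with end exclusive. A B that is
    interrupted by O before any E is discarded (an assumption; the source
    does not describe how malformed sequences are handled).
    """
    entities = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            start = i
        elif tag == "E" and start is not None:
            entities.append((start, i + 1, text[start:i + 1]))
            start = None
        elif tag == "O":
            start = None
    return entities
```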
The iteration-based named entity recognition annotation method for medical text provided by the embodiments of the present invention has the following beneficial effects: annotating a large-scale medical corpus with traditional annotation tools consumes a great deal of manpower and material resources, whereas this method combines a model with automation tools, is suitable for large-scale medical text annotation, and shortens the annotation cycle, thereby helping to improve product research and development efficiency.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the iteration-based named entity recognition annotation method for medical text provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Embodiment 1:
As shown in Fig. 1, an iteration-based named entity recognition annotation method for medical text comprises the following steps:
Step 1: according to the named entity categories, prepare initial seed words; the seed words serve as the basis for subsequent iterations;
Step 2: based on existing medical free text, tag the text with seed-word labels; in the named entity recognition task, the first and last characters of a seed word are labeled B and E respectively, the characters in between are labeled I, and all remaining characters are labeled O;
Step 3: train a model on the corpus annotated in the first round, use the resulting model to predict on the medical text corpus, and extract the predicted entity words;
Step 4: analyze the newly generated round of entity words against web pages using a search engine tool, and filter them according to whether an encyclopedia entry exists; the entity word resources may be further supplemented with related terms from Internet resources; add the processed entity words to the dictionary;
Step 5: repeat step 2, step 3, and step 4 to complete multiple rounds of iteration; stop iterating when the set number of iterations is reached or the number of newly added entries no longer increases; using automation tools, compare the entity words annotated from the dictionary with the entity words predicted by the model, and extract the entity words whose boundaries or categories are inconsistent; further correct the extracted inconsistent entity word corpus by rules, finally completing the annotation of medical named entities.
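The comparison in step 5 between dictionary-based annotations and model predictions might look like the sketch below. It assumes that both sides are available as `(start, end, category)` spans with `end` exclusive; this representation and the function name `find_inconsistencies` are hypothetical, and the rule-based correction that follows is not shown.

```python
def find_inconsistencies(dict_spans, model_spans):
    """Compare dictionary-annotated spans with model-predicted spans.

    Each span is a (start, end, category) tuple with end exclusive.
    Overlapping spans with different boundaries are reported as
    "boundary"; spans with identical boundaries but different
    categories are reported as "category".
    """
    issues = []
    for d in dict_spans:
        for m in model_spans:
            overlap = d[0] < m[1] and m[0] < d[1]
            if not overlap:
                continue
            if (d[0], d[1]) != (m[0], m[1]):
                issues.append(("boundary", d, m))
            elif d[2] != m[2]:
                issues.append(("category", d, m))
    return issues
```

Spans that agree on both boundary and category produce no entry, so only the disputed entity words are passed on for correction.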
The technical principle of the above technical solution is: keywords are extracted and the extracted keywords are then matched, thereby realizing the function.
The technical effect of the above technical solution is: it helps improve product research and development efficiency.
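The overall iteration of steps 2 to 5, with its two stopping conditions, can be outlined as below. This is a sketch only: `predict_entities` stands in for tagging, training, and prediction (steps 2 and 3), and `has_encyclopedia_entry` stands in for the search-engine check of step 4. Both are hypothetical callables supplied by the caller, and `max_rounds` is an assumed parameter name.

```python
def iterate_dictionary(seed_words, predict_entities, has_encyclopedia_entry, max_rounds=5):
    """Grow the dictionary until max_rounds is reached or no new entry is added.

    Returns the final dictionary and the number of rounds that were run.
    """
    dictionary = set(seed_words)
    for round_no in range(1, max_rounds + 1):
        # Steps 2 and 3: tag with the current dictionary, train, predict, extract.
        candidates = predict_entities(dictionary)
        # Step 4: keep only candidates for which an encyclopedia entry exists.
        accepted = {w for w in candidates if has_encyclopedia_entry(w)}
        new_words = accepted - dictionary
        if not new_words:
            # Stopping condition: the number of entries no longer increases.
            return dictionary, round_no
        dictionary |= new_words
    # Stopping condition: the set number of iterations has been reached.
    return dictionary, max_rounds
```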
Embodiment 2:
On the basis of the previous embodiment, the method of tagging the text with seed-word labels based on existing medical free text comprises the following steps:
Step S1: obtain the different keywords, generate the keyword list corresponding to each medical text, and store the lists in a database;
Step S2: read the keyword lists from the database, generate a unique identifier for each medical text from the medical text and its keywords, and construct a unique dictionary tree according to each unique identifier; all the unique dictionary trees constitute a basic dictionary tree object pool used for the Chinese word segmentation service;
Step S3: receive the data to be processed, and segment the data using the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data to be processed; perform keyword filtering according to the segmentation result.
The technical principle of the above technical solution is: the data to be processed is received and segmented using the dictionary tree in the basic dictionary tree object pool that corresponds to its medical text, and keyword filtering is performed according to the segmentation result.
The technical effect of the above technical solution is: the accuracy of the system and method can be improved.
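Segmentation with a dictionary tree, as used in step S3, is sketched below with forward maximum matching. The matching strategy is an assumption, since the text does not name the segmentation algorithm; the nested-`dict` tree and the names `build_trie` and `segment` are likewise illustrative.

```python
END = "<end>"  # multi-character marker, so it cannot collide with a single-character key


def build_trie(words):
    """Build a nested-dict dictionary tree; END marks the end of a keyword."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, trie):
    """Forward maximum matching: return (token, is_keyword) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        node, longest = trie, 0
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if END in node:
                longest = j - i + 1   # remember the longest keyword starting at i
        step = longest or 1           # unmatched characters become single-character tokens
        tokens.append((text[i:i + step], longest > 0))
        i += step
    return tokens
```

Keyword filtering can then be expressed as keeping only the tokens whose flag is true, for example `[t for t, hit in segment(text, trie) if hit]`.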
Embodiment 3:
On the basis of the previous embodiment, in step S1 the different keywords of the different medical texts are maintained in the database by users.
The technical principle of the above technical solution is: keywords are inserted into the database, which ensures that the keywords remain valid over the long term.
The technical effect of the above technical solution is: the reliability of the method is ensured.
Embodiment 4:
On the basis of the previous embodiment, step S2 further comprises:
The keyword lists are built into different dictionaries, with one dictionary per medical text; the dictionary file format is X.dic, where X is the dictionary title.
The technical principle of the above technical solution is: different dictionaries are built, with one dictionary per medical text; the dictionary file format is X.dic, where X is the dictionary title.
The technical effect of the above technical solution is: the efficiency of the method is improved.
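Loading one X.dic file per medical text could be done as in the following sketch. The internal layout of a .dic file is not described in the text; the sketch assumes UTF-8 with one keyword per line, and the function name `load_dictionaries` is hypothetical.

```python
from pathlib import Path


def load_dictionaries(directory):
    """Map each dictionary title X to the keywords found in X.dic.

    Assumption: each .dic file is UTF-8 text with one keyword per line;
    blank lines are ignored.
    """
    dictionaries = {}
    for path in sorted(Path(directory).glob("*.dic")):
        lines = path.read_text(encoding="utf-8").splitlines()
        dictionaries[path.stem] = [line.strip() for line in lines if line.strip()]
    return dictionaries
```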
Embodiment 5:
On the basis of the previous embodiment, step S3 comprises the following sub-steps:
S31: receive the data to be processed, determine the medical text corresponding to the data, and go to step S32;
S32: according to the medical text corresponding to the data to be processed, retrieve the dictionary tree corresponding to that medical text from the basic dictionary tree object pool; if it exists, go to step S33; otherwise go to step S34;
S33: segment the data to be processed with that dictionary tree, perform keyword filtering according to the segmentation result, and end;
S34: determine whether a dictionary exists for the medical text corresponding to the data to be processed; if it exists, go to step S35; otherwise go to step S36;
S35: dynamically construct a dictionary tree from the dictionary of the medical text corresponding to the data to be processed, segment the data with the constructed dictionary tree, perform keyword filtering according to the segmentation result, and end;
S36: call the preset general dictionary, construct a general dictionary tree from it, segment the data to be processed with the constructed general dictionary tree, perform keyword filtering according to the segmentation result, and end.
The technical principle of the above technical solution is: the preset general dictionary is called, a general dictionary tree is constructed from it, the data to be processed is segmented with the constructed general dictionary tree, and keyword filtering is performed according to the segmentation result.
The technical effect of the above technical solution is: the accuracy of the method is improved.
Embodiment 6:
On the basis of the previous embodiment, the method of training a model on the corpus annotated in the first round, using the resulting model to predict on the medical text corpus, and extracting the predicted entity words comprises the following steps:
Step A1: preprocess the acquired corpus;
Step A2: input the corpus preprocessed in step A1 into a preset learning model, adjust the parameters of the learning model, and save them;
Step A3: according to the sequence classification results output by the learning model, add the corresponding prediction label to each item of the acquired corpus, and use the manual labels to minimize the loss function of the learning model so that the prediction labels fit the manual labels; for an unknown corpus, segment it with a word segmentation algorithm and perform an initial annotation of the segmented unknown corpus with the adjusted learning model;
Step A4: tune the unknown corpus initially annotated in step A3, and perform the final annotation on the tuned corpus.
The technical principle of the above technical solution is: according to the sequence classification results output by the learning model, the corresponding prediction label is added to each item of the acquired corpus; the manual labels are used to minimize the loss function of the learning model so that the prediction labels fit the manual labels; an unknown corpus is segmented with a word segmentation algorithm, and the segmented unknown corpus is initially annotated with the adjusted learning model.
The technical effect of the above technical solution is: the method is able to learn and to improve over time.
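The text does not name the loss function that is minimized in step A3. Purely as an illustrative assumption, a token-level cross-entropy loss over N annotated sequences could be written as follows, where x_i is the i-th input sequence of length T_i, y_{i,t} is the manual label of its t-th position, and p_θ is the label distribution output by the learning model with parameters θ (all symbols are introduced here and do not appear in the source):

```latex
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T_i}
  \log p_\theta\left(y_{i,t} \mid x_i\right),
\qquad
\hat{\theta} = \arg\min_{\theta}\, \mathcal{L}(\theta)
```

Minimizing this quantity raises the probability the model assigns to the manual labels, which is one way to read "fitting the prediction labels to the manual labels".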
Embodiment 7:
On the basis of the previous embodiment, the preprocessing in step A1 includes merging coarse-grained word segments and unifying the format.
The technical principle of the above technical solution is: a unified format is used for merging, which makes the results more accurate.
The technical effect of the above technical solution is: the accuracy of the method is improved.
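What "unifying the format" covers is not detailed in the text. One plausible reading, offered here only as an assumption, is normalizing full-width and half-width character variants and whitespace before annotation. The sketch below uses Unicode NFKC normalization for this; the function name `unify_format` is hypothetical, and the merging of coarse-grained word segments is not shown.

```python
import re
import unicodedata


def unify_format(text):
    """Convert full-width characters to half-width (NFKC) and collapse whitespace.

    Note that NFKC also maps full-width punctuation such as the full-width
    comma to its ASCII counterpart, which may or may not be desired.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", normalized).strip()
```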
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a unit, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and each combination of blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional units in the embodiments of the present invention may be integrated together to form an independent part, each unit may exist separately, or two or more units may be integrated to form an independent part.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. An iteration-based named entity recognition annotation method for medical text, characterized in that the method comprises the following steps:
Step 1: according to the named entity categories, prepare initial seed words; the seed words serve as the basis for subsequent iterations;
Step 2: based on existing medical free text, tag the text with seed-word labels; in the named entity recognition task, the first and last characters of a seed word are labeled B and E respectively, the characters in between are labeled I, and all remaining characters are labeled O;
Step 3: train a model on the corpus annotated in the first round, use the resulting model to predict on the medical text corpus, and extract the predicted entity words;
Step 4: analyze the newly generated round of entity words against web pages using a search engine tool, and filter them according to whether an encyclopedia entry exists; the entity word resources may be further supplemented with related terms from Internet resources; add the processed entity words to the dictionary;
Step 5: repeat step 2, step 3, and step 4 to complete multiple rounds of iteration; stop iterating when the set number of iterations is reached or the number of newly added entries no longer increases; using automation tools, compare the entity words annotated from the dictionary with the entity words predicted by the model, and extract the entity words whose boundaries or categories are inconsistent; further correct the extracted inconsistent entity word corpus by rules, finally completing the annotation of medical named entities.
2. The iteration-based named entity recognition annotation method for medical text according to claim 1, characterized in that the method of tagging the text with seed-word labels based on existing medical free text comprises the following steps:
Step S1: obtain the different keywords, generate the keyword list corresponding to each medical text, and store the lists in a database;
Step S2: read the keyword lists from the database, generate a unique identifier for each medical text from the medical text and its keywords, and construct a unique dictionary tree according to each unique identifier; all the unique dictionary trees constitute a basic dictionary tree object pool used for the Chinese word segmentation service;
Step S3: receive the data to be processed, and segment the data using the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data to be processed; perform keyword filtering according to the segmentation result.
3. The iteration-based named entity recognition annotation method for medical text according to claim 2, characterized in that in step S1 the different keywords of the different medical texts are maintained in the database by users.
4. The iteration-based named entity recognition annotation method for medical text according to claim 3, characterized in that step S2 further comprises:
The keyword lists are built into different dictionaries, with one dictionary per medical text; the dictionary file format is X.dic, where X is the dictionary title.
5. The medical text named entity recognition and labeling method based on an iterative method as claimed in claim 4, characterized in that step S3 includes the following sub-steps:
S31: receive the data to be processed, determine the medical text to which the data corresponds, and jump to step S32;
S32: according to the medical text to which the data corresponds, look up the dictionary tree for that medical text in the basic dictionary tree object pool; if it exists, jump to step S33; otherwise, jump to step S34;
S33: segment the data to be processed with that dictionary tree, perform keyword filtering according to the segmentation result, and end;
S34: determine whether a dictionary exists for the medical text to which the data corresponds; if it exists, jump to step S35; otherwise, jump to step S36;
S35: dynamically construct a dictionary tree from the dictionary for the medical text to which the data corresponds, segment the data with the constructed dictionary tree, perform keyword filtering according to the segmentation result, and end;
S36: call the preset general dictionary, construct a general dictionary tree from it, segment the data with the constructed general dictionary tree, perform keyword filtering according to the segmentation result, and end.
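The fallback order in sub-steps S31 to S36 (pooled tree, then a tree built on demand from the dictionary, then the general dictionary) can be sketched as follows. This is an illustration under assumptions: the caller supplies the medical text type instead of the method detecting it, segmentation is forward maximum matching, and `build_tree`, `extract_keywords`, and `process` are hypothetical names. The second return value exists only to show which branch was taken.

```python
from typing import Dict, List, Tuple

Tree = dict  # nested-dict trie; the key "$end" marks a complete keyword


def build_tree(words: List[str]) -> Tree:
    """Build a character trie from a keyword list."""
    root: Tree = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True
    return root


def extract_keywords(tree: Tree, text: str) -> List[str]:
    """Segment by forward maximum matching and keep only dictionary hits."""
    hits, i = [], 0
    while i < len(text):
        node, best = tree, 0
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if "$end" in node:
                best = j - i + 1
        if best:
            hits.append(text[i:i + best])
        i += best or 1  # skip one character when nothing matched
    return hits


def process(
    data: str,
    text_type: str,  # S31: assumed to be known to the caller
    pool: Dict[str, Tree],
    dictionaries: Dict[str, List[str]],
    general_dictionary: List[str],
) -> Tuple[List[str], str]:
    tree = pool.get(text_type)  # S32: look in the object pool
    source = "pool"  # S33 applies when the tree was found
    if tree is None:
        words = dictionaries.get(text_type)  # S34: is there a dictionary?
        if words is not None:
            tree, source = build_tree(words), "dictionary"  # S35
        else:
            tree, source = build_tree(general_dictionary), "general"  # S36
    return extract_keywords(tree, data), source
```

The claim does not say whether a tree built in S35 is added to the pool afterwards, so the sketch leaves the pool unchanged.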
6. The medical text named entity recognition and labeling method based on an iterative method as claimed in claim 5, characterized in that the method of training a model on the corpus labeled in the first round, predicting on the medical text corpus with the generated model, and extracting the predicted entity words performs the following steps:
Step A1: preprocess the acquired corpus;
Step A2: input the corpus preprocessed in step A1 into a preset learning model, then adjust and save the parameters of the learning model;
Step A3: add a corresponding prediction label to each acquired corpus according to the sequence classification result output by the learning model, and use the manual labels to minimize the loss function of the learning model so that the prediction labels fit the manual labels; for an unknown corpus, segment it with a word segmentation algorithm, and use the adjusted learning model to give the segmented unknown corpus its first labeling;
Step A4: tune the unknown corpus that was first labeled in step A3, and give the tuned corpus its final labeling.
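The core of steps A2 and A3, fitting prediction labels to manual labels by minimizing a loss and then labeling an already segmented unknown corpus, can be sketched with a toy model. The claim does not specify the learning model or the loss, so this sketch assumes a per-token softmax classifier trained with cross-entropy and plain gradient descent as a stand-in; a real system would likely use a sequence model. The label set and the names `train` and `label` are hypothetical, and the preprocessing of step A1 and the tuning of step A4 are not covered.

```python
import math
from typing import Dict, List, Tuple

LABELS = ["O", "DISEASE"]  # hypothetical label set; "O" means "not an entity"


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def train(
    corpus: List[Tuple[str, str]], epochs: int = 50, lr: float = 0.5
) -> Tuple[Dict[str, List[float]], List[float]]:
    """Adjust per-token logits to minimize cross-entropy against the manual labels."""
    weights: Dict[str, List[float]] = {}
    losses: List[float] = []
    for _ in range(epochs):
        total = 0.0
        for token, manual_label in corpus:
            logits = weights.setdefault(token, [0.0] * len(LABELS))
            probs = softmax(logits)
            gold = LABELS.index(manual_label)
            total += -math.log(probs[gold])
            # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
            for k in range(len(LABELS)):
                logits[k] -= lr * (probs[k] - (1.0 if k == gold else 0.0))
        losses.append(total / len(corpus))
    return weights, losses


def label(weights: Dict[str, List[float]], tokens: List[str]) -> List[Tuple[str, str]]:
    """First labeling of a segmented unknown corpus; unseen tokens fall back to "O"."""
    labeled = []
    for token in tokens:
        logits = weights.get(token, [0.0] * len(LABELS))
        labeled.append((token, LABELS[logits.index(max(logits))]))
    return labeled


manual = [("患者", "O"), ("高血压", "DISEASE"), ("和", "O"), ("糖尿病", "DISEASE")]
weights, losses = train(manual)
print(label(weights, ["高血压", "头痛"]))
```

The average loss falls from epoch to epoch on this sample, and the unseen token `头痛` receives the default label `O`, which is the kind of gap a later iteration with an updated dictionary would be expected to close.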
7. The medical text named entity recognition and labeling method based on an iterative method as claimed in claim 6, characterized in that the preprocessing in step A1 includes merging large-granularity word segments and unifying the format.
CN201910257482.1A 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method Active CN110008473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910257482.1A CN110008473B (en) 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method

Publications (2)

Publication Number Publication Date
CN110008473A (en) 2019-07-12
CN110008473B (en) 2022-11-25

Family

ID=67169242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910257482.1A Active CN110008473B (en) 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method

Country Status (1)

Country Link
CN (1) CN110008473B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140310207A1 (en) * 2013-04-10 2014-10-16 Lifecom, Inc. Chronology-centric, case-entity information handling system and methodology
CN107622050A (en) * 2017-09-14 2018-01-23 武汉烽火普天信息技术有限公司 Text sequence labeling system and method based on Bi LSTM and CRF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田家源等: "面向互联网资源的医学命名实体识别研究", 《计算机科学与探索》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178080A (en) * 2020-01-02 2020-05-19 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
CN111178080B (en) * 2020-01-02 2023-07-18 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
WO2021114632A1 (en) * 2020-05-13 2021-06-17 平安科技(深圳)有限公司 Disease name standardization method, apparatus, device, and storage medium
WO2021139257A1 (en) * 2020-06-24 2021-07-15 平安科技(深圳)有限公司 Method and apparatus for selecting annotated data, and computer device and storage medium

Also Published As

Publication number Publication date
CN110008473B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN104598535B (en) A kind of event extraction method based on maximum entropy
CN107705066A (en) Information input method and electronic equipment during a kind of commodity storage
CN103853834B (en) Text structure analysis-based Web document abstract generation method
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
CN103823824A (en) Method and system for automatically constructing text classification corpus by aid of internet
CN109635288A (en) A kind of resume abstracting method based on deep neural network
CN103077164A (en) Text analysis method and text analyzer
CN103294781A (en) Method and equipment used for processing page data
CN110008473A (en) A kind of medical text name Entity recognition mask method based on alternative manner
CN108287911A (en) A kind of Relation extraction method based on about fasciculation remote supervisory
CN103324700A (en) Noumenon concept attribute learning method based on Web information
KR101724398B1 (en) A generation system and method of a corpus for named-entity recognition using knowledge bases
KR101801257B1 (en) Text-Mining Application Technique for Productive Construction Document Management
CN103530429A (en) Webpage content extracting method
CN104699797A (en) Webpage data structured analytic method and device
Azir et al. Wrapper approaches for web data extraction: A review
CN111143571B (en) Entity labeling model training method, entity labeling method and device
CN109710930A (en) A kind of Chinese Resume analytic method based on deep neural network
CN107436931B (en) Webpage text extraction method and device
Owen et al. Towards a scientific workflow featuring Natural Language Processing for the digitisation of natural history collections.
CN106372232B (en) Information mining method and device based on artificial intelligence
CN115186015A (en) Network security knowledge graph construction method and system
CN104834718A (en) Recognition method and system for event argument based on maximum entropy model
CN105335446A (en) Short text classification model generation method and classification method based on word vector
CN109002561A (en) Automatic document classification method, system and medium based on sample keyword learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant