CN110008473B - Medical text named entity identification and labeling method based on iteration method - Google Patents

Info

Publication number
CN110008473B
CN110008473B CN201910257482.1A CN201910257482A CN110008473B CN 110008473 B CN110008473 B CN 110008473B CN 201910257482 A CN201910257482 A CN 201910257482A CN 110008473 B CN110008473 B CN 110008473B
Authority
CN
China
Prior art keywords
medical
word
processed
text
medical text
Legal status
Active
Application number
CN201910257482.1A
Other languages
Chinese (zh)
Other versions
CN110008473A (en)
Inventor
陈储培
Current Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Application filed by Unisound Shanghai Intelligent Technology Co Ltd
Priority to CN201910257482.1A
Publication of CN110008473A
Application granted
Publication of CN110008473B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The embodiment of the invention provides a medical text named entity identification and labeling method based on an iteration method, and relates to the technical field of medical information. Labeling a large-scale medical corpus with a traditional labeling tool consumes a large amount of manpower and material resources. The method combines a model with an automatic tool, is suitable for large-scale medical text labeling, and shortens the labeling period, thereby helping to improve product research and development efficiency.

Description

Medical text named entity identification and labeling method based on iteration method
Technical Field
The invention relates to the technical field of medical information, in particular to a medical text named entity identification and labeling method based on an iteration method.
Background
The medical field differs from the general domain and is highly specialized. Research in the medical field depends on the support of medical corpora, and sequence annotation is a basic and very important task in medical research. However, named entity identification and labeling requires a large amount of manpower and material resources. Mainstream sequence labeling currently relies on open-source labeling tools, so the labeling period is long, and the medical field also involves highly specialized knowledge, which makes medical sequence labeling difficult. To improve labeling efficiency, an iteration-based automatic labeling method for medical named entities is provided.
With the development of the internet, the mobile internet, and big data technology, the scale of text data resources is growing explosively. These resources mainly include unstructured data on social media (e.g. microblog accounts, public accounts, Facebook, Twitter) and news media websites (e.g. People's Daily, Phoenix News, Sohu News), and semi-structured data on encyclopedia websites such as online encyclopedias and wikis. Natural language processing (NLP) plays a very important role in extracting information from such text. In text mining, extracting useful information from massive text data is valuable to enterprises and users. Sequence labeling is one of the most basic and most commonly used NLP methods. Quickly and effectively predicting the label (such as noun, person name, place name, or time) of each word in a Chinese sequence plays an important role in artificial intelligence tasks such as relationship mining and knowledge graphs.
In the prior art, labeled medical corpora are scarce, which makes basic research on medical texts difficult; meanwhile, medical text labeling depends on labeling tools, the labeling period is long, and a large amount of manpower and material resources are consumed.
Disclosure of Invention
The invention aims to provide a medical text named entity identification and labeling method based on an iteration method, which has the advantages of high labeling efficiency, accurate labeling, and a simple procedure.
In order to achieve the above object, the embodiments of the present invention adopt the following technical solutions:
A medical text named entity identification and labeling method based on an iteration method comprises the following steps:
Step 1: preparing initial seed words according to the categories of the named entities, wherein the seed words serve as the basis of subsequent iterations;
Step 2: labeling existing medical free text with seed word tags, wherein, in the named entity recognition task, the first and last characters of each seed word are tagged B and E respectively, the characters in between are tagged I, and all other characters are tagged O;
Step 3: training a model on the corpus labeled in the first round, using the trained model to predict labels for the medical text corpus, and extracting the predicted entity words;
Step 4: performing webpage analysis on the newly generated entity words using a search engine tool, filtering them according to whether an encyclopedia entry exists, further supplementing the entity word resources with related terms from network resources, and adding the processed entity words to the dictionary base;
Step 5: repeating steps 2, 3, and 4 to complete multiple rounds of iteration, and stopping when the set number of iterations is reached or no new entries are added; using an automatic tool to extract entity words with inconsistent boundaries and inconsistent categories by comparing the entity words labeled by the dictionary with the entity words predicted by the model; and further correcting the extracted inconsistent entity words through rules, finally completing the labeling of the medical named entities.
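The B/I/E/O tagging of step 2 can be illustrated with a minimal sketch. It assumes character-level tags, longest-match lookup against the seed words, and a category suffix on each tag (for example `B-DIS`); the function name `label_bioe`, the category name, and the decision to skip single-character seed words are illustrative assumptions, not details given in the patent.

```python
def label_bioe(text, seed_words):
    """Tag each character of `text` with B/I/E/O using longest-match seed lookup.

    `seed_words` maps an entity string to its category (hypothetical format).
    Single-character seed words are skipped here, because the source only
    defines distinct begin (B) and end (E) tags.
    """
    tags = ["O"] * len(text)
    max_len = max((len(w) for w in seed_words), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so longer seed words win.
        for length in range(min(max_len, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in seed_words:
                cat = seed_words[word]
                tags[i] = f"B-{cat}"
                for j in range(i + 1, i + length - 1):
                    tags[j] = f"I-{cat}"
                tags[i + length - 1] = f"E-{cat}"
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Example: one seed word of a hypothetical category "DIS" (disease).
print(label_bioe("患者有糖尿病史", {"糖尿病": "DIS"}))
```

Longest-match lookup is only one possible way to apply the seed words; the patent does not specify how overlapping matches are resolved.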
Further, the method for labeling the existing medical free text with seed word tags comprises the following steps:
Step S1: acquiring different keywords, generating keyword lists corresponding to different medical texts, and storing the keyword lists in a database;
Step S2: reading the keyword lists from the database, generating a unique identifier for each medical text according to the medical text and its keywords, and constructing a unique dictionary tree (trie) according to each unique identifier, wherein all the unique dictionary trees form a basic dictionary tree object pool for the word segmentation service;
Step S3: receiving data to be processed, and segmenting the data with the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data; and filtering the keywords according to the word segmentation result.
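A minimal sketch of dictionary-tree segmentation as described in steps S1 to S3 follows. It assumes forward maximum matching over a trie and interprets "filtering the keywords" as keeping the tokens that appear in the keyword list; the class `Trie` and the functions `segment` and `filter_keywords` are hypothetical names, and the patent does not state which matching strategy is used.

```python
class Trie:
    """Dictionary tree built from a keyword list."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # "$end" is longer than one character, so it cannot collide with a child key.
        node["$end"] = True

    def longest_match(self, text, start):
        """Return the end index of the longest keyword starting at `start`, or 0."""
        node = self.root
        end = 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                end = i + 1
        return end


def segment(text, trie):
    """Forward maximum matching; unmatched characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        end = trie.longest_match(text, i) or i + 1
        tokens.append(text[i:end])
        i = end
    return tokens


def filter_keywords(tokens, keywords):
    """Keep only the tokens that are keywords (one reading of 'keyword filtering')."""
    return [t for t in tokens if t in keywords]


keywords = {"高血压", "高血", "糖尿病"}
tokens = segment("高血压和糖尿病", Trie(keywords))
print(tokens, filter_keywords(tokens, keywords))
```

A nested-dictionary trie keeps the sketch short; a production segmenter would more likely use a compact structure such as a double-array trie.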
Further, the different keywords of the different medical texts in step S1 are maintained in the database by the user.
Further, the step S2 further includes:
constructing different lexicons from the keyword lists, with one lexicon per medical text; the lexicon file format is X.dic, wherein X is the name of the lexicon.
Further, the step S3 includes the following sub-steps:
S31: receiving data to be processed, determining the medical text to which the data corresponds, and jumping to step S32;
S32: retrieving the dictionary tree corresponding to that medical text from the basic dictionary tree object pool; if it is found, jumping to step S33; otherwise, jumping to step S34;
S33: segmenting the data to be processed with the dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S34: determining whether a lexicon exists for the medical text to which the data corresponds; if so, jumping to step S35; otherwise, jumping to step S36;
S35: dynamically constructing a dictionary tree from the lexicon of that medical text, segmenting the data with the constructed dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S36: calling a preset general lexicon, constructing a general dictionary tree from the general lexicon, segmenting the data to be processed with the general dictionary tree, filtering the keywords according to the word segmentation result, and ending.
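The lookup order of sub-steps S32, S34, and S36 amounts to a three-level fallback, which can be sketched as below. The function `resolve_trie`, its parameters, and the returned source label are illustrative assumptions; `build_trie` stands for any callable that turns a word list into a dictionary tree, and the example substitutes `frozenset` purely as a stand-in.

```python
def resolve_trie(text_type, pool, lexicons, general_lexicon, build_trie):
    """Pick the dictionary tree for `text_type`, following S32 -> S34 -> S36.

    pool:            mapping of text type to an already built dictionary tree
    lexicons:        mapping of text type to a word list (the X.dic contents)
    general_lexicon: word list used when nothing specific exists
    build_trie:      callable that builds a dictionary tree from a word list
    Returns (tree, source), where source records which branch was taken.
    """
    tree = pool.get(text_type)            # S32: look in the object pool
    if tree is not None:
        return tree, "pool"               # S33: use the pooled tree
    words = lexicons.get(text_type)       # S34: is there a lexicon for this type?
    if words is not None:
        return build_trie(words), "lexicon"        # S35: build dynamically
    return build_trie(general_lexicon), "general"  # S36: fall back to general


# frozenset stands in for a real dictionary-tree constructor.
pool = {"admission_note": frozenset({"高血压"})}
lexicons = {"discharge_summary": ["糖尿病"]}
for kind in ("admission_note", "discharge_summary", "lab_report"):
    print(kind, resolve_trie(kind, pool, lexicons, ["患者"], frozenset))
```

Whether a dynamically built tree is added back to the pool is not stated in the source, so the sketch leaves the pool unchanged.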
Further, the method for training a model on the corpus labeled in the first round, using the trained model to predict labels for the medical text corpus, and extracting the predicted entity words comprises the following steps:
Step A1: preprocessing the acquired corpus;
Step A2: inputting the corpus preprocessed in step A1 into a preset learning model, adjusting the parameters of the learning model, and saving the parameters;
Step A3: adding corresponding prediction labels to the acquired corpus according to the sequence classification result output by the learning model, minimizing the loss function of the learning model against the manual labels so that the prediction labels fit the manual labels, segmenting the unknown corpus with a word segmentation algorithm, and performing preliminary labeling of the segmented unknown corpus with the adjusted learning model;
Step A4: adjusting the unknown corpus preliminarily labeled in step A3, and performing final labeling on the adjusted corpus.
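Once the model has produced a tag sequence, the predicted entity words have to be read back out of it. The following sketch assumes tags of the form `B-CAT`, `I-CAT`, `E-CAT`, and `O`; the tag format and the function name `extract_entities` are assumptions made for illustration, and the model itself is out of scope here.

```python
def extract_entities(chars, tags):
    """Collect (start, end, category, text) spans from a B/I/E/O tag sequence.

    A span is emitted only for a well-formed B ... E run with one category;
    anything else (a stray I or E, a category change) resets the open span.
    """
    entities = []
    start, cat = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            start, cat = i, label
        elif prefix == "I" and start is not None and label == cat:
            continue
        elif prefix == "E" and start is not None and label == cat:
            entities.append((start, i + 1, cat, "".join(chars[start:i + 1])))
            start, cat = None, None
        else:
            start, cat = None, None
    return entities


tags = ["O", "O", "O", "B-DIS", "I-DIS", "E-DIS", "O"]
print(extract_entities("患者有糖尿病史", tags))
```

Dropping ill-formed runs is a conservative choice; a more lenient decoder could instead repair them.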
Further, the preprocessing in step A1 includes merging word segments into larger-granularity segments and unifying formats.
The medical text named entity identification and labeling method based on the iteration method has the following beneficial effects: labeling a large-scale medical corpus with a traditional labeling tool consumes a large amount of manpower and material resources, whereas this method combines a model with an automatic tool, is suitable for large-scale medical text labeling, and shortens the labeling period, thereby helping to improve product research and development efficiency.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 shows a method flow diagram of a medical text named entity identification and tagging method based on an iterative method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Example 1:
As shown in Fig. 1, a medical text named entity identification and labeling method based on an iteration method performs the following steps:
Step 1: preparing initial seed words according to the categories of the named entities, wherein the seed words serve as the basis of subsequent iterations;
Step 2: labeling existing medical free text with seed word tags, wherein, in the named entity recognition task, the first and last characters of each seed word are tagged B and E respectively, the characters in between are tagged I, and all other characters are tagged O;
Step 3: training a model on the corpus labeled in the first round, using the trained model to predict labels for the medical text corpus, and extracting the predicted entity words;
Step 4: performing webpage analysis on the newly generated entity words using a search engine tool, filtering them according to whether an encyclopedia entry exists, further supplementing the entity word resources with related terms from network resources, and adding the processed entity words to the dictionary base;
Step 5: repeating steps 2, 3, and 4 to complete multiple rounds of iteration, and stopping when the set number of iterations is reached or no new entries are added; using an automatic tool to extract entity words with inconsistent boundaries and inconsistent categories by comparing the entity words labeled by the dictionary with the entity words predicted by the model; and further correcting the extracted inconsistent entity words through rules, finally completing the labeling of the medical named entities.
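The comparison in step 5 between dictionary-labeled and model-predicted entities can be sketched as follows. The sketch assumes each entity is a `(start, end, category)` tuple over the same text, treats overlapping but unequal spans as a boundary inconsistency, and treats identical spans with different categories as a category inconsistency; these definitions and the name `find_inconsistencies` are assumptions, since the patent does not define the automatic tool in detail.

```python
def find_inconsistencies(dict_entities, model_entities):
    """Compare two entity lists and return (boundary, category) mismatches.

    Each entity is (start, end, category) with `end` exclusive.
    boundary: pairs whose spans overlap but are not identical
    category: pairs with identical spans but different categories
    """
    boundary, category = [], []
    for d in dict_entities:
        d_start, d_end, d_cat = d
        for m in model_entities:
            m_start, m_end, m_cat = m
            if (d_start, d_end) == (m_start, m_end):
                if d_cat != m_cat:
                    category.append((d, m))
            elif d_start < m_end and m_start < d_end:  # spans overlap
                boundary.append((d, m))
    return boundary, category


by_dictionary = [(3, 6, "DIS"), (10, 12, "DRUG")]
by_model = [(3, 7, "DIS"), (10, 12, "SYM")]
print(find_inconsistencies(by_dictionary, by_model))
```

The pairs returned here would then be passed to the rule-based correction that the step describes; those rules are not specified in the source.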
The technical principle of the technical scheme is as follows: the functions are realized by extracting the keywords and then matching the extracted keywords.
The technical effect of the technical scheme is as follows: the scheme helps improve product research and development efficiency.
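The overall iteration of Example 1, including both stopping conditions, can be outlined as a short driver loop. In this sketch `train_and_predict` (steps 2 and 3) and `passes_filter` (the encyclopedia-entry check of step 4) are hypothetical callables supplied by the caller, and `max_rounds` is an assumed name for the set number of iterations.

```python
def iterate_labeling(seed_words, corpus, train_and_predict, passes_filter, max_rounds=5):
    """Grow the dictionary until no new entries appear or max_rounds is reached.

    train_and_predict(corpus, dictionary) -> iterable of predicted entity words
    passes_filter(word) -> bool, e.g. whether an encyclopedia entry exists
    Returns (dictionary, rounds_run).
    """
    dictionary = set(seed_words)
    for round_no in range(1, max_rounds + 1):
        predicted = train_and_predict(corpus, dictionary)   # steps 2 and 3
        new_words = {w for w in predicted
                     if w not in dictionary and passes_filter(w)}  # step 4
        if not new_words:       # no newly added entries: stop early
            return dictionary, round_no
        dictionary |= new_words
    return dictionary, max_rounds   # set number of iterations reached


# Toy stand-in for the model: each known word "discovers" one neighbour.
neighbours = {"a": "b", "b": "c"}

def fake_predict(corpus, dictionary):
    return {neighbours[w] for w in dictionary if w in neighbours}

print(iterate_labeling({"a"}, corpus=[], train_and_predict=fake_predict,
                       passes_filter=lambda w: True))
```

Passing the model and the filter in as callables keeps the loop independent of any particular model or search engine tool.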
Example 2:
On the basis of the previous embodiment, the method for labeling the existing medical free text with seed word tags performs the following steps:
Step S1: acquiring different keywords, generating keyword lists corresponding to different medical texts, and storing the keyword lists in a database;
Step S2: reading the keyword lists from the database, generating a unique identifier for each medical text according to the medical text and its keywords, and constructing a unique dictionary tree (trie) according to each unique identifier, wherein all the unique dictionary trees form a basic dictionary tree object pool for the word segmentation service;
Step S3: receiving data to be processed, and segmenting the data with the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data; and filtering the keywords according to the word segmentation result.
The technical principle of the technical scheme is as follows: data to be processed is received and segmented with the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data, and the keywords are filtered according to the word segmentation result.
The technical effect of the technical scheme is as follows: the accuracy of the method can be improved.
Example 3:
On the basis of the above embodiment, the different keywords of the different medical texts in step S1 are maintained in the database by the user.
The technical principle of the technical scheme is as follows: storing the keywords in the database keeps them available over the long term.
The technical effect of the technical scheme is as follows: the reliability of the method is ensured.
Example 4:
On the basis of the above embodiment, the step S2 further includes:
constructing different lexicons from the keyword lists, with one lexicon per medical text; the lexicon file format is X.dic, wherein X is the name of the lexicon.
The technical principle of the technical scheme is as follows: different lexicons are constructed with one lexicon per medical text, and each lexicon is stored as X.dic, wherein X is the name of the lexicon.
The technical effect of the technical scheme is as follows: the efficiency of the method is improved.
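One way to keep one lexicon per medical text in an X.dic file is sketched below. The source specifies only the file name pattern; the one-keyword-per-line UTF-8 layout and the helper names `lexicon_path`, `save_lexicon`, and `load_lexicon` are assumptions made for this illustration.

```python
from pathlib import Path
import tempfile


def lexicon_path(directory, name):
    """Build the path X.dic for a lexicon named X."""
    return Path(directory) / f"{name}.dic"


def save_lexicon(directory, name, keywords):
    """Write the keywords, deduplicated and sorted, one per line (assumed layout)."""
    path = lexicon_path(directory, name)
    path.write_text("\n".join(sorted(set(keywords))) + "\n", encoding="utf-8")
    return path


def load_lexicon(directory, name):
    """Read the keywords back, ignoring blank lines."""
    text = lexicon_path(directory, name).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example: a lexicon for a hypothetical text type "discharge_summary".
with tempfile.TemporaryDirectory() as workdir:
    saved = save_lexicon(workdir, "discharge_summary", ["高血压", "糖尿病", "高血压"])
    print(saved.name, load_lexicon(workdir, "discharge_summary"))
```

Keeping the lexicon in a plain text file makes it easy for a user to maintain the keywords by hand, which matches the user-maintained keyword lists described earlier.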
Example 5:
On the basis of the above embodiment, the step S3 includes the following sub-steps:
S31: receiving data to be processed, determining the medical text to which the data corresponds, and jumping to step S32;
S32: retrieving the dictionary tree corresponding to that medical text from the basic dictionary tree object pool; if it is found, jumping to step S33; otherwise, jumping to step S34;
S33: segmenting the data to be processed with the dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S34: determining whether a lexicon exists for the medical text to which the data corresponds; if so, jumping to step S35; otherwise, jumping to step S36;
S35: dynamically constructing a dictionary tree from the lexicon of that medical text, segmenting the data with the constructed dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S36: calling a preset general lexicon, constructing a general dictionary tree from the general lexicon, segmenting the data to be processed with the general dictionary tree, filtering the keywords according to the word segmentation result, and ending.
The technical principle of the technical scheme is as follows: when no dedicated dictionary tree or lexicon exists, a preset general lexicon is called, a general dictionary tree is constructed from it, the data to be processed is segmented with the general dictionary tree, and the keywords are filtered according to the word segmentation result.
The technical effect of the technical scheme is as follows: the accuracy of the method is improved.
Example 6:
On the basis of the previous embodiment, the method for training a model on the corpus labeled in the first round, using the trained model to predict labels for the medical text corpus, and extracting the predicted entity words performs the following steps:
Step A1: preprocessing the acquired corpus;
Step A2: inputting the corpus preprocessed in step A1 into a preset learning model, adjusting the parameters of the learning model, and saving the parameters;
Step A3: adding corresponding prediction labels to the acquired corpus according to the sequence classification result output by the learning model, minimizing the loss function of the learning model against the manual labels so that the prediction labels fit the manual labels, segmenting the unknown corpus with a word segmentation algorithm, and performing preliminary labeling of the segmented unknown corpus with the adjusted learning model;
Step A4: adjusting the unknown corpus preliminarily labeled in step A3, and performing final labeling on the adjusted corpus.
The technical principle of the technical scheme is as follows: corresponding prediction labels are added to the acquired corpus according to the sequence classification result output by the learning model, the loss function of the learning model is minimized against the manual labels so that the prediction labels fit the manual labels, the unknown corpus is segmented with a word segmentation algorithm, and the segmented unknown corpus is preliminarily labeled with the adjusted learning model.
The technical effect of the technical scheme is as follows: the method can learn and improve as it is applied.
Example 7:
On the basis of the above embodiment, the preprocessing in step A1 includes merging word segments into larger-granularity segments and unifying formats.
The technical principle of the technical scheme is as follows: formats are unified before merging, so that the result is more accurate.
The technical effect of the technical scheme is as follows: the accuracy of the method is improved.
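The two preprocessing operations of step A1 could look like the sketch below. It assumes that "unifying formats" means normalizing character width with Unicode NFKC and that "merging" means joining adjacent tokens whose concatenation is a known lexicon entry; both readings, and the names `unify_format`, `merge_segments`, and `max_span`, are assumptions rather than details from the patent.

```python
import unicodedata


def unify_format(text):
    """Fold full-width letters, digits, and punctuation to half-width (NFKC)."""
    return unicodedata.normalize("NFKC", text).strip()


def merge_segments(tokens, lexicon, max_span=4):
    """Greedily merge adjacent tokens whose concatenation is a lexicon entry.

    The longest window (up to `max_span` tokens) is tried first, so the
    coarsest-granularity segment wins.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token window matched: keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


print(unify_format("ＨｂＡ１ｃ"))
print(merge_segments(["2", "型", "糖尿病", "患者"], {"2型糖尿病"}))
```

Running width normalization before the merge means lexicon entries only need to be stored in one character width.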
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional unit in the embodiments of the present invention may be integrated together to form an independent part, or each unit may exist separately, or two or more units may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention, or the part thereof which substantially contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program code.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A medical text named entity identification and labeling method based on an iteration method, characterized by comprising the following steps:
Step 1: preparing initial seed words according to the categories of the named entities, wherein the seed words serve as the basis of subsequent iterations;
Step 2: labeling existing medical free text with seed word tags, wherein, in the named entity recognition task, the first and last characters of each seed word are tagged B and E respectively, the characters in between are tagged I, and all other characters are tagged O;
Step 3: training a model on the corpus labeled in the first round, using the trained model to predict labels for the medical text corpus, and extracting the predicted entity words;
Step 4: performing webpage analysis on the newly generated entity words using a search engine tool, filtering them according to whether an encyclopedia entry exists, further supplementing the entity word resources with related terms from network resources, and adding the processed entity words to the dictionary base;
Step 5: repeating steps 2, 3, and 4 to complete multiple rounds of iteration, and stopping when the set number of iterations is reached or no new entries are added; using an automatic tool to extract entity words with inconsistent boundaries and inconsistent categories by comparing the entity words labeled by the dictionary with the entity words predicted by the model; and further correcting the extracted inconsistent entity words through rules, finally completing the labeling of the medical named entities.
2. The medical text named entity identification and labeling method based on an iteration method according to claim 1, wherein the method for labeling the existing medical free text with seed word tags comprises the following steps:
Step S1: acquiring different keywords, generating keyword lists corresponding to different medical texts, and storing the keyword lists in a database;
Step S2: reading the keyword lists from the database, generating a unique identifier for each medical text according to the medical text and its keywords, and constructing a unique dictionary tree (trie) according to each unique identifier, wherein all the unique dictionary trees form a basic dictionary tree object pool for the word segmentation service;
Step S3: receiving data to be processed, and segmenting the data with the dictionary tree in the basic dictionary tree object pool that corresponds to the medical text of the data; and filtering the keywords according to the word segmentation result.
3. The medical text named entity identification and labeling method based on an iteration method according to claim 2, wherein the different keywords of the different medical texts in step S1 are maintained in the database by the user.
4. The medical text named entity identification and labeling method based on an iteration method according to claim 3, wherein the step S2 further comprises:
constructing different lexicons from the keyword lists, with one lexicon per medical text; the lexicon file format is X.dic, wherein X is the name of the lexicon.
5. The medical text named entity recognition and labeling method based on an iterative method according to claim 4, wherein step S3 comprises the following sub-steps:
S31: receiving data to be processed, determining the medical text to which the data to be processed corresponds, and proceeding to step S32;
S32: looking up, in the basic dictionary tree object pool, the dictionary tree corresponding to the medical text of the data to be processed; if such a dictionary tree is found, proceeding to step S33; otherwise, proceeding to step S34;
S33: performing word segmentation on the data to be processed using the dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S34: determining whether a word bank exists for the medical text of the data to be processed; if so, proceeding to step S35; otherwise, proceeding to step S36;
S35: dynamically constructing a dictionary tree from the word bank corresponding to the medical text of the data to be processed, performing word segmentation using the constructed dictionary tree, filtering the keywords according to the word segmentation result, and ending;
S36: calling a preset general word bank, constructing a general dictionary tree from the general word bank, performing word segmentation on the data to be processed using the general dictionary tree, filtering the keywords according to the word segmentation result, and ending.
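The fallback order of S31 to S36 (pooled dictionary tree, then the text's own word bank, then the general word bank) can be sketched as follows. This is an illustrative reading only: the medical text type is passed in rather than detected, "filtering" is taken to mean returning the segments that are keywords, and `process`, `keyword_hits`, and the in-memory `pool` and `lexicons` mappings are hypothetical names. The claim does not say whether a tree built in S35 is added to the pool, so this sketch does not cache it.

```python
from typing import Dict, Iterable, List, Tuple

END = ""  # end-of-word marker; it cannot collide with a one-character key


def build_trie(words: Iterable[str]) -> dict:
    """Build a character-level dictionary tree from a word bank."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        if word:
            node[END] = True
    return root


def keyword_hits(text: str, trie: dict) -> List[str]:
    """Segment by forward maximum matching and keep only keyword segments."""
    hits, i = [], 0
    while i < len(text):
        node, best = trie, 0
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if END in node:
                best = j + 1 - i
        if best:
            hits.append(text[i:i + best])
        i += best or 1
    return hits


def process(
    text_type: str,
    data: str,
    pool: Dict[str, dict],
    lexicons: Dict[str, List[str]],
    general_lexicon: List[str],
) -> Tuple[List[str], str]:
    """Return the keywords found and which source supplied the dictionary tree."""
    trie = pool.get(text_type)                        # S32: look in the pool
    if trie is not None:
        return keyword_hits(data, trie), "pool"       # S33
    words = lexicons.get(text_type)                   # S34: word bank exists?
    if words is not None:
        return keyword_hits(data, build_trie(words)), "lexicon"        # S35
    return keyword_hits(data, build_trie(general_lexicon)), "general"  # S36


pool = {"ct_report": build_trie(["nodule"])}
lexicons = {"ecg_report": ["arrhythmia"]}
general = ["fever"]
```

Calling `process("unknown_type", "fever", pool, lexicons, general)` takes the S36 branch, since neither a pooled tree nor a word bank exists for that type.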
6. The medical text named entity recognition and labeling method based on an iterative method according to claim 5, wherein performing model training on the first labeled corpus, performing prediction on the medical text corpus according to the generated model, and extracting the predicted entity words comprises the following steps:
step A1: preprocessing the acquired corpus;
step A2: inputting the corpus preprocessed in step A1 into a preset learning model, adjusting the parameters of the learning model, and storing the parameters;
step A3: adding corresponding predicted labels to the acquired corpus according to the sequence classification result output by the learning model, minimizing the loss function of the learning model against the manual labels so that the predicted labels fit the manual labels, performing word segmentation on the unknown corpus using a word segmentation algorithm, and preliminarily labeling the segmented unknown corpus using the adjusted learning model; and
step A4: tuning the unknown corpus preliminarily labeled in step A3, and performing final labeling on the tuned corpus.
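Steps A2 and A3 can be illustrated with a deliberately small stand-in model. The claim does not name the "preset learning model", so this sketch substitutes a per-token softmax classifier trained by stochastic gradient descent on cross-entropy loss; the feature set, label names, hyperparameters, and toy corpus are all assumptions. The preprocessing of step A1 and the tuning of step A4 are not shown.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

LABELS = ["O", "DISEASE"]  # hypothetical label set


def features(tokens: Sequence[str], i: int) -> List[str]:
    """Current word, previous word, and a bias term."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    return [f"w={tokens[i]}", f"prev={prev}", "bias"]


def scores(weights: Dict[Tuple[str, str], float], feats: List[str]) -> Dict[str, float]:
    return {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}


def softmax(s: Dict[str, float]) -> Dict[str, float]:
    m = max(s.values())
    exp = {y: math.exp(v - m) for y, v in s.items()}
    z = sum(exp.values())
    return {y: v / z for y, v in exp.items()}


def train(corpus, epochs: int = 50, lr: float = 0.5) -> Dict[Tuple[str, str], float]:
    """A2/A3: adjust parameters by minimizing cross-entropy against manual labels."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(epochs):
        for tokens, tags in corpus:
            for i, gold in enumerate(tags):
                feats = features(tokens, i)
                probs = softmax(scores(weights, feats))
                for y in LABELS:
                    # Gradient of cross-entropy w.r.t. the score of label y.
                    grad = probs[y] - (1.0 if y == gold else 0.0)
                    for f in feats:
                        weights[(f, y)] -= lr * grad
    return weights


def predict(weights, tokens: Sequence[str]) -> List[str]:
    """A3: preliminary labeling of an already segmented, unlabeled sentence."""
    out = []
    for i in range(len(tokens)):
        s = scores(weights, features(tokens, i))
        out.append(max(LABELS, key=lambda y: s[y]))
    return out


# Toy manually labeled corpus (assumed data, for illustration only).
labeled = [
    (["patient", "has", "hypertension"], ["O", "O", "DISEASE"]),
    (["history", "of", "diabetes"], ["O", "O", "DISEASE"]),
]
model = train(labeled)
preliminary = predict(model, ["patient", "has", "diabetes"])
```

In the iterative scheme described by the claims, entity words extracted from such preliminary labels would be fed back as new seed words for the next round; a production system would use a proper sequence model rather than this token-level classifier.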
7. The medical text named entity recognition and labeling method based on an iterative method according to claim 6, wherein the preprocessing in step A1 comprises merging coarse-grained word segments and unifying the format.
CN201910257482.1A 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method Active CN110008473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910257482.1A CN110008473B (en) 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910257482.1A CN110008473B (en) 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method

Publications (2)

Publication Number Publication Date
CN110008473A CN110008473A (en) 2019-07-12
CN110008473B CN110008473B (en) 2022-11-25

Family

ID=67169242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910257482.1A Active CN110008473B (en) 2019-04-01 2019-04-01 Medical text named entity identification and labeling method based on iteration method

Country Status (1)

Country Link
CN (1) CN110008473B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178080B (en) * 2020-01-02 2023-07-18 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
CN111696635A (en) * 2020-05-13 2020-09-22 平安科技(深圳)有限公司 Disease name standardization method and device
CN111832294B (en) * 2020-06-24 2022-08-16 平安科技(深圳)有限公司 Method and device for selecting marking data, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622050A (en) * 2017-09-14 2018-01-23 武汉烽火普天信息技术有限公司 Text sequence labeling system and method based on Bi LSTM and CRF

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014169063A1 (en) * 2013-04-10 2014-10-16 Lifecom, Inc. Chronology-centric, case-entity information handling system and methodology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
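Research on medical named entity recognition for Internet resources (面向互联网资源的医学命名实体识别研究); Tian Jiayuan (田家源) et al.; Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); 20171016 (No. 06); full text *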

Also Published As

Publication number Publication date
CN110008473A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
CN109189942B (en) Construction method and device of patent data knowledge graph
CN109145153B (en) Intention category identification method and device
US11030199B2 (en) Systems and methods for contextual retrieval and contextual display of records
US20210064821A1 (en) System and method to extract customized information in natural language text
US10423649B2 (en) Natural question generation from query data using natural language processing system
US9645988B1 (en) System and method for identifying passages in electronic documents
Kejriwal et al. Information extraction in illicit web domains
Sanyal et al. Resume parser with natural language processing
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN108399157B (en) Dynamic extraction method of entity and attribute relationship, server and readable storage medium
CN114036930A (en) Text error correction method, device, equipment and computer readable medium
Consoli et al. Embeddings for named entity recognition in geoscience Portuguese literature
CN114153978A (en) Model training method, information extraction method, device, equipment and storage medium
Leonandya et al. A semi-supervised algorithm for Indonesian named entity recognition
CN111930936A (en) Method and system for excavating platform message text
CN108345694B (en) Document retrieval method and system based on theme database
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
Putra et al. Document Classification using Naïve Bayes for Indonesian Translation of the Quran
Acs et al. Hunaccent: Small footprint diacritic restoration for social media
CN110727764A (en) Phone operation generation method and device and phone operation generation equipment
CN114298048A (en) Named entity identification method and device
Altınel et al. Performance Analysis of Different Sentiment Polarity Dictionaries on Turkish Sentiment Detection
CN109597879B (en) Service behavior relation extraction method and device based on 'citation relation' data
Mande et al. Regular Expression Rule-Based Algorithm for Multiple Documents Key Information Extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant