CN114169338A - Medical named entity identification method and device and electronic equipment - Google Patents
- Publication number
- CN114169338A CN202210125810.4A
- Authority
- CN
- China
- Prior art keywords
- data
- ner
- labeled
- models
- category
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a medical named entity identification method, a medical named entity identification device, and electronic equipment. The method comprises: training a plurality of named entity recognition (NER) models of different types on a labeled data set; selecting data to be labeled from unlabeled data by an active learning method based on the plurality of NER models; predicting the category of the data to be labeled with each of the NER models; and fusing the predictions to obtain the category of the data to be labeled. The technical scheme allows a small amount of data to achieve performance comparable to that of a large amount of data. Data from actual use show that, with 10% of the labeled data, the method reaches about 90% of the performance obtained with the full data. The method therefore meets the practical needs of information extraction applications in medical scenarios that lack sufficient labeled data.
Description
Technical Field
The invention relates to the technical field of medical data processing, in particular to a medical named entity identification method and device and electronic equipment.
Background
Named entity recognition (NER) in the medical field is the foundation for constructing medical knowledge graphs and medical big data, and an important basis for intelligent case analysis and intelligent medicine.
At present, medical NER tasks are mainly addressed with deep learning, which needs a large amount of labeled data to train a model. Because medical data is private and sensitive, it is scarce, and data labeled for named entity recognition is also scarce. Deep learning therefore hits a major bottleneck on medical NER and cannot meet the needs of medical NER tasks when only a small amount of labeled data is available.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
In a first aspect, the invention provides a medical named entity identification method, which comprises the following steps:
training by utilizing a labeling data set to obtain a plurality of named entity recognition NER models of different types;
selecting data to be labeled from unlabeled data by using an active learning method based on a plurality of NER models;
predicting the category of the data to be labeled by utilizing a plurality of NER models respectively;
and fusing the predicted results to obtain the category of the data to be labeled.
Preferably, the plurality of named entity recognition NER models of different types comprises: deep learning models, statistical learning models, and/or knowledge-based models.
Preferably, the selecting data to be labeled from unlabeled data by using an active learning method based on the plurality of NER models includes:
respectively predicting the distribution of the unlabeled data in each category by using each NER model;
calculating the distribution consistency of the unlabeled data in each category;
and determining the data to be labeled from all the unlabeled data according to the consistency.
Preferably, the consistency of the distribution of each unlabeled data item over the categories is calculated, and the data to be labeled is determined from all the unlabeled data according to the consistency, using the following formula:

$$x^{*} = \arg\max_{x} \frac{1}{M} \sum_{m=1}^{M} D\big(P_{\theta_i}(y_m \mid x) \,\|\, P_{\theta_j}(y_m \mid x)\big)$$

where $x$ is an unlabeled data item, $y_m$ is the m-th entity class, $M$ is the total number of entity classes, $\theta_i$ is the i-th NER model, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $\theta_j$ is the j-th NER model, $P_{\theta_j}(y_m \mid x)$ is the probability predicted by the j-th NER model that $x$ belongs to the m-th class, $D$ is the KL distance between the two distributions, and $x^{*}$ is the data item with the largest KL distance among all the unlabeled data.
Preferably, the prediction results are fused to obtain the category of the data to be labeled, using the following formula:

$$\hat{y}(x) = \arg\max_{c \in \{y_1, \dots, y_M\}} \sum_{i=1}^{N} w_i \, P_{\theta_i}(c \mid x)$$

where $\hat{y}(x)$ is the final category of the unlabeled data $x$, $N$ is the number of NER models, $\theta_i$ is the i-th NER model, $y_m$ is the m-th entity class, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $w_i$ is the weight of the i-th NER model, and the weights $w_i$ are learnable parameters.
Preferably, the method further comprises the steps of:
and labeling the data to be labeled by using the obtained categories, adding the data to be labeled into the labeled data set, and iteratively training a plurality of NER models.
The invention provides a medical named entity identification method in a second aspect, which comprises the following steps:
inputting data into a plurality of named entity recognition NER models to obtain a plurality of recognition results; a plurality of NER models are obtained by training according to the method;
and fusing the plurality of identification results to obtain a final entity identification result.
A third aspect of the present invention provides a medical named entity recognition apparatus, comprising:
the NER model training module is used for training a plurality of named entity recognition NER models of different types by utilizing the labeling data set;
the to-be-labeled data selection module is used for selecting data to be labeled from the unlabeled data by utilizing an active learning method based on the NER models;
the data to be labeled category prediction module is used for predicting the category of the data to be labeled by utilizing the NER models respectively;
and the prediction result fusion module is used for fusing the prediction result to obtain the category of the data to be labeled.
The invention also provides a memory storing a plurality of instructions for implementing the method as described above.
The invention also provides an electronic device comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions which can be loaded and executed by the processor to enable the processor to execute the method.
The beneficial effects of the invention are as follows. According to the technical scheme, a plurality of NER models are trained on a small amount of labeled medical data; an active learning method based on these NER models selects, from the unlabeled data, the data about which the models are most uncertain; the data labels are assigned by fusing the predictions of the NER models; and the newly labeled data are added to the training data set to optimize the models. In this way a small amount of data achieves performance comparable to that of a large amount of data. Data from actual use show that, with 10% of the labeled data, the method reaches about 90% of the performance obtained with the full data. The method therefore meets the practical needs of information extraction applications in medical scenarios that lack sufficient labeled data.
Drawings
FIG. 1 is a schematic flow chart of a medical named entity recognition method according to the present invention;
FIG. 2 is a schematic diagram of an exemplary implementation of the medical named entity recognition method according to the invention;
FIG. 3 is a schematic view illustrating a process of identifying a named entity in unlabeled data according to the present invention;
FIG. 4 is a functional structure diagram of the medical named entity recognition device according to the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The method provided by the invention can be implemented in the following terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. Wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments described below.
A processor may include one or more processing cores. The processor connects the various parts of the terminal through various interfaces and lines, and performs the functions of the terminal and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory.
The memory may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory may be used to store instructions, programs, code sets, or instruction sets.
The display screen is used for displaying user interfaces of all the application programs.
In addition, those skilled in the art will appreciate that the above-described terminal configurations are not intended to be limiting, and that the terminal may include more or fewer components, or some components may be combined, or a different arrangement of components. For example, the terminal further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and other components, which are not described herein again.
Example one
As shown in fig. 1-2, an embodiment of the present invention provides a medical named entity identification method, including:
S101, training by using a labeling data set to obtain a plurality of named entity recognition NER models of different types;
S102, selecting data to be labeled from unlabeled data by using an active learning method based on the NER models;
S103, predicting the category of the data to be labeled by utilizing the NER models respectively;
and S104, fusing the predicted results to obtain the category of the data to be labeled.
At present, owing to the particular nature of the medical industry, little data and little labeled data are available for the medical named entity recognition task. Existing models can use only the small amount of labeled data and cannot make full use of the large amount of unlabeled data; moreover, they usually rely on a single active learning method, so the advantages of combining different types of models are not fully exploited.
The method provided by the invention is provided aiming at the particularity of the medical data and the problems in the prior art. The problem of insufficient labeled data is solved by fully utilizing the advantages of massive unlabeled data and multi-model complementation, and the performance of medical named entity identification is improved. Specifically, a small amount of labeled data is used for training to obtain a plurality of NER models of different types, based on the NER models, an active learning method is used for selecting data with the strongest uncertainty in unlabeled data as data to be labeled, then prediction results of the NER models are fused to give a data label, and finally labeled data are added into a training data set to be used for optimizing the models.
The method provided by the invention utilizes a small amount of medical labeled data, adopts a plurality of different active learning strategy combinations, selects the data with the strongest uncertainty of the model from the unlabeled data, gives the data label by fusing the prediction results of a plurality of models, and finally adds the labeled data into the training data set to optimize the model. Finally, the effect of achieving equivalent performance of a large amount of data by using a small amount of data is achieved. Actual use data shows that the method provided by the invention can achieve the performance of about 90% of full data under 10% of labeled data. Therefore, the method of the invention well meets the actual requirements of the information extraction application scene under the condition that the medical scene lacks enough labeling information.
In step S101, initially, since the labeled data in the medical field is less, the labeled data in the labeled data set is less, but with the implementation of the method, after the category of the unlabeled data is obtained, the data may be labeled and added to the labeled data set, so that the labeled data therein is more and more, and the trained model has higher and higher performance until the performance is stable.
Because the amount of usable training data is relatively small, in a preferred embodiment of the invention a pre-trained language model plus fine-tuning approach is adopted to train the plurality of NER models of different types on a small amount of labeled data.
The plurality of NER models of different types obtained by training may include deep learning models, statistical learning models, and/or knowledge-based models. Each of these NER models may consist of a single type of model, or several types of models may be combined into one NER model. As an example, the plurality of NER models may include the FCRF model, Emb + MLP, BERT + CRF, BERT + BiLSTM + CRF, the FLAT model, GlobalPointer, and the Prompt model. Here FCRF is a statistical learning model; Emb + MLP, BERT + CRF, and BERT + BiLSTM + CRF combine a statistical learning model with a deep learning model; and the FLAT model, GlobalPointer, and Prompt are deep learning models.
The models or the combination of the models of different types adopt a plurality of different active learning strategies, thereby realizing the advantage complementation between the single learning strategies and making up the defect of less training data.
The FCRF model combines statistical-learning features with a CRF. The features can be chosen from common ones, such as context window words, word length, and other prior information. Emb + MLP can be trained entirely from the existing training data; BERT + MLP can directly use the information in the pre-trained language model; BERT + CRF can better model the input sequence by means of the CRF; BERT + BiLSTM + CRF can better model context information by means of the BiLSTM; the FLAT model can use position information to better model lexical context; the GlobalPointer model can model nested and non-nested named entities at the same time; and the Prompt model can use a pre-trained language model (PLM) to convert NER into a generation problem, modeling named entity recognition from the perspective of text generation. Models that combine these different active learning strategies learn from the training data from different angles, capturing information beyond the data itself as well as entities of different types and lengths. The FCRF model, Emb + MLP, BERT + CRF, BERT + BiLSTM + CRF, the FLAT model, GlobalPointer, and the Prompt model are therefore highly complementary. Consequently, when data to be labeled is selected from the unlabeled data by an active learning method based on a plurality of NER models of different types, the most valuable data to be labeled can be determined by exploiting the complementary strengths of the models.
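The hand-crafted features mentioned for the FCRF model (context window words and word length) could be built along the following lines. This is only a minimal sketch: the function names `token_features` and `sentence_features`, the window size, and the `<PAD>` marker are assumptions made for illustration and are not taken from the patent.

```python
def token_features(tokens, i, window=2):
    """Build a feature dict for tokens[i]: the token, its length, and its neighbours."""
    feats = {"bias": 1.0, "token": tokens[i], "length": len(tokens[i])}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence are filled with a padding marker (assumed).
        feats[f"ctx[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens, window=2):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]
```

A per-token list of feature dicts like this is a common input shape for feature-based sequence labelers, though the exact format depends on the toolkit used.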
In step S102, the selecting data to be labeled from unlabeled data by using an active learning method based on the plurality of NER models may include the following steps:
respectively predicting the distribution of the unlabeled data in each category by using each NER model;
calculating the distribution consistency of the unlabeled data in each category;
and determining the data to be labeled from all the unlabeled data according to the consistency.
Because the trained NER models predict different class distributions for the same unlabeled data, the invention finds the unlabeled data with the highest labeling value by calculating how consistent the class distributions predicted by the models are for the same unlabeled data: the lower the consistency across models, the higher the uncertainty of that data and the higher its labeling value.
Specifically, one NER model first predicts the probability that a given unlabeled data item belongs to a given class; the other NER models then predict the same probability in turn, yielding the probability distributions predicted for that class by each of the NER models. The consistency of these probability distributions is then calculated, giving the consistency of the distribution of that unlabeled data item over the classes. The consistency for every other unlabeled data item is obtained in the same way. Finally, the unlabeled data item with the lowest consistency is taken as the most valuable data to be labeled.
As an embodiment, the plurality of NER models may include, for example, at least two of the FCRF model, Emb + MLP, BERT + CRF, BERT + BiLSTM + CRF, the FLAT model, GlobalPointer, and the Prompt model.
In a preferred embodiment of the present invention, the consistency of the distributions is calculated from the KL distance. The KL distance is short for the Kullback-Leibler divergence, also called relative entropy. It measures the difference between two probability distributions over the same event space. Therefore, the greater the KL distance, the lower the consistency.
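For reference, the standard definition of the KL divergence between two discrete distributions over the same event space is given below. The symbols $P$, $Q$, and $k$ are chosen here for illustration and do not appear in the original text.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{k} P(k) \, \log \frac{P(k)}{Q(k)}
```

The divergence is zero when the two distributions are identical and is otherwise positive. It is not symmetric in $P$ and $Q$, so an implementation has to decide how to treat each pair of models.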
In a preferred embodiment of the present invention, the consistency of the distribution of each unlabeled data item over the categories is calculated, and the data to be labeled is determined from all the unlabeled data according to the consistency, using the following formula:

$$x^{*} = \arg\max_{x} \frac{1}{M} \sum_{m=1}^{M} D\big(P_{\theta_i}(y_m \mid x) \,\|\, P_{\theta_j}(y_m \mid x)\big)$$

where $x$ is an unlabeled data item, $y_m$ is the m-th entity class, $M$ is the total number of entity classes, $\theta_i$ is the i-th NER model, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $\theta_j$ is the j-th NER model, $P_{\theta_j}(y_m \mid x)$ is the probability predicted by the j-th NER model that $x$ belongs to the m-th class, $D$ is the KL distance between the two distributions, and $x^{*}$ is the data item with the largest KL distance among all the unlabeled data.
That is, for each data item $x$, the KL distance between the probabilities that different models predict for the m-th class is calculated and then averaged over all $M$ entity classes; $\arg\max_{x}$ denotes the data item at which the function that follows takes its maximum value, that is, the data item $x^{*}$ with the largest KL distance.
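The selection rule described above can be sketched in a few lines of Python. The text does not say how the KL distances of different model pairs are combined, so this sketch assumes a plain mean over all ordered pairs of models. The names `disagreement` and `select_most_uncertain`, the `eps` smoothing term, and the dictionary layout of `pool` are likewise assumptions for illustration.

```python
import math
from itertools import permutations


def disagreement(dists, eps=1e-12):
    """Mean KL distance over all ordered model pairs, divided by the class count M.

    dists holds one class-probability list per NER model for a single data item.
    """
    n, m = len(dists), len(dists[0])
    if n < 2:
        return 0.0
    total = 0.0
    for i, j in permutations(range(n), 2):
        # KL(P_i || P_j) summed over classes, then averaged over the M classes.
        kl = sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(dists[i], dists[j]))
        total += kl / m
    return total / (n * (n - 1))


def select_most_uncertain(pool):
    """pool maps a data id to its per-model distributions; return the id with the largest distance."""
    return max(pool, key=lambda x: disagreement(pool[x]))
```

Averaging over ordered pairs makes the score symmetric in the models, which avoids having to pick a reference model. Other aggregation choices would fit the description equally well.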
In step S103, the category of the data to be labeled is predicted with each of the NER models, so there are as many prediction results as there are NER models. For example, in a preferred embodiment of the present invention, the plurality of NER models may include 8 models: the FCRF model, Emb + MLP, BERT + MLP, BERT + CRF, BERT + BiLSTM + CRF, the FLAT model, GlobalPointer, and the Prompt model, from which 8 prediction results are obtained.
In another preferred embodiment of the present invention, a dictionary + RULE based method (RULE) is additionally introduced, which determines names and categories of entities by a dictionary retrieval and text similarity calculation method.
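A dictionary lookup combined with text similarity might look like the sketch below. The patent does not name a similarity measure, so the use of `difflib.SequenceMatcher` from the Python standard library, the 0.85 threshold, and the function name `rule_based_ner` are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def rule_based_ner(tokens, dictionary, threshold=0.85):
    """Return (start, end, surface, category) for token spans that resemble a dictionary term.

    dictionary maps an entity name to its category; spans are token index ranges.
    """
    results = []
    for term, category in dictionary.items():
        n = len(term.split())
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            # Similarity ratio in [0, 1]; 1.0 means an exact (case-insensitive) match.
            score = SequenceMatcher(None, candidate.lower(), term.lower()).ratio()
            if score >= threshold:
                results.append((i, i + n, candidate, category))
    return sorted(results)
```

The fuzzy comparison lets a misspelled mention such as "metformine" still match the dictionary entry "metformin", at the cost of possible false positives when the threshold is set too low.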
In step S104, after the prediction results of the plurality of models are obtained, the prediction results of all the models (FCRF model, Emb + MLP, BERT + CRF, BERT + BiLSTM + CRF, FLAT model, GlobalPointer, Prompt model, RULE) are fused following the idea of ensemble learning.
In a preferred embodiment of the present invention, the predicted result may be fused by using the following formula:
$$\hat{y}(x) = \arg\max_{c \in \{y_1, \dots, y_M\}} \sum_{i=1}^{N} w_i \, P_{\theta_i}(c \mid x)$$

where $\hat{y}(x)$ is the final category of the unlabeled data $x$, $N$ is the number of NER models, $\theta_i$ is the i-th NER model, $y_m$ is the m-th entity class, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $w_i$ is the weight of the i-th NER model, and the weights $w_i$ are learnable parameters. $\arg\max_{c}$ denotes the class at which the function that follows takes its maximum value.
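The weighted fusion described above can be illustrated with a short Python sketch. It assumes the fusion is a weighted sum of per-model class probabilities followed by an arg max. The weights are fixed by hand here, because the text says only that they are learnable and does not say how they are learned. The function name `fuse_predictions` and the example class names are hypothetical.

```python
def fuse_predictions(model_probs, weights, classes):
    """Weighted sum of per-model class probabilities, followed by arg max over classes.

    model_probs[i][k] is the probability model i assigns to classes[k].
    """
    scores = [
        sum(w * probs[k] for w, probs in zip(weights, model_probs))
        for k in range(len(classes))
    ]
    best = max(range(len(classes)), key=scores.__getitem__)
    return classes[best], scores


classes = ["DISEASE", "DRUG", "O"]
model_probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
label, scores = fuse_predictions(model_probs, [0.5, 0.3, 0.2], classes)
```

Changing the weights can change the fused label, which is why learning them (rather than voting uniformly) lets stronger models carry more influence.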
In the invention, the fusion result is used as the category of the data to be labeled. Furthermore, this category can be used to label the data to be labeled, the labeled data is added to the labeled data set, and the data set with the newly labeled data is used as the training set to iteratively train the plurality of NER models of different types until the performance of the NER models stabilizes and no longer improves.
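Putting steps S101 to S104 together with the iterative retraining gives a loop of roughly the following shape. This is a sketch under stated assumptions: the callables `train`, `select`, `fuse`, and `evaluate` are hypothetical placeholders supplied by the caller, one item is labeled per round, and "performance is stable" is interpreted as an improvement smaller than `tol`. None of these details come from the patent.

```python
def active_learning_loop(labeled, unlabeled, train, select, fuse, evaluate,
                         tol=1e-3, max_rounds=100):
    """Label one selected item per round and retrain until the score stops improving."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    models = train(labeled)                      # S101: train the NER models
    score = evaluate(models)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        x = select(models, unlabeled)            # S102: pick the most uncertain item
        labeled.append((x, fuse(models, x)))     # S103 + S104: predict and fuse the label
        unlabeled.remove(x)
        models = train(labeled)                  # retrain on the enlarged labeled set
        new_score = evaluate(models)
        if new_score - score < tol:              # performance has stabilized
            break
        score = new_score
    return models, labeled
```

In practice a batch of items would usually be selected per round to limit the number of retraining passes; the single-item version is kept here for readability.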
Example two
As shown in fig. 3, an embodiment of the present invention provides a medical named entity identification method, including:
inputting data into a plurality of named entity recognition NER models to obtain a plurality of recognition results; a plurality of the NER models are trained according to the following method provided in example one:
labeling the data to be labeled with the result obtained by fusing the predictions of the plurality of NER models, adding the labeled data to the labeled data set, and iteratively training the plurality of NER models of different types with the enlarged data set as the training set until the performance of the NER models stabilizes and no longer improves.
And fusing the plurality of identification results to obtain a final entity identification result.
Specifically, the method as described in the first embodiment may be adopted to fuse a plurality of recognition results obtained by using a plurality of NER models to obtain a final entity recognition result. Specifically, the following formula can be adopted:
$$\hat{y}(x) = \arg\max_{c \in \{y_1, \dots, y_M\}} \sum_{i=1}^{N} w_i \, P_{\theta_i}(c \mid x)$$

where $\hat{y}(x)$ is the final category of the unlabeled data $x$, $N$ is the number of NER models, $\theta_i$ is the i-th NER model, $y_m$ is the m-th entity class, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $w_i$ is the weight of the i-th NER model, and the weights $w_i$ are learnable parameters. $\arg\max_{c}$ denotes the class at which the function that follows takes its maximum value.
Example three
As shown in fig. 4, another aspect of the present invention further includes a functional module architecture completely corresponding to the foregoing method flow, that is, an embodiment of the present invention further provides a medical named entity recognition apparatus, including:
the NER model training module 401 is used for training a plurality of named entity recognition NER models of different types by using the labeling data set;
a to-be-labeled data selection module 402, configured to select, based on the plurality of NER models, data to be labeled from unlabeled data by using an active learning method;
a to-be-labeled data category prediction module 403, configured to use the plurality of NER models to respectively predict categories of the to-be-labeled data;
and a prediction result fusion module 404, configured to fuse the prediction results to obtain the category of the data to be labeled.
Wherein, in the NER model training module, the plurality of NER models of different types include: deep learning models, statistical learning models, and/or knowledge-based models.
In the to-be-labeled data selection module, selecting, based on the plurality of NER models, to-be-labeled data from unlabeled data by using an active learning method includes:
respectively predicting the distribution of the unlabeled data in each category by using each NER model;
calculating the distribution consistency of the unlabeled data in each category;
and determining the data to be labeled from all the unlabeled data according to the consistency.
The consistency of the distribution of each unlabeled data item over the categories is calculated, and the data to be labeled is determined from all the unlabeled data according to the consistency, using the following formula:

$$x^{*} = \arg\max_{x} \frac{1}{M} \sum_{m=1}^{M} D\big(P_{\theta_i}(y_m \mid x) \,\|\, P_{\theta_j}(y_m \mid x)\big)$$

where $x$ is an unlabeled data item, $y_m$ is the m-th entity class, $M$ is the total number of entity classes, $\theta_i$ is the i-th NER model, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $\theta_j$ is the j-th NER model, $P_{\theta_j}(y_m \mid x)$ is the probability predicted by the j-th NER model that $x$ belongs to the m-th class, $D$ is the KL distance between the two distributions, and $x^{*}$ is the data item with the largest KL distance among all the unlabeled data.
In the prediction result fusion module, the predicted result is fused by using the following formula:
$$\hat{y}(x) = \arg\max_{c \in \{y_1, \dots, y_M\}} \sum_{i=1}^{N} w_i \, P_{\theta_i}(c \mid x)$$

where $\hat{y}(x)$ is the final category of the unlabeled data $x$, $N$ is the number of NER models, $\theta_i$ is the i-th NER model, $y_m$ is the m-th entity class, $P_{\theta_i}(y_m \mid x)$ is the probability predicted by the i-th NER model that $x$ belongs to the m-th class, $w_i$ is the weight of the i-th NER model, and the weights $w_i$ are learnable parameters. $\arg\max_{c}$ denotes the class at which the function that follows takes its maximum value.
The medical named entity recognition device provided by the embodiment of the invention further comprises a model optimization module, wherein the model optimization module is used for labeling the data to be labeled by utilizing the obtained categories, adding the data to be labeled into the labeled data set, and training a plurality of NER models in an iterative manner until the performance of the NER models is stable.
The device can be implemented by the medical named entity identification method provided in the first embodiment, and specific implementation methods can be referred to the description in the first embodiment and are not described herein again.
The invention also provides a memory storing a plurality of instructions for implementing the method according to the first embodiment.
The invention also provides an electronic device comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions can be loaded and executed by the processor to enable the processor to execute the method according to the first embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A medical named entity recognition method, comprising:
training by utilizing a labeling data set to obtain a plurality of named entity recognition NER models of different types;
selecting data to be labeled from unlabeled data by using an active learning method based on a plurality of NER models;
predicting the category of the data to be labeled by utilizing a plurality of NER models respectively;
and fusing the predicted results to obtain the category of the data to be labeled.
2. The medical named entity recognition method of claim 1, wherein the plurality of named entity recognition NER models of different types comprises: deep learning models, statistical learning models, and/or knowledge-based models.
3. The medical named entity recognition method of claim 1, wherein selecting data to be labeled from unlabeled data using an active learning method based on a plurality of the NER models comprises:
respectively predicting the distribution of the unlabeled data in each category by using each NER model;
calculating the distribution consistency of the unlabeled data in each category;
and determining the data to be labeled from all the unlabeled data according to the consistency.
4. The medical named entity recognition method according to claim 3, wherein the consistency of distribution of each unlabeled data in each category is calculated, and the data to be labeled is determined from all the unlabeled data according to the consistency by using the following formula:
where $x$ is an item of unlabeled data, $c_m$ is the m-th entity class, $M$ is the total number of entity classes, $\theta_i$ is the i-th NER model, $P_{\theta_i}(c_m \mid x)$ is the probability, predicted by the i-th NER model, that $x$ belongs to the m-th category, $\theta_j$ is the j-th NER model, $P_{\theta_j}(c_m \mid x)$ is the probability, predicted by the j-th NER model, that $x$ belongs to the m-th category, $D$ is the distance between the two distributions, and $x^{*}$ is the item finally obtained, namely the one with the largest distance among all the unlabeled data.
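The selection step of claims 3 and 4 can be illustrated with a minimal sketch. The claim text here only states that D is a distance between the class distributions predicted by two NER models; the sketch assumes, for illustration only, that D is the Kullback-Leibler divergence and that the per-item score is the sum of D over all ordered pairs of models. The function names and the toy probabilities are hypothetical.

```python
import math
from itertools import permutations


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes.

    Assumption: KL divergence stands in for the distance D of the claim.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def disagreement(dists):
    """Sum of D over all ordered pairs of model distributions for one item."""
    return sum(kl_divergence(p, q) for p, q in permutations(dists, 2))


def select_to_label(predictions):
    """Return the index of the unlabeled item on which the models disagree most.

    predictions[k][i] is the class distribution predicted by model i for item k.
    """
    scores = [disagreement(dists) for dists in predictions]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: two unlabeled items, two models, three entity classes.
predictions = [
    [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05]],  # the models largely agree
    [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]],  # the models disagree
]
chosen = select_to_label(predictions)
```

Under these assumptions the second item is selected, because the two predicted distributions are far apart. Any other distance between distributions could be substituted for `kl_divergence` without changing the rest of the sketch.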
5. The medical named entity recognition method of claim 1, wherein the fusing of the predicted results to obtain the category of the data to be labeled employs the following formula:
where $\hat{y}$ is the final category of the unlabeled data $x$, $N$ is the number of NER models, $\theta_i$ is the i-th NER model, $c_m$ is the m-th entity class, $P_{\theta_i}(c_m \mid x)$ is the probability, predicted by the i-th NER model, that $x$ belongs to the m-th category, and $w_i$ is the weight of the i-th NER model, the weights being learnable parameters.
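The fusion step of claim 5 can be illustrated with a minimal sketch. It assumes, for illustration only, that fusion is a weighted sum of the per-model class probabilities followed by taking the class with the highest fused score. The claim describes the model weights as learnable; here they are fixed by hand, and the class names, probabilities, and the function name `fuse` are hypothetical.

```python
def fuse(distributions, weights):
    """Weighted fusion of per-model class distributions for one data item.

    distributions[i] is the class distribution predicted by model i;
    weights[i] is the weight of model i (fixed here, learnable in the claim).
    Returns the index of the winning class and the fused scores.
    """
    n_classes = len(distributions[0])
    fused = [
        sum(w * dist[m] for w, dist in zip(weights, distributions))
        for m in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical entity classes and predictions from three models.
classes = ["disease", "drug", "symptom"]
dists = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]
weights = [0.5, 0.25, 0.25]

best, fused = fuse(dists, weights)
final_category = classes[best]
```

With these weights the fused scores favor the second class even though the most heavily weighted model prefers the first, which shows how the weights trade off the individual models.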
6. The medical named entity recognition method of claim 1, further comprising the steps of:
labeling the data to be labeled with the obtained category, adding the newly labeled data to the labeled data set, and iteratively training the plurality of NER models.
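The iteration of claim 6 (train, select, predict, fuse, add to the labeled set, retrain) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: `ToyModel` is a smoothed feature counter standing in for a real NER model, the two feature functions stand in for "models of different types", the disagreement score is a symmetric sum of KL divergences, and fusion uses equal weights.

```python
import math
from collections import Counter, defaultdict

CLASSES = ["disease", "drug"]  # hypothetical entity categories


class ToyModel:
    """Stand-in for one NER model: a smoothed feature/label counter."""

    def __init__(self, featurize, smoothing=1.0):
        self.featurize = featurize
        self.smoothing = smoothing
        self.counts = defaultdict(Counter)

    def fit(self, labeled):
        # Retrain from scratch on the current labeled set.
        self.counts = defaultdict(Counter)
        for text, label in labeled:
            for feature in self.featurize(text):
                self.counts[feature][label] += 1

    def predict_proba(self, text):
        raw = [
            self.smoothing + sum(self.counts[f][c] for f in self.featurize(text))
            for c in CLASSES
        ]
        total = sum(raw)
        return [r / total for r in raw]


def kl(p, q):
    # All probabilities are strictly positive because smoothing > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def active_learning(models, labeled, unlabeled, rounds):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        for model in models:
            model.fit(labeled)

        def score(text):
            dists = [m.predict_proba(text) for m in models]
            return sum(kl(p, q) for p in dists for q in dists)

        # Select the item on which the models disagree most.
        chosen = max(unlabeled, key=score)
        dists = [m.predict_proba(chosen) for m in models]
        # Fuse with equal weights (an assumption) and take the best class.
        fused = [sum(d[k] for d in dists) / len(dists) for k in range(len(CLASSES))]
        label = CLASSES[max(range(len(CLASSES)), key=fused.__getitem__)]
        labeled.append((chosen, label))
        unlabeled.remove(chosen)
    for model in models:
        model.fit(labeled)
    return labeled


seed = [("lung cancer", "disease"), ("aspirin tablet", "drug")]
pool = ["breast cancer", "cancer tablet", "ibuprofen tablet"]
models = [ToyModel(lambda t: t.split()), ToyModel(lambda t: [t[-3:]])]
result = active_learning(models, seed, pool, rounds=2)
```

Each round adds exactly one newly labeled item to the labeled set, so the models are retrained on a slightly larger set every time. In practice the stub models would be replaced by the deep learning, statistical learning, and/or knowledge-based models named in claim 2.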
7. A medical named entity recognition method, comprising:
inputting data into a plurality of named entity recognition (NER) models to obtain a plurality of recognition results, wherein the plurality of NER models are trained according to the method of claim 6;
and fusing the plurality of recognition results to obtain a final entity recognition result.
8. A medical named entity recognition apparatus, comprising:
an NER model training module, configured to train a plurality of named entity recognition (NER) models of different types using a labeled data set;
a to-be-labeled data selection module, configured to select data to be labeled from unlabeled data by an active learning method based on the plurality of NER models;
a to-be-labeled data category prediction module, configured to predict the category of the data to be labeled using each of the plurality of NER models;
and a prediction result fusion module, configured to fuse the prediction results to obtain the category of the data to be labeled.
9. A memory storing a plurality of instructions for implementing the method of any one of claims 1-7.
10. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210125810.4A CN114169338B (en) | 2022-02-10 | 2022-02-10 | Medical named entity identification method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210125810.4A CN114169338B (en) | 2022-02-10 | 2022-02-10 | Medical named entity identification method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114169338A (en) | 2022-03-11 |
CN114169338B (en) | 2022-05-17 |
Family
ID=80489602
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210125810.4A Active CN114169338B (en) | 2022-02-10 | 2022-02-10 | Medical named entity identification method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114169338B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111062215A (en) * | 2019-12-10 | 2020-04-24 | 金蝶软件(中国)有限公司 | Named entity recognition method and device based on semi-supervised learning training |
CN111797629A (en) * | 2020-06-23 | 2020-10-20 | 平安医疗健康管理股份有限公司 | Medical text data processing method and device, computer equipment and storage medium |
CN112001177A (en) * | 2020-08-24 | 2020-11-27 | 浪潮云信息技术股份公司 | Electronic medical record named entity identification method and system integrating deep learning and rules |
CN113343696A (en) * | 2021-05-31 | 2021-09-03 | 郑州大学第一附属医院 | Electronic medical record named entity identification method, device, remote terminal and system |
WO2021218024A1 (en) * | 2020-04-29 | 2021-11-04 | 平安科技(深圳)有限公司 | Method and apparatus for training named entity recognition model, and computer device |
Non-Patent Citations (1)
Title |
---|
曾钰婷: "《基于主动学习的中文医学实体识别方法》", 《中国优秀博硕士学位论文全文数据库(硕士) 医药卫生科技辑》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114580422A (en) * | 2022-03-14 | 2022-06-03 | 昆明理工大学 | Named entity identification method combining two-stage classification of neighbor analysis |
CN117577348A (en) * | 2024-01-15 | 2024-02-20 | 中国医学科学院医学信息研究所 | Identification method and related device for evidence-based medical evidence |
CN117577348B (en) * | 2024-01-15 | 2024-03-29 | 中国医学科学院医学信息研究所 | Identification method and related device for evidence-based medical evidence |
Also Published As
Publication number | Publication date |
---|---|
CN114169338B (en) | 2022-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581229B (en) | SQL statement generation method and device, computer equipment and storage medium | |
CN114169338B (en) | Medical named entity identification method and device and electronic equipment | |
WO2022022152A1 (en) | Video clip positioning method and apparatus, and computer device and storage medium | |
CN111428021A (en) | Text processing method and device based on machine learning, computer equipment and medium | |
CN110717039A (en) | Text classification method and device, electronic equipment and computer-readable storage medium | |
CN111461301B (en) | Serialized data processing method and device, and text processing method and device | |
US20240046644A1 (en) | Video classification method, device and system | |
CN111666427A (en) | Entity relationship joint extraction method, device, equipment and medium | |
CN111653274B (en) | Wake-up word recognition method, device and storage medium | |
CN113836925B (en) | Training method and device for pre-training language model, electronic equipment and storage medium | |
CN114647732B (en) | Weak supervision-oriented text classification system, method and device | |
EP4099333A2 (en) | Method and apparatus for training compound property pediction model, storage medium and computer program product | |
CN112926308B (en) | Method, device, equipment, storage medium and program product for matching text | |
CN113065013A (en) | Image annotation model training and image annotation method, system, device and medium | |
CN110909768B (en) | Method and device for acquiring marked data | |
CN112418291A (en) | Distillation method, device, equipment and storage medium applied to BERT model | |
CN113780365A (en) | Sample generation method and device | |
CN112735564A (en) | Mental health state prediction method, mental health state prediction apparatus, mental health state prediction medium, and computer program product | |
CN115129902B (en) | Media data processing method, device, equipment and storage medium | |
CN113688232B (en) | Method and device for classifying bid-inviting text, storage medium and terminal | |
CN115617975A (en) | Intention identification method and device for few-sample and multi-turn conversations | |
CN113010687B (en) | Exercise label prediction method and device, storage medium and computer equipment | |
CN115098722A (en) | Text and image matching method and device, electronic equipment and storage medium | |
CN114627085A (en) | Target image identification method and device, storage medium and electronic equipment | |
CN113987136A (en) | Method, device and equipment for correcting text classification label and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||