CN115146641A - NER noisy training method and device - Google Patents
NER noisy training method and device
- Publication number
- CN115146641A CN115146641A CN202210796239.9A CN202210796239A CN115146641A CN 115146641 A CN115146641 A CN 115146641A CN 202210796239 A CN202210796239 A CN 202210796239A CN 115146641 A CN115146641 A CN 115146641A
- Authority
- CN
- China
- Prior art keywords
- text data
- training
- data
- calculating
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to a named entity recognition (NER) noisy training method. The method comprises: obtaining a plurality of text data and an entity label corresponding to each text data, constructing an original noisy training set from the text data and entity labels, and dividing the original noisy training set into K data sets; training on the K data sets with a preset model to obtain K trained models; selecting any one of the K data sets as a training set, using the remaining K-1 data sets as a test set, and predicting each text data in the test set with the trained models to obtain K-1 prediction results for each text data, the prediction results serving as new labels; counting, for each text data, a first number of new labels inconsistent with its entity label; setting a noise weight parameter for each text data according to the first number; calculating loss functions of the K trained models according to the noise weight parameters; and determining a target training model according to the score of the loss function.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to an NER noisy training method and device.
Background
Named Entity Recognition (NER) aims to identify named entities in text and classify them into corresponding entity types, and it is one of the most important underlying tasks of Natural Language Processing (NLP) in the medical field.
Constructing an NER data set in the medical field is difficult: when annotators lack the corresponding medical background knowledge, mislabeled, missed, and inconsistent annotations easily occur, and optimizing the performance of an NER model on such a noisy data set remains hard.
Disclosure of Invention
The invention aims to provide an NER noisy training method and device to solve the prior-art problem that it is difficult to optimize the performance of an NER model on a noisy data set.
The invention provides a NER noisy training method in a first aspect, which comprises the following steps:
acquiring a plurality of text data and an entity label corresponding to each text data, constructing an original noisy training set according to the text data and the entity label, and dividing the original noisy training set to obtain K data sets;
training the K data sets through a preset model to obtain K trained models;
selecting any one of K data sets as a training set, using the other K-1 data sets as a test set, and predicting each text data in the test set through K trained models to obtain K-1 prediction results of each text data in the test set; the prediction result is a new label;
counting a first number of the entity labels inconsistent with the new labels;
setting a noise weight parameter for each text data according to the first number;
calculating loss functions of the K trained models according to the noise weight parameters;
and determining a target training model according to the score of the loss function.
In a possible implementation manner, before setting a noise weight parameter for each text data according to the first number, the method further includes:
extracting artificially marked text data from the original noisy training set;
counting a second number of noise data in the extracted text data;
and calculating the noise weight parameter according to the second number and the first number of the noise data.
In a possible implementation manner, the calculating the noise weight parameter according to the second number and the first number of the noise data specifically includes:
calculating the probability, i.e. the proportion of the manually labeled text data accounted for by the second number;
calculating a difference value obtained by subtracting the probability from a preset value;
and raising the difference to the power of the first number to obtain the weight parameter.
In a possible implementation manner, the preset model is an NER model.
In a possible implementation manner, the calculating the loss functions of the K trained models according to the noise weight parameter specifically includes:
calculating the conventional loss functions of the K trained models;
and calculating the loss functions of the K trained models according to the product of the conventional loss function and the noise weight parameter.
In one possible implementation, the conventional loss function includes cross entropy.
In a second aspect, the present invention provides a NER noisy training device, the device comprising:
the acquisition module is used for acquiring a plurality of text data and an entity label corresponding to each text data;
the building module is used for building an original noisy training set according to the text data and the entity labels, and dividing the original noisy training set to obtain K data sets;
the model training module is used for training the K data sets through a preset model to obtain K trained models;
the prediction module is used for selecting any one of the K data sets as a training set, taking the other K-1 data sets as a test set, and predicting each text data in the test set through the K trained models to obtain K-1 prediction results of each text data in the test set; the prediction result is a new label;
a statistics module for counting a first number of the entity tags inconsistent with the new tags;
the weight setting module is used for setting a noise weight parameter for each text data according to the first number;
the calculation module is used for calculating loss functions of the K trained models according to the noise weight parameters;
a determination module to determine a target training model according to the score of the loss function.
In a third aspect, the present invention provides a chip system comprising a processor coupled to a memory, the memory storing program instructions, which when executed by the processor implement the NER noisy training method of any one of the first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program for executing, by a processor, the NER noisy training method according to any one of the first aspects.
In a fifth aspect, the present invention provides a computer program product which, when run on a computer, causes the computer to perform the NER noisy training method according to any one of the first aspects.
By applying the NER noisy training method provided by the embodiment of the invention, a noise weight parameter is calculated for each text data in the training set and used in obtaining the target model, which reduces the influence of low-confidence data on model training and improves model performance. Furthermore, although data labeling in the medical field is difficult, only a small amount of data needs to be checked and labeled when calculating the noise weight parameter, saving manual labeling cost.
Drawings
Fig. 1 is a schematic flow chart of an NER noisy training method according to an embodiment of the present invention;
fig. 2 is a second schematic flow chart of the NER noisy training method according to the first embodiment of the present invention;
fig. 3 is a schematic structural diagram of an NER noisy training device according to a second embodiment of the present invention;
fig. 4 is a second schematic structural diagram of the NER noisy training device according to the second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a chip system according to a third embodiment of the present invention;
FIG. 6 is a diagram illustrating a computer-readable storage medium according to a fourth embodiment of the present invention;
fig. 7 is a schematic diagram of a computer program product according to a fifth embodiment of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a schematic flow chart of an NER noisy training method according to an embodiment of the present invention, which is applied to recognizing named entities in text, such as text in the medical field. As shown in fig. 1, the method comprises the following steps:
Specifically, assume that the original noisy training set is D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_1, x_2, ..., x_n are the text data and y_1, y_2, ..., y_n are the corresponding labels. The original noisy training set is divided into K parts, expressed in turn as D_1, D_2, ..., D_K.
The division may be performed randomly, or in groups of a set size according to a preset grouping scheme; the specific manner of division is not limited in the present application.
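As an illustrative sketch (not part of the patent text; the helper name and seed handling are assumptions), the random division into K data sets might look like:

```python
import random

def split_noisy_set(texts, labels, k, seed=0):
    """Randomly divide the original noisy set D = {(x_i, y_i)} into K parts."""
    pairs = list(zip(texts, labels))
    random.Random(seed).shuffle(pairs)
    # Deal the shuffled pairs round-robin into K roughly equal subsets.
    return [pairs[i::k] for i in range(k)]

subsets = split_noisy_set([f"x{i}" for i in range(10)],
                          [f"y{i}" for i in range(10)], k=5)
```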
Specifically, an NER model is selected, for example and without limitation a Bidirectional Encoder Representations from Transformers (BERT) model, and K NER models, denoted M_i (1 ≤ i ≤ K), are obtained by training on the K data sets respectively.
Specifically, any one of the K data sets may be used as a training set and the rest as a test set; each piece of text data in the test set is predicted by each of the trained models that did not see it during training, so K-1 prediction results are obtained for each piece of text data, each prediction result carrying a new label.
For example: when D_i is taken as the training set, the remaining K-1 data sets D_1, D_2, ..., D_{i-1}, D_{i+1}, ..., D_K serve as the test set, and the trained models are used to predict each piece of text data in the test set in turn. In this way, K-1 prediction results, that is, K-1 new labels, are obtained for each piece of text data x_i in the noisy training set D.
Specifically, for each piece of text data, a first number of new labels in the prediction results that are inconsistent with the data's original label may be counted, denoted c_i (0 ≤ c_i ≤ K-1).
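A minimal sketch of this counting step (the function name and label strings are hypothetical, for illustration only):

```python
def count_label_disagreements(original_label, new_labels):
    """First number c_i: how many of the K-1 new labels differ from the
    original entity label of the text data."""
    return sum(1 for y_new in new_labels if y_new != original_label)

# With K = 5 there are K-1 = 4 new labels per piece of text data.
c = count_label_disagreements("B-DISEASE", ["B-DISEASE", "O", "O", "B-DISEASE"])
# c == 2: two of the four predictions disagree with the original label
```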
Specifically, before step 150 is performed, the noise weight parameter needs to be calculated through the steps shown in fig. 2, namely steps 210-230.
In step 210, manually labeled text data are extracted; for example, 100 pieces of data in D may be manually checked and labeled.
In step 220, the number of noise data among the manually labeled text data is counted and denoted as α; this is the second number.
In step 230, the noise weight parameter is calculated according to the second number and the first number.
Specifically, step 230 includes:
calculating the probability, i.e. the proportion of the manually labeled text data accounted for by the second number; calculating the difference of the preset value minus the probability; and raising the difference to the power of the first number to obtain the weight parameter.
For example, in the present application, if 100 pieces of data are manually labeled, the probability is p = α / 100 = α%. The preset value is an empirical value obtained through a number of experiments; for example, it may be set to 1. The difference is then 1 - p, and the noise weight parameter is ω_i = (1 - p)^(c_i). From this formula, the larger c_i is, the greater the probability that x_i is noisy data, and the smaller the noise weight parameter ω_i.
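Under this reading (preset value 1, 100 manually checked pieces), the weight computation can be sketched as follows; the function name and default arguments are illustrative assumptions, not the patent's implementation:

```python
def noise_weight(c_i, alpha, n_checked=100, preset=1.0):
    """omega_i = (preset - p) ** c_i, where p = alpha / n_checked is the
    noise probability estimated from the manually checked samples."""
    p = alpha / n_checked
    return (preset - p) ** c_i

# alpha = 10 noisy samples found among 100 checked pieces => p = 0.1
w_clean = noise_weight(0, alpha=10)   # no model disagrees: full weight 1.0
w_noisy = noise_weight(3, alpha=10)   # 0.9 ** 3, i.e. about 0.729
```

Note how a larger disagreement count c_i drives the weight toward zero, matching the formula's intent of down-weighting likely-noisy samples.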
Specifically, step 160 includes: calculating the conventional loss functions of the K trained models; and calculating the loss functions of the K trained models as the product of the conventional loss function and the noise weight parameter. The loss function of each trained model is then:
L'_i = ω_i · L_i,
where L_i is a conventional loss function such as cross entropy.
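A sketch of the weighted loss for a single prediction, assuming cross entropy as the conventional loss (names are illustrative, not the patent's API):

```python
import math

def weighted_loss(probs, target_index, omega):
    """L'_i = omega_i * L_i, with L_i the conventional cross-entropy
    -log(p) of the probability assigned to the true label."""
    conventional = -math.log(probs[target_index])
    return omega * conventional

probs = [0.7, 0.2, 0.1]                      # model's distribution over labels
full = weighted_loss(probs, 0, omega=1.0)    # clean sample: full loss
down = weighted_loss(probs, 0, omega=0.729)  # likely-noisy sample: down-weighted
```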
Specifically, the score of the loss function is calculated, and the model corresponding to the loss function with the highest score is selected as the target training model.
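The final selection step admits a one-line sketch, assuming (as the text states) that a higher loss-function score indicates the better model; the helper name and placeholder model values are hypothetical:

```python
def select_target_model(models, scores):
    """Return the trained model whose loss-function score is highest."""
    best_index = max(range(len(models)), key=lambda i: scores[i])
    return models[best_index]

target = select_target_model(["M1", "M2", "M3"], [0.81, 0.93, 0.77])
# target == "M2"
```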
By applying the NER noisy training method provided by the embodiment of the invention, a noise weight parameter is calculated for each text data in the training set and used in obtaining the target model, which reduces the influence of low-confidence data on model training and improves model performance. Furthermore, although data labeling in the medical field is difficult, only a small amount of data needs to be checked and labeled when calculating the noise weight parameter, saving manual labeling cost.
Example two
An embodiment of the present invention provides an NER noisy training device, as shown in fig. 3, the device including: an obtaining module 310, a constructing module 320, a model training module 330, a predicting module 340, a counting module 350, a weight setting module 360, a calculating module 370 and a determining module 380.
The obtaining module 310 is configured to obtain a plurality of text data and an entity tag corresponding to each text data;
the constructing module 320 is configured to construct an original noisy training set according to the text data and the entity labels, and divide the original noisy training set to obtain K data sets;
the model training module 330 is configured to train K data sets through a preset model to obtain K trained models;
the prediction module 340 is configured to select any one of the K data sets as a training set, use the remaining K-1 data sets as a test set, and predict each text data in the test set through the K trained models to obtain K-1 prediction results for each text data in the test set, each prediction result being a new label;
the counting module 350 is configured to count a first number of the entity labels inconsistent with the new label;
the weight setting module 360 is used for setting a noise weight parameter for each text data according to the first number;
the calculating module 370 is configured to calculate loss functions of the K trained models according to the noise weight parameter;
the determining module 380 is used for determining the target training model according to the score of the loss function.
Further, as shown in fig. 4, the apparatus may further include: a decimation module 410.
The extraction module 410 is used for extracting artificially labeled text data from the original noisy training set;
the statistical module 350 is further configured to count a second number of noise data in the extracted text data;
the calculating module 370 is further configured to calculate a noise weight parameter according to the second number and the first number of the noise data.
Further, the calculating module 370 calculates the noise weight parameter according to the second number and the first number of the noise data, specifically by: calculating the probability, i.e. the proportion of the manually labeled text data accounted for by the second number; calculating the difference of the preset value minus the probability; and raising the difference to the power of the first number to obtain the weight parameter.
Further, the preset model is an NER model.
The calculating module 370 calculates the loss functions of the K trained models according to the noise weight parameters, specifically by: calculating the conventional loss functions of the K trained models; and calculating the loss functions of the K trained models as the product of the conventional loss function and the noise weight parameter.
Wherein the conventional loss function includes cross entropy.
The apparatus provided in the second embodiment of the present invention can execute the method steps in the first embodiment of the method, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the above apparatus into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may be implemented as software invoked by a processing element, entirely as hardware, or partly as software invoked by a processing element and partly as hardware. For example, the determining module may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus as program code whose function is invoked and executed by a processing element of the apparatus; the other modules are implemented similarly. In addition, the modules may be integrated together in whole or in part, or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU), or another processor that can call the program code. As another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application occur in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, Bluetooth, microwave) connection.
EXAMPLE III
A chip system according to a third embodiment of the present invention is provided, as shown in fig. 5, and includes a processor, where the processor is coupled to a memory, and the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method for training NER with noise according to any one of the methods provided in the first embodiment of the present invention is implemented.
Example four
An embodiment of the present invention provides a computer-readable storage medium, as shown in fig. 6, which includes a program or instructions, and when the program or instructions are run on a computer, the method for noise-added training of an NER according to any one of the embodiments is implemented.
EXAMPLE five
Embodiment five provides a computer program product comprising instructions, as shown in fig. 7, which when run on a computer, cause the computer to perform any one of the NER noisy training methods provided in embodiment one.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A Named Entity Recognition (NER) noisy training method is characterized by comprising the following steps:
acquiring a plurality of text data and an entity label corresponding to each text data, constructing an original noisy training set according to the text data and the entity label, and dividing the original noisy training set to obtain K data sets;
training the K data sets through a preset model to obtain K trained models;
selecting any one of K data sets as a training set, using the other K-1 data sets as a test set, and predicting each text data in the test set through K trained models to obtain K-1 prediction results of each text data in the test set; the prediction result is a new label;
counting a first number of the entity labels inconsistent with the new labels;
setting a noise weight parameter for each text data according to the first number;
calculating loss functions of the K trained models according to the noise weight parameters;
and determining a target training model according to the score of the loss function.
2. The method of claim 1, wherein before setting a noise weight parameter for each text data according to the first number, the method further comprises:
extracting artificially labeled text data from the original noisy training set;
counting a second number of noise data in the extracted text data;
and calculating the noise weight parameter according to the second number and the first number of the noise data.
3. The method of claim 2, wherein said calculating the noise weight parameter based on the second number and the first number of noise data comprises:
calculating the probability, i.e. the proportion of the manually labeled text data accounted for by the second number;
calculating a difference value of a preset value minus the probability;
and raising the difference to the power of the first number to obtain the weight parameter.
4. The method of claim 1, wherein the predetermined model is a NER model.
5. The method according to claim 1, wherein said calculating the loss function of the K trained models based on the noise weight parameter specifically comprises:
calculating the conventional loss functions of the K trained models;
and calculating the loss functions of the K trained models according to the product of the conventional loss function and the noise weight parameter.
6. The method of claim 5, wherein the regular loss function comprises cross entropy.
7. A Named Entity Recognition (NER) noisy training device, the device comprising:
the acquisition module is used for acquiring a plurality of text data and an entity tag corresponding to each text data;
the building module is used for building an original noisy training set according to the text data and the entity labels, and dividing the original noisy training set to obtain K data sets;
the model training module is used for training the K data sets through a preset model to obtain K trained models;
the prediction module is used for selecting any one of the K data sets as a training set, taking the other K-1 data sets as a test set, and predicting each text data in the test set through the K trained models to obtain K-1 prediction results of each text data in the test set; the prediction result is a new label;
a statistics module for counting a first number of the entity tags inconsistent with the new tags;
the weight setting module is used for setting a noise weight parameter for each text data according to the first number;
the calculation module is used for calculating loss functions of the K trained models according to the noise weight parameters;
a determination module to determine a target training model according to the score of the loss function.
8. A chip system comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the NER noisy training method of any of claims 1-6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which is executed by a processor to perform the NER noisy training method according to any one of claims 1-6.
10. A computer program product, characterized in that it, when run on a computer, causes the computer to perform the NER noisy training method according to any of the claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210796239.9A CN115146641A (en) | 2022-07-07 | 2022-07-07 | NER noisy training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210796239.9A CN115146641A (en) | 2022-07-07 | 2022-07-07 | NER noisy training method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115146641A true CN115146641A (en) | 2022-10-04 |
Family
ID=83412992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210796239.9A Pending CN115146641A (en) | 2022-07-07 | 2022-07-07 | NER noisy training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115146641A (en) |
- 2022-07-07 CN CN202210796239.9A patent/CN115146641A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115909354A (en) * | 2022-11-11 | 2023-04-04 | 北京百度网讯科技有限公司 | Training method of text generation model, and text acquisition method and device |
CN115909354B (en) * | 2022-11-11 | 2023-11-10 | 北京百度网讯科技有限公司 | Training method of text generation model, text acquisition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175851B (en) | Cheating behavior detection method and device | |
CN110263326B (en) | User behavior prediction method, prediction device, storage medium and terminal equipment | |
WO2019227716A1 (en) | Method for generating influenza prediction model, apparatus, and computer readable storage medium | |
CN108647329B (en) | User behavior data processing method and device and computer readable storage medium | |
CN111814822B (en) | Sensitive picture detection method and device and electronic equipment | |
CN115082920B (en) | Deep learning model training method, image processing method and device | |
CN111338692A (en) | Vulnerability classification method and device based on vulnerability codes and electronic equipment | |
CN111178537A (en) | Feature extraction model training method and device | |
CN115146641A (en) | NER noisy training method and device | |
CN111045902A (en) | Pressure testing method and device for server | |
CN112395880B (en) | Error correction method and device for structured triples, computer equipment and storage medium | |
CN113052063A (en) | Confidence threshold selection method, device, equipment and storage medium | |
CN113408280A (en) | Negative example construction method, device, equipment and storage medium | |
CN116092101A (en) | Training method, image recognition method apparatus, device, and readable storage medium | |
CN113688232B (en) | Method and device for classifying bid-inviting text, storage medium and terminal | |
CN113361621B (en) | Method and device for training model | |
CN114358581A (en) | Method and device for determining abnormal threshold of performance index, equipment and storage medium | |
CN110674020B (en) | APP intelligent recommendation method and device and computer readable storage medium | |
CN113626340A (en) | Test requirement identification method and device, electronic equipment and storage medium | |
CN113836297A (en) | Training method and device for text emotion analysis model | |
CN113434494A (en) | Data cleaning method and system, electronic equipment and storage medium | |
CN115294536B (en) | Violation detection method, device, equipment and storage medium based on artificial intelligence | |
CN112668702B (en) | Fixed-point parameter optimization method, system, terminal and storage medium | |
CN118012657A (en) | Root cause positioning method and device of micro-service system, readable medium and electronic equipment | |
CN115756643A (en) | Method and device for determining terminal configuration parameters, server and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |