CN114036948B - Named entity identification method based on uncertainty quantification - Google Patents
- Publication number: CN114036948B (application CN202111246467.0A)
- Authority: CN (China)
- Prior art keywords: entity, uncertainty, model, representation, probability
- Legal status: Active (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F40/295 — Handling natural language data; Natural language analysis; Recognition of textual entities; Phrasal analysis; Named entity recognition
- G06N3/044 — Neural networks; Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/047 — Neural networks; Probabilistic or stochastic networks
- G06N3/08 — Neural networks; Learning methods
Abstract
The invention discloses a named entity recognition method based on uncertainty quantification, comprising the following steps. Step 1: collect a sample set localized to entity positions and build a candidate-entity detection model. Step 2: for the entities in the sample set, use BiLSTM and self-attention network structures, suited to long-range dependencies in text, to obtain representations of the entity's context features and of the entity's own features, respectively. Step 3: learn an uncertainty quantification model for entities using contrast loss and parameter sharing, yielding an uncertainty value for each entity. Step 4: convert each entity's uncertainty value into a dropout probability and, given a threshold, remove samples whose uncertainty exceeds it. Step 5: using the dropout probabilities from step 4, train a new named entity recognition model by introducing the Monte Carlo dropout training scheme from Bayesian neural networks.
Description
Technical Field
The invention relates to the technical field of computer applications, and in particular to a new method that quantifies the uncertainty of an entity's context features and of the entity's own features and uses it for named entity recognition.
Background
In machine learning there is always unavoidable uncertainty, of two main kinds: model uncertainty and data uncertainty. Model uncertainty stems from whether the chosen structure and model parameters can best describe the data distribution; data uncertainty stems from the fact that, even when the data are observed and evaluated accurately, noise remains in how the data were generated. For the supervised task of named entity recognition in particular, uncertainty in the supervision information itself can strongly affect the final recognition result.
In recent years, with the introduction of Bayesian neural networks, quantifying uncertainty has become possible. In computer vision, Bayesian uncertainty quantification has been applied to semantic segmentation and monocular depth estimation, where experimental comparisons showed that quantifying model and data uncertainty with a Bayesian neural network improves performance on both tasks. Bayesian neural networks have since been used to quantify uncertainty in natural language processing as well: experiments on sentiment analysis, named entity recognition, and language modeling showed that Bayesian uncertainty quantification improves all three tasks.
Although Bayesian neural networks have been used to quantify uncertainty in named entity recognition, the factors influencing that uncertainty and the strategies for quantifying it remain poorly defined for this task, and the resulting uncertainty estimates lack interpretability.
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a named entity recognition method based on uncertainty quantification.
This aim is achieved by the following technical scheme.
A named entity recognition method based on uncertainty quantification comprises the following steps:
Step 1: collect a sample set localized to entity positions and build a candidate-entity detection model.
Step 2: for the entities in the sample set, use BiLSTM and self-attention network structures, suited to long-range dependencies in text, to obtain representations of the entity's context features and of the entity's own features, respectively.
Step 3: learn an uncertainty quantification model for entities using contrast loss and parameter sharing, yielding an uncertainty value for each entity.
Step 4: convert each entity's uncertainty value into a dropout probability and, given a threshold, remove samples whose uncertainty exceeds it.
Step 5: using the dropout probabilities from step 4, train a new named entity recognition model by introducing the Monte Carlo dropout training scheme from Bayesian neural networks.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The invention provides a new named entity recognition method based on uncertainty quantification. It identifies the main source of uncertainty in named entity recognition, namely the ambiguity between an entity's own features and its context features, quantifies entity uncertainty with a context-entity contrast loss, and introduces the Monte Carlo dropout training scheme of Bayesian neural networks to train the recognition model.
2. Once an entity's quantified uncertainty exceeds a given threshold, the sample is removed. This spares the model from samples whose features are too difficult to learn, and thereby accelerates the convergence of model training.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 shows the entity uncertainty measurement model of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the specific examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In this embodiment, the execution environment of the uncertainty-quantification-based named entity recognition method is a server with a 3.0 GHz CPU, an NVIDIA GPU, and 16 GB of memory, and the program for quantifying uncertainty in named entity recognition is written in Python. This realizes the new method of quantifying uncertainty in named entity recognition from the ambiguity of entity context features and entity self-features; other execution environments may also be used and are not described here.
FIG. 1 is a flowchart of a new method for identifying named entities based on uncertainty quantization, which comprises the following steps:
Step 101: collecting and constructing a sample set for locating the entity position, and constructing a detection model of a candidate entity;
Step 102: for each sequence s_i = (s_{i,1}, ..., s_{i,n}) in the sample set, where s_{i,t}, ..., s_{i,t+l} is an entity, an ELMo model is used to obtain word-level word vectors, while a CNN is applied to the character-level features within each word to obtain its character-level word vector; the concatenation of the two is the word-vector representation e_{i,j} of the sample sequence. Next, BiLSTM, commonly used in sequence tasks, produces the hidden representation h_{i,j} = BiLSTM(e_{i,j}) of each word, and a self-attention layer assigns each hidden state a weight, giving the weighted hidden representation of each word in the sequence. Finally, using the position information of the entity in the sequence, the weighted hidden states are pooled into the context feature representation of the entity and the entity self-feature representation, respectively. The attention weights are computed as:

α_{i,j} = Attention(h_{i,j})
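The split into entity and context representations described in step 102 can be sketched in plain Python. This is an illustrative sketch under assumptions, not the patented implementation: the ELMo/CNN embeddings and the BiLSTM itself are omitted, the hidden states and attention scores are taken as given, and the pooling is assumed to be an attention-weighted average over the entity span versus the remaining context positions (the function names are hypothetical):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def pool_entity_and_context(hidden, scores, span):
    """Split a token sequence into an entity representation and a context
    representation by attention-weighted averaging.

    hidden : list of hidden-state vectors, one per token (from the BiLSTM)
    scores : one unnormalised attention score per token
    span   : (start, end) token indices of the entity, end exclusive
    """
    start, end = span
    dim = len(hidden[0])
    weights = softmax(scores)

    def weighted_sum(indices):
        if not indices:
            return [0.0] * dim
        # renormalise the attention weights within the selected positions
        total = sum(weights[i] for i in indices)
        return [sum(weights[i] * hidden[i][d] for i in indices) / total
                for d in range(dim)]

    entity_idx = list(range(start, end))
    context_idx = [i for i in range(len(hidden)) if i < start or i >= end]
    return weighted_sum(entity_idx), weighted_sum(context_idx)
```

In the full method the `hidden` vectors would come from a BiLSTM over the concatenated ELMo and character-CNN embeddings, and `scores` from a learned self-attention layer.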
Step 103: this step constructs the uncertainty measurement model of the entity, which is obtained with a context-entity-self contrast loss. The context features obtained in step 102 serve as negative examples and the entity's own features obtained in step 102 as positive examples; each is fed into one of two models that share parameters. During learning, the higher the probability of predicting the correct category from the entity's own features, the better, and the lower the probability of predicting the correct category from the entity's context features, the better. Model 2 is computed in the same way as model 1, with the correspondingly inverted form of the loss. Finally, given a candidate entity and its context, with the class of the candidate entity being c, the model outputs a probability, and 1 minus that probability value is the uncertainty measure of the given candidate entity:

v_{i,t} = 1 - P(c | candidate entity, context)
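The final uncertainty measure, 1 minus the probability assigned to the correct class, can be illustrated with a minimal sketch. The logits and the `class_probability` helper here are hypothetical stand-ins for the output of the parameter-shared model described above:

```python
import math

def class_probability(logits, label):
    """Softmax probability that the model assigns to the gold class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[label] / sum(exps)

def entity_uncertainty(entity_logits, label):
    """Uncertainty v = 1 - P(correct class), as defined in the text."""
    return 1.0 - class_probability(entity_logits, label)
```

A confident, correct prediction yields an uncertainty near 0; a prediction that spreads probability away from the gold class yields an uncertainty near 1.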
Step 104: compute the uncertainty of the entity context features and the entity's own features (a value in the interval [0, 1]). Given a threshold, remove the samples whose uncertainty exceeds it. The threshold may be chosen by cross-validation.
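The sample removal in step 104 reduces to a one-line selection; a sketch with a hypothetical function name, assuming one uncertainty value per sample:

```python
def filter_by_uncertainty(samples, uncertainties, threshold):
    """Keep only the samples whose uncertainty does not exceed the threshold."""
    return [s for s, v in zip(samples, uncertainties) if v <= threshold]
```

The threshold itself would be tuned, e.g. by cross-validation as the text suggests, rather than fixed in advance.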
Step 105: convert the uncertainty value of each entity in the sentence into a dropout probability; in the final training stage, train the final named entity recognition model using the Monte Carlo dropout training scheme of Bayesian neural networks.
The specific conversion from uncertainty value to dropout probability can be chosen per task. The simplest way is to use the uncertainty value directly as the dropout probability; alternatively, restrict the dropout range to [0, 0.5] and map the uncertainty value linearly into that interval.
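Both conversion options above can be captured by a single linear map: with `low=0, high=1` it reduces to using the uncertainty directly, while the default `[0, 0.5]` range matches the second option. A sketch (the function name is hypothetical):

```python
def uncertainty_to_dropout(v, low=0.0, high=0.5):
    """Linearly map an uncertainty value v in [0, 1] to a dropout
    probability in [low, high] (default [0, 0.5], as in the text)."""
    v = min(max(v, 0.0), 1.0)   # clamp to the valid uncertainty range
    return low + v * (high - low)
```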
The invention is not limited to the embodiments described above. The description of specific embodiments is intended to describe and illustrate the technical scheme of the invention and is illustrative only, not limiting. Those skilled in the art can make numerous specific modifications without departing from the spirit of the invention and the scope of the claims, and all such modifications fall within the protection scope of the invention.
Claims (1)
1. A named entity recognition method based on uncertainty quantification, characterized by comprising the following steps:
Step 1: collect a sample set localized to entity positions and build a candidate-entity detection model.
Step 2: for the entities in the sample set, use BiLSTM and self-attention network structures, suited to long-range dependencies in text, to obtain representations of the entity's context features and of the entity's own features, respectively. For each sequence s_i = (s_{i,1}, ..., s_{i,n}) in the sample set, where s_{i,t}, ..., s_{i,t+l} is an entity, obtain word-level word vectors with an ELMo model and, for the character-level features within each word, obtain the word's character-level word vector with a CNN; the concatenation of the two is the word-vector representation e_{i,j} of the sample sequence. Next, obtain the hidden representation h_{i,j} = BiLSTM(e_{i,j}) of each word with a BiLSTM, add self-attention to obtain the weight α_{i,j} = Attention(h_{i,j}) and the weighted hidden representation of each word in the sequence, and finally, using the position information of the entity in the sequence, obtain the context feature representation of the entity and the entity self-feature representation, respectively.
Step 3: learn an uncertainty quantification model for entities using contrast loss and parameter sharing, yielding an uncertainty value for each entity. The context features obtained in step 2 serve as negative examples and the entity's own features obtained in step 2 as positive examples; each is fed into one of two parameter-shared models with its corresponding loss function. Finally, given a candidate entity and its context, with the class of the candidate entity being c, a probability is output, and 1 minus that probability value is the uncertainty measure v_{i,t} of the given candidate entity.
Step 4: convert each entity's uncertainty value into a dropout probability and, given a threshold, remove samples whose uncertainty exceeds it.
Step 5: using the dropout probabilities from step 4, train a new named entity recognition model by introducing the Monte Carlo dropout training scheme from Bayesian neural networks.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202111246467.0A (CN114036948B) | 2021-10-26 | 2021-10-26 | Named entity identification method based on uncertainty quantification |
Publications (2)
| Publication Number | Publication Date |
| --- | --- |
| CN114036948A | 2022-02-11 |
| CN114036948B | 2024-05-31 |
Family
ID=80135412
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202111246467.0A (Active) | Named entity identification method based on uncertainty quantification | 2021-10-26 | 2021-10-26 |
Country Status (1)
| Country | Link |
| --- | --- |
| CN | CN114036948B |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN107203511A * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | Network text named entity recognition method based on neural network probability disambiguation |
| CN111709241A * | 2020-05-27 | 2020-09-25 | 西安交通大学 | Named entity identification method oriented to the network security field |
| CN111950269A * | 2020-08-21 | 2020-11-17 | 清华大学 | Text statement processing method and device, computer equipment, and storage medium |
| CN112733541A * | 2021-01-06 | 2021-04-30 | 重庆邮电大学 | BERT-BiGRU-IDCNN-CRF named entity identification method based on an attention mechanism |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN107391485A * | 2017-07-18 | 2017-11-24 | 中译语通科技(北京)有限公司 | Korean named entity recognition method based on maximum entropy and a neural network model |
| US11880411B2 * | 2020-02-12 | 2024-01-23 | Home Depot Product Authority, LLC | Named entity recognition in search queries |

2021-10-26: application CN202111246467.0A filed (CN); patent CN114036948B granted and active.
Similar Documents
| Publication | Title |
| --- | --- |
| CN110442878B | Translation method, training method and device of machine translation model, and storage medium |
| CN112541122A | Recommendation model training method and device, electronic equipment, and storage medium |
| CN106340297A | Speech recognition method and system based on cloud computing and confidence calculation |
| JP6172317B2 | Method and apparatus for mixed model selection |
| CN108399434B | Analysis and prediction method for high-dimensional time series data based on feature extraction |
| CN112699998B | Time series prediction method and device, electronic equipment, and readable storage medium |
| CN108009571A | New transductive semi-supervised data classification method and system |
| WO2020173270A1 | Method and device for parsing data, and computer storage medium |
| CN114298050A | Model training method, entity relation extraction method, device, medium, and equipment |
| CN111161238A | Image quality evaluation method and device, electronic device, and storage medium |
| CN106448660B | Natural language fuzzy boundary determination method introducing big data analysis |
| CN114298299A | Model training method, device, equipment, and storage medium based on curriculum learning |
| CN114036948B | Named entity identification method based on uncertainty quantification |
| CN117155806A | Communication base station traffic prediction method and device |
| CN116542139A | Method and device for predicting the surface roughness of liquid jet polishing |
| CN114757310B | Emotion recognition model and its training method, device, equipment, and readable storage medium |
| CN112347776A | Medical data processing method and device, storage medium, and electronic equipment |
| CN112541557B | Training method and device for a generative adversarial network, and electronic equipment |
| CN113792776B | Interpretation method for deep learning models in network security anomaly detection |
| CN115577290A | Distribution network fault classification and source localization method based on deep learning |
| CN114692615A | Few-shot semantic graph recognition method for low-resource languages |
| Kim et al. | The use of discriminative belief tracking in POMDP-based dialogue systems |
| CN114416941A | Generation method and device of a dialogue knowledge point determination model fusing a knowledge graph |
| CN113035363A | Probability-density-weighted mixed sampling method for genetic metabolic disease screening data |
| CN115099240B | Text generation model training method and device, and text generation method and device |
Legal Events
| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |