CN116384401A - Named entity recognition method based on prompt learning - Google Patents
- Publication number: CN116384401A
- Application number: CN202310399388.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a named entity recognition method based on prompt learning. The method uses the text representation model ConSERT to calculate the similarity between a text sequence and candidate sample templates, selects the most similar candidate sample template and splices it onto the text sequence as context, encodes the result with a Transformer-1 encoder, maps the encoder output into entity boundary discrimination vectors through a linear mapping layer, and obtains candidate entity boundary predicted values through a conditional random field to obtain candidate entity fragments. Candidate entity segment separators are then inserted into the text sequence according to the candidate entity boundary predicted values to construct an entity boundary perception template input, which is encoded with a Transformer-2 encoder; the character vectors within each candidate entity segment are averaged to obtain candidate entity segment vectors. Finally, the segment vectors are mapped into candidate entity category discrimination vectors through a linear mapping layer, and candidate entity category predicted values are obtained by using a softmax function, yielding the identified named entities. The invention improves the accuracy of named entity recognition.
Description
Technical Field
The invention relates to a computer natural language processing technology, in particular to a named entity recognition method based on prompt learning.
Background
Named entity recognition is a fundamental research task in natural language processing that aims to detect entity boundaries in text and classify entities into categories. It is a necessary and key preprocessing step for many natural language processing tasks, and the quality of its results directly influences downstream tasks such as relation extraction. Therefore, performing named entity recognition efficiently and accurately can effectively improve the performance of other natural language processing tasks.
In recent years, with the development of deep pre-trained language models, sequence labeling architectures based on deep pre-trained language models such as BERT [1], XLNet [2] and ERNIE [3] have made breakthrough progress on the named entity recognition task by exploiting large-scale labeled data. For example, document [4] uses a BERT pre-trained language model for text representation learning and extracts features with an iterated dilated convolutional network and a long short-term memory network, achieving excellent performance on multiple data sets. Document [5] obtains enhanced word embeddings by using BERT on top of a BiLSTM-CRF model and realizes named entity recognition based on the enhanced word embeddings. Document [6] utilizes ALBERT with a deep multi-network collaboration mechanism, effectively improving the accuracy of named entity recognition. Document [7] uses stroke features for named entity recognition, taking stroke sequences as input and improving the ELMo model. However, in low-resource scenes that lack large-scale labeled data, such as military defense and medical imaging, these methods all suffer to some degree from the representation collapse problem (i.e., the feature vectors derived from a pre-trained language model are of lower quality in low-resource scenes), so named entities cannot be recognized accurately and efficiently.
[1] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[2] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[J]. Advances in Neural Information Processing Systems, 2019, 32.
[3] Zhang Z, Han X, Liu Z, et al. ERNIE: Enhanced language representation with informative entities[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 1441-1451.
[4] Chang Y, Kong L, Jia K, et al. Chinese named entity recognition method based on BERT[C]// 2021 IEEE International Conference on Data Science and Computer Application (ICDSCA). IEEE, 2021: 294-299.
[5] Jia B, Wu Z, Wu B, et al. Enhanced character embedding for Chinese named entity recognition[J]. Measurement and Control, 2020, 53(9-10): 1669-1681.
[6] Yao L, Huang H, Wang K W, et al. Fine-grained mechanical Chinese named entity recognition based on ALBERT-AttBiLSTM-CRF and transfer learning[J]. Symmetry, 2020, 12(12): 1986.
[7] Luo Ling, Yang Zhihao, Song Yawen, et al. Chinese electronic medical record named entity recognition study based on stroke ELMo and multi-task learning[J]. 2020, 43(10): 15.
Disclosure of Invention
The invention aims to provide a named entity recognition method based on prompt learning, so as to solve problems such as representation collapse in low-resource scenes.
The technical solution for realizing the purpose of the invention is as follows: a named entity recognition method based on prompt learning, comprising the following steps:
Step 1, calculating the similarity between a text sequence and candidate sample templates by using the text representation model ConSERT, selecting the most similar candidate sample template and splicing it onto the text sequence as context, and encoding with a Transformer-1 encoder;
Step 2, mapping the output of the Transformer-1 encoder into entity boundary discrimination vectors through a linear mapping layer, and obtaining candidate entity boundary predicted values through a conditional random field to obtain candidate entity fragments;
Step 3, inserting the candidate entity segment separator "/" into the text sequence by using the candidate entity boundary predicted values, constructing the entity boundary perception template input, encoding with a Transformer-2 encoder, and averaging the character vectors within each candidate entity segment to obtain candidate entity segment vectors;
Step 4, mapping the candidate entity segment vectors into candidate entity category discrimination vectors through a linear mapping layer, and obtaining candidate entity category predicted values by using a softmax function to obtain the identified named entities.
Further, in step 1, the similarity between the text sequence and each candidate sample template is calculated by using the text representation model ConSERT, the most similar candidate sample template is selected and spliced onto the text sequence as context, and the result is encoded with the Transformer-1 encoder. The specific formulas are as follows:

X̃ = argmax_{X̂ ∈ D} ConSERT(X, X̂)   (1)

[H; H̃] = Transformer-1([X; SEP; X̃])   (2)

wherein X̂ = {x̂_1, /, x̂_2, /, …, /, x̂_t} represents a candidate sample template, t represents the candidate sample template length, "/" represents a separator, x̂_t represents the t-th candidate entity fragment of X̂, X = {x_1, x_2, …, x_n} represents the text sequence, n represents the text sequence length, ConSERT(·,·) calculates the similarity between two text sequences, D represents the candidate sample template set, X̃ represents the selected sample template, H = {h_1, h_2, …, h_n} represents the encoded output of the text sequence X, and H̃ represents the encoded output of the sample template X̃.
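As an illustrative sketch of step 1, the template-selection idea can be approximated with plain cosine similarity over sentence-embedding vectors. The embedding vectors, template strings, and `[SEP]` splicing below are simplified stand-ins for the ConSERT scorer and the Transformer-1 input format, not the patent's trained model:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two sentence-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_template(query_vec, template_vecs, templates):
    """Pick the candidate sample template whose embedding is most similar
    to the input text sequence (a stand-in for ConSERT scoring)."""
    scores = [cosine_sim(query_vec, tv) for tv in template_vecs]
    return templates[int(np.argmax(scores))]

def build_encoder_input(text, template, sep="[SEP]"):
    """Splice the selected template onto the text sequence as context."""
    return f"{text} {sep} {template}"

# Toy embeddings: the second template is closer to the query.
query = np.array([1.0, 0.0, 0.0])
tvecs = [np.array([0.0, 1.0, 0.0]), np.array([0.9, 0.1, 0.0])]
templates = ["Luoyang/peony/nice", "Harbin/ice lantern/pretty"]
chosen = select_template(query, tvecs, templates)
print(build_encoder_input("Harbin is cold in winter", chosen))
```

In the full method the spliced string would then be fed to the Transformer-1 encoder to obtain H and H̃.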
Further, in step 2, the output of the Transformer-1 encoder is mapped into entity boundary discrimination vectors through a linear mapping layer, and candidate entity boundary predicted values are obtained through a conditional random field, thereby obtaining the candidate entity fragments. The specific formulas are as follows:

o_i = W·h_i + b   (3)

ŷ^b = argmax_{y ∈ T^n} Σ_{i=1}^{n} ( o_i[y_i] + W_{y_{i-1}→y_i} + b_{y_{i-1}→y_i} )   (4)

wherein h_i is the output of the Transformer-1 encoder for the i-th character, o_i represents the entity boundary discrimination vector of the i-th character, W and b represent trainable parameters, ŷ_i^b represents the candidate entity boundary predicted value of the i-th character, y_i represents a possible candidate entity boundary value of the i-th character, T = {B, I, E, S} represents the candidate entity boundary value set, and W_{y_{i-1}→y_i} and b_{y_{i-1}→y_i} represent trainable parameters modeling the transfer from y_{i-1} to y_i.
Further, in step 3, the candidate entity segment separator "/" is inserted into the text sequence by utilizing the candidate entity boundary predicted values, the entity boundary perception template input is constructed and encoded with the Transformer-2 encoder, and the character vectors within each candidate entity segment are averaged to obtain the candidate entity segment vectors. The specific formulas are as follows:

X′ = {w_1, /, w_2, /, …, /, w_m}   (5)

s_j = Average(Transformer-2(X′)[w_j]), j = 1, 2, …, m   (6)

wherein X′ represents the entity boundary perception template input, m represents the number of candidate entity fragments, w_m represents the m-th candidate entity segment obtained from the candidate entity boundary predicted values, "/" represents the candidate entity segment separator, Average(·) averages the character vectors within a candidate entity segment, and s_j represents the j-th candidate entity segment vector.
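Step 3 performs two mechanical operations: inserting "/" before each character tagged B or S (except the first character), and averaging the character vectors inside each segment. Both can be sketched as follows; the characters, tags, and one-dimensional vectors are contrived examples, not real encoder outputs:

```python
import numpy as np

def insert_separators(chars, tags):
    """Insert '/' before each B- or S-tagged character (except at the
    start) to build the entity-boundary-aware template input."""
    out = []
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and out:
            out.append("/")
        out.append(ch)
    return out

def segment_vectors(char_vecs, tags):
    """Average the character vectors within each candidate entity segment
    (a segment ends on an E or S tag)."""
    segs, cur = [], []
    for vec, tag in zip(char_vecs, tags):
        cur.append(vec)
        if tag in ("E", "S"):
            segs.append(np.mean(cur, axis=0))
            cur = []
    if cur:  # trailing segment without a closing tag
        segs.append(np.mean(cur, axis=0))
    return segs

chars = list("ABCDEFG")
tags = ["B", "I", "E", "B", "E", "S", "S"]
print("".join(insert_separators(chars, tags)))  # ABC/DE/F/G
```

In the full method the separator-augmented sequence is re-encoded by Transformer-2 before the per-segment averaging.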
Further, in step 4, the candidate entity segment vectors are mapped into candidate entity category discrimination vectors through a linear mapping layer, and candidate entity category predicted values are obtained by using the softmax function, thereby obtaining the identified named entities. The specific formula is as follows:

ŷ_j^c = softmax(W_c·s_j + b_c)   (7)

wherein W_c and b_c represent trainable parameters, s_j represents the j-th candidate entity fragment vector, and ŷ_j^c represents the category predicted value of the j-th candidate entity fragment.
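Step 4 is a linear layer followed by softmax. In the sketch below the category set and the weight matrix are hypothetical, chosen so the toy segment vector maps to the LOC class; a real W_c and b_c would be learned during training:

```python
import numpy as np

LABELS = ["LOC", "PER", "ORG", "O"]  # illustrative category set

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_segment(seg_vec, W_c, b_c):
    """Map a candidate entity segment vector to a category distribution
    (linear layer + softmax) and return the argmax label."""
    probs = softmax(W_c @ seg_vec + b_c)
    return LABELS[int(probs.argmax())], probs

# Toy weights that route the first feature dimension to LOC.
seg = np.array([1.0, 0.0])
W_c = np.array([[2.0, 0.0], [0.0, 2.0], [0.5, 0.5], [0.0, 0.0]])
b_c = np.zeros(4)
label, probs = classify_segment(seg, W_c, b_c)
print(label)  # LOC
```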
Compared with the prior art, the invention has the following remarkable advantages: prompt learning is integrated; additional prior knowledge is provided by the sample example template and the entity boundary perception template to generate entity-aware representations; the representation collapse problem of mainstream named entity recognition methods in low-resource scenes is effectively avoided; and the accuracy of named entity recognition is improved.
Drawings
FIG. 1 is a flow chart of a named entity recognition method based on prompt learning;
FIG. 2 is a diagram of a named entity recognition model based on prompt learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
A named entity recognition method based on prompt learning comprises the following steps:
Step 1, calculating the similarity between the text sequence and the candidate sample templates by using the text representation model ConSERT, selecting the most similar candidate sample template and splicing it onto the text sequence as context, and encoding with the Transformer-1 encoder. The specific formulas are as follows:

X̃ = argmax_{X̂ ∈ D} ConSERT(X, X̂)   (1)

[H; H̃] = Transformer-1([X; SEP; X̃])   (2)

wherein X̂ = {x̂_1, /, x̂_2, /, …, /, x̂_t} represents a candidate sample template, t represents the candidate sample template length, "/" represents a separator, x̂_t represents the t-th candidate entity fragment of X̂, X = {x_1, x_2, …, x_n} represents the text sequence, n represents the text sequence length, ConSERT(·,·) calculates the similarity between two text sequences, D represents the candidate sample template set, X̃ represents the selected sample template, H = {h_1, h_2, …, h_n} represents the encoded output of the text sequence X, and H̃ represents the encoded output of the sample template X̃.
Step 2, mapping the output of the Transformer-1 encoder into entity boundary discrimination vectors through a linear mapping layer, and obtaining candidate entity boundary predicted values through a conditional random field to obtain the candidate entity fragments. The specific formulas are as follows:

o_i = W·h_i + b   (3)

ŷ^b = argmax_{y ∈ T^n} Σ_{i=1}^{n} ( o_i[y_i] + W_{y_{i-1}→y_i} + b_{y_{i-1}→y_i} )   (4)

wherein h_i is the output of the Transformer-1 encoder for the i-th character, o_i represents the entity boundary discrimination vector of the i-th character, W and b represent trainable parameters, ŷ_i^b represents the candidate entity boundary predicted value of the i-th character, y_i represents a possible candidate entity boundary value of the i-th character, T = {B, I, E, S} represents the candidate entity boundary value set, and W_{y_{i-1}→y_i} and b_{y_{i-1}→y_i} represent trainable parameters modeling the transfer from y_{i-1} to y_i.
Step 3, inserting the candidate entity segment separator "/" into the text sequence by utilizing the candidate entity boundary predicted values, constructing the entity boundary perception template input, encoding with the Transformer-2 encoder, and averaging the character vectors within each candidate entity segment to obtain the candidate entity segment vectors. The specific formulas are as follows:

X′ = {w_1, /, w_2, /, …, /, w_m}   (5)

s_j = Average(Transformer-2(X′)[w_j]), j = 1, 2, …, m   (6)

wherein X′ represents the entity boundary perception template input, m represents the number of candidate entity fragments, w_m represents the m-th candidate entity segment obtained from the candidate entity boundary predicted values, "/" represents the candidate entity segment separator, Average(·) averages the character vectors within a candidate entity segment, and s_j represents the j-th candidate entity segment vector.
Step 4, mapping the candidate entity segment vectors into candidate entity category discrimination vectors through a linear mapping layer, and obtaining candidate entity category predicted values by using the softmax function to obtain the identified named entities. The specific formula is as follows:

ŷ_j^c = softmax(W_c·s_j + b_c)   (7)

wherein W_c and b_c represent trainable parameters, s_j represents the j-th candidate entity fragment vector, and ŷ_j^c represents the category predicted value of the j-th candidate entity fragment.
Examples
To verify the effectiveness of the inventive protocol, the following experiments were performed.
Given the text sequence [Harbin is cold in winter], the named entity is "Harbin" with category LOC. The method of the invention is adopted to recognize the named entity in this text sequence; the specific implementation steps are as follows:
Step 1.1, calculating the similarity between the text sequence [Harbin is cold in winter] and all templates in the candidate sample template set D by using the ConSERT(·) function, and obtaining the most similar sample template [Luoyang/peony/nice].
Step 1.2, splicing the text sequence [Harbin is cold in winter] and [Luoyang/peony/nice] to obtain [Harbin is cold in winter SEP Luoyang/peony/nice], and inputting it into the Transformer-1 encoder, wherein SEP represents the inter-sentence separator, to obtain H = [h_1, h_2, …, h_8].
Step 2, mapping H into entity boundary discrimination vectors through the linear mapping layer, and obtaining the candidate entity boundary predicted values through the conditional random field.
Step 3, for the text sequence [Harbin is cold in winter], adding a separator "/" before each character whose candidate entity boundary predicted value is B (ignoring the first character of the text sequence) and before each character whose predicted value is S, thereby obtaining the entity boundary perception template input [Harbin/winter/good/cold]; encoding it with the Transformer-2 encoder and averaging the character vectors within each candidate entity segment to obtain the candidate entity segment vectors s_j.
Step 4, mapping s_j into candidate entity category discrimination vectors through the linear mapping layer, obtaining the candidate entity category predicted values ŷ_j^c by using the softmax function, and acquiring the recognized LOC entity "Harbin".
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above examples represent only a few embodiments of the present application, and while they are described in detail, they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
Claims (8)
1. A named entity recognition method based on prompt learning, characterized by comprising the following steps:
step 1, calculating the similarity between a text sequence and candidate sample templates by using the text representation model ConSERT, selecting the most similar candidate sample template and splicing it onto the text sequence as context, and encoding with a Transformer-1 encoder;
step 2, mapping the output of the Transformer-1 encoder into entity boundary discrimination vectors through a linear mapping layer, and obtaining candidate entity boundary predicted values through a conditional random field to obtain candidate entity fragments;
step 3, inserting the candidate entity segment separator "/" into the text sequence by using the candidate entity boundary predicted values, constructing the entity boundary perception template input, encoding with a Transformer-2 encoder, and averaging the character vectors within each candidate entity segment to obtain candidate entity segment vectors;
step 4, mapping the candidate entity segment vectors into candidate entity category discrimination vectors through a linear mapping layer, and obtaining candidate entity category predicted values by using a softmax function to obtain the identified named entities.
2. The named entity recognition method based on prompt learning according to claim 1, characterized in that in step 1, the similarity between the text sequence and the candidate sample templates is calculated by using the text representation model ConSERT, the most similar candidate sample template is selected and spliced onto the text sequence as context, and the result is encoded with the Transformer-1 encoder; the specific formulas are as follows:

X̃ = argmax_{X̂ ∈ D} ConSERT(X, X̂)   (1)

[H; H̃] = Transformer-1([X; SEP; X̃])   (2)

wherein X̂ = {x̂_1, /, x̂_2, /, …, /, x̂_t} represents a candidate sample template, t represents the candidate sample template length, "/" represents a separator, x̂_t represents the t-th candidate entity fragment of X̂, X = {x_1, x_2, …, x_n} represents the text sequence, n represents the text sequence length, ConSERT(·,·) calculates the similarity between two text sequences, D represents the candidate sample template set, X̃ represents the selected sample template, H = {h_1, h_2, …, h_n} represents the encoded output of the text sequence X, and H̃ represents the encoded output of the sample template X̃.
3. The named entity recognition method based on prompt learning according to claim 2, characterized in that in step 2, the output of the Transformer-1 encoder is mapped into entity boundary discrimination vectors through a linear mapping layer, and candidate entity boundary predicted values are obtained through a conditional random field to obtain candidate entity fragments; the specific formulas are as follows:

o_i = W·h_i + b   (3)

ŷ^b = argmax_{y ∈ T^n} Σ_{i=1}^{n} ( o_i[y_i] + W_{y_{i-1}→y_i} + b_{y_{i-1}→y_i} )   (4)

wherein h_i is the output of the Transformer-1 encoder for the i-th character, o_i represents the entity boundary discrimination vector of the i-th character, W and b represent trainable parameters, ŷ_i^b represents the candidate entity boundary predicted value of the i-th character, y_i represents a possible candidate entity boundary value of the i-th character, T = {B, I, E, S} represents the candidate entity boundary value set, and W_{y_{i-1}→y_i} and b_{y_{i-1}→y_i} represent trainable parameters modeling the transfer from y_{i-1} to y_i.
4. The named entity recognition method based on prompt learning according to claim 3, characterized in that in step 3, the candidate entity segment separator "/" is inserted into the text sequence by using the candidate entity boundary predicted values, the entity boundary perception template input is constructed and encoded with the Transformer-2 encoder, and the character vectors within each candidate entity segment are averaged to obtain candidate entity segment vectors; the specific formulas are as follows:

X′ = {w_1, /, w_2, /, …, /, w_m}   (5)

s_j = Average(Transformer-2(X′)[w_j]), j = 1, 2, …, m   (6)

wherein X′ represents the entity boundary perception template input, m represents the number of candidate entity fragments, w_m represents the m-th candidate entity segment obtained from the candidate entity boundary predicted values, "/" represents the candidate entity segment separator, Average(·) averages the character vectors within a candidate entity segment, and s_j represents the j-th candidate entity segment vector.
5. The named entity recognition method based on prompt learning according to claim 4, characterized in that in step 4, the candidate entity segment vectors are mapped into candidate entity category discrimination vectors through a linear mapping layer, and candidate entity category predicted values are obtained by using a softmax function to obtain the identified named entities; the specific formula is as follows:

ŷ_j^c = softmax(W_c·s_j + b_c)   (7)

wherein W_c and b_c represent trainable parameters, s_j represents the j-th candidate entity fragment vector, and ŷ_j^c represents the category predicted value of the j-th candidate entity fragment.
6. A named entity recognition system based on prompt learning, characterized in that it performs named entity recognition based on prompt learning according to the method of any one of claims 1-5.
7. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements named entity recognition based on prompt learning according to the method of any one of claims 1-5 when executing the computer program.
8. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements named entity recognition based on prompt learning according to the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310399388.6A CN116384401A (en) | 2023-04-14 | 2023-04-14 | Named entity recognition method based on prompt learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116384401A true CN116384401A (en) | 2023-07-04 |
Family
ID=86976723
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117034942A (en) * | 2023-10-07 | 2023-11-10 | 之江实验室 | Named entity recognition method, device, equipment and readable storage medium |
CN117034942B (en) * | 2023-10-07 | 2024-01-09 | 之江实验室 | Named entity recognition method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |