CN115859978A - Named entity recognition model and method based on Roberta radical enhanced adapter - Google Patents

Named entity recognition model and method based on Roberta radical enhanced adapter

Info

Publication number: CN115859978A
Application number: CN202211389670.8A
Authority: CN (China)
Prior art keywords: radical, roberta, sequence, character, vector
Inventors: 张蕾, 戴司宇, 张丽娟, 高蕾, 万健, 陈芳妮, 王海江, 黄杰
Current and original assignee: Zhejiang Lover Health Science and Technology Development Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Filing date: 2022-11-08
Publication date: 2023-03-28
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)

Landscapes

  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of computer applications and discloses a named entity recognition model and method based on a Roberta radical-enhanced adapter. The named entity recognition model comprises a radical adapter, a radical-enhanced Roberta model, and a conditional random field. The radical adapter feeds radical features into the bottom layers of Roberta so that the information is fully fused; the radical-enhanced Roberta model extracts semantic features using a Roberta model with the whole-word masking scheme; the conditional random field models the conditional probability distribution of a set of output random variables given a set of input random variables. Addressing the lack of context information in short text, and considering that radicals carry deep semantic information, the invention combines radical features into the bottom layers of Roberta so that the features are fully fused. Multiple groups of comparison experiments on two datasets demonstrate that the model performs well and that fusing radical features at the bottom layers offers a clear advantage.

Description

Named entity recognition model and method based on Roberta radical enhanced adapter
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a named entity recognition model and method based on a Roberta radical enhanced adapter.
Background
Named entity recognition is the basis of many natural language processing tasks; it aims to correctly recognize entity mentions in text, and the accuracy of the model has an important influence on subsequent research. Early named entity recognition techniques were based on rules and dictionaries, which are poorly portable and consume a great deal of manual effort. Methods based on machine learning then gradually came into view, but they still require manually constructed features. With the development of natural language processing technology, deep learning methods have become the mainstream models for named entity recognition. In recent years especially, pre-trained models have shown excellent performance on many tasks, and many researchers extract features with deep learning models and combine them with a pre-trained model, improving the effect of named entity recognition. However, current feature enhancement still fuses features at the module level, so the features cannot interact deeply.
The invention deepens feature fusion to alleviate the lack of semantic information in short text and thereby improve named entity recognition performance. To this end, a radical adapter is designed to blend radical information into the bottom layers of Roberta, so that the features undergo deep knowledge interaction in the lower layers of the Roberta model. In addition, considering the dependencies between adjacent tags, a Conditional Random Field (CRF) is used for sequence labeling to obtain the optimal sequence of tags.
Disclosure of Invention
The invention aims to provide a named entity recognition model and a named entity recognition method based on a Roberta radical enhanced adapter, so as to solve the technical problems.
In order to solve the technical problems, the specific technical scheme of the named entity recognition model and method based on the Roberta radical enhanced adapter is as follows:
a named entity recognition model based on a Roberta radical-enhanced adapter comprises a radical adapter, a radical-enhanced Roberta model, and a conditional random field; the radical adapter feeds radical features into the bottom layers of Roberta so that the information is fully fused; the radical-enhanced Roberta model extracts semantic features using a Roberta model with the whole-word masking scheme; the conditional random field models the conditional probability distribution of a set of output random variables given a set of input random variables.
Further, the radical adapter comprises means for performing the following steps:
the radical adapter input is divided into two parts, characters and radicals; the radical vector is aligned with the character vector using bilinear attention, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation finally passes through a normalization layer to output the final result;
for a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as the vector sequence E^r = {e_1^r, e_2^r, …, e_n^r}; to align these two vector representations, the radical vector is non-linearly transformed, with the i-th element:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer;
the transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
Further, the radical enhanced Roberta model includes means for performing the following steps:
a special identifier [CLS] is added to the beginning of the input, and sentences are separated by the [SEP] separator; the input sequence then passes through three-part embedding to obtain the sequence representation, each input character being formed by adding its token embedding, segment embedding, and position embedding, so a character E_t in the sequence is formed as:

E_t = E_token_emb + E_seg_emb + E_pos_emb #(3-3)

the core of the Roberta model consists of 12 layers of Transformer encoders, in which the output vector corresponding to [CLS] serves as the semantic representation of the whole text;
the radical-adapter-enhanced Roberta injects the radical adapter into a certain layer of Roberta, connecting the radical adapter between Transformer layers inside Roberta and thereby injecting external radical knowledge into Roberta;
for a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}; the character sequence is then input to the embedding layer of Roberta, and the resulting embedded representation is input to the Transformer encoder; to inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}; each character-radical pair then passes through the radical adapter to obtain the character-radical representation, the i-th character h_i^k and the i-th radical embedding e_i^r passing through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

the sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
Further, the conditional random field includes means for performing the steps of:
given the output of the last layer of Roberta, T = {t_1, t_2, …, t_n}, the score of the predicted sequence is first calculated as:

O = W_o T + b_o #(3-5)

then, for a tag sequence y = {y_1, y_2, …, y_n}, the probability is defined as:

p(y|T) = exp(Σ_i (O_{i,y_i} + Q_{y_{i-1},y_i})) / Σ_{ỹ∈Ỹ} exp(Σ_i (O_{i,ỹ_i} + Q_{ỹ_{i-1},ỹ_i})) #(3-6)

wherein Q is the transition matrix, Q_{y_{i-1},y_i} is the transition score from label y_{i-1} to label y_i, O_{i,y_i} is the score of character t_i being predicted as label y_i, and Ỹ is the set of all possible tag sequences; the numerator is the score of the current tag sequence being the correct sequence, and the denominator sums the scores of all sequences;
given N labeled examples {(T_j, y_j)}_{j=1}^{N}, the model is trained by minimizing the sentence-level negative log-likelihood loss:

L = -Σ_{j=1}^{N} log p(y_j | T_j) #(3-7)

finally, in the decoding process, the Viterbi algorithm is adopted to find the tag sequence with the highest score:

y* = argmax_{ỹ∈Ỹ} score(T, ỹ) #(3-8)

wherein y* is the sequence that maximizes the score function among all tag sequences.
The invention also discloses a named entity identification method based on the Roberta radical-enhanced adapter, which comprises the following steps:
Step 1: use the radical adapter to feed radical features into the bottom layers of Roberta so that the feature information is fully fused; the adapter input is divided into two parts, characters and radicals; the radical vector representation is aligned with the character vector through a nonlinear transformation, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation finally passes through a normalization layer to output the final result;
Step 2: radical-enhanced Roberta: connect the radical adapter between Transformer layers inside Roberta, thereby injecting external radical knowledge into Roberta;
Step 3: use the conditional random field to find the tag sequence path with the maximum probability for the input sequence.
Further, the step 1 comprises the following specific steps:
Step 1.1: first, for a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as E^r = {e_1^r, e_2^r, …, e_n^r}; to align these two vector representations, the radical vector is non-linearly transformed, with the i-th element:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer;
Step 1.2: the transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

Step 1.3: finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
Further, the step 2 comprises the following specific steps:
for a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}; the character sequence is then input to the embedding layer of Roberta, and the resulting embedded representation is input to the Transformer encoder; to inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}; each character-radical pair then passes through the radical adapter to obtain the character-radical representation, the i-th character h_i^k and the i-th radical embedding e_i^r passing through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

the sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
Further, the step 3 comprises the following specific steps:
given the output of the last layer of Roberta, T = {t_1, t_2, …, t_n}, the score of the predicted sequence is first calculated as:

O = W_o T + b_o #(3-5)

then, for a tag sequence y = {y_1, y_2, …, y_n}, the probability is defined as:

p(y|T) = exp(Σ_i (O_{i,y_i} + Q_{y_{i-1},y_i})) / Σ_{ỹ∈Ỹ} exp(Σ_i (O_{i,ỹ_i} + Q_{ỹ_{i-1},ỹ_i})) #(3-6)

wherein Q is the transition matrix, Q_{y_{i-1},y_i} is the transition score from label y_{i-1} to label y_i, O_{i,y_i} is the score of character t_i being predicted as label y_i, and Ỹ is the set of all possible tag sequences; the numerator is the score of the current tag sequence being the correct sequence, and the denominator sums the scores of all sequences;
given N labeled examples {(T_j, y_j)}_{j=1}^{N}, the model is trained by minimizing the sentence-level negative log-likelihood loss:

L = -Σ_{j=1}^{N} log p(y_j | T_j) #(3-7)

finally, in the decoding process, the Viterbi algorithm is adopted to find the tag sequence with the highest score:

y* = argmax_{ỹ∈Ỹ} score(T, ỹ) #(3-8)

wherein y* is the sequence that maximizes the score function among all tag sequences.
The named entity recognition model and method based on the Roberta radical-enhanced adapter have the following advantages: addressing the lack of context information in short text and considering that radicals carry deep semantic information, the model provides a new feature-fusion scheme that combines radical features into the bottom layers of Roberta so that the features are fully fused. Multiple groups of comparison experiments on two datasets demonstrate that the model performs well and that fusing radical features at the bottom layers offers a clear advantage.
Drawings
FIG. 1 is a block diagram of the named entity recognition model based on the Roberta radical-enhanced adapter.
FIG. 2 is a structural diagram of the radical adapter of the present invention.
FIG. 3 is a diagram of the original Roberta model architecture.
FIG. 4 is a diagram of the radical-enhanced Roberta of the present invention.
Detailed Description
In order to better understand the purpose, structure and function of the present invention, the named entity recognition model and method based on Roberta radical enhanced adapter of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, in the named entity recognition model based on the Roberta radical-enhanced adapter, semantic information is first extracted with the Roberta model while radical features are merged into the Transformer encoder layers of Roberta through the radical adapter, so that radical information and semantic information interact fully in the bottom layers of Roberta; the fused semantic information is finally input to the CRF layer and decoded to obtain the final tag sequence.
The radical text (Radical_txt) is sourced from the Xinhua Dictionary and the Baidu Chinese dictionary; characters and radicals are used to build a dictionary of key-value pairs, providing the basis for the subsequent fusion of radical features. For an input sentence containing n characters, the original character sequence C = {c_1, c_2, c_3, …, c_n} is input to Roberta to extract semantic information, and each character in the sentence is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, r_3, …, r_n}, which is fused into Roberta for deep knowledge interaction; the fused semantic information is finally sent to the CRF decoding layer to obtain the tag sequence. The model proposed by the invention aims to improve the accuracy of named entity recognition by enhancing feature interaction; its structure is described in detail below.
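As a minimal sketch of this dictionary-matching step (the character-to-radical pairs and the helper name below are illustrative inventions, not taken from the patent's actual Radical_txt files):

# Sketch of radical-dictionary construction and lookup; the key-value
# pairs are illustrative examples, not the patent's full dictionary.
RADICAL_DICT = {
    "疼": "疒",  # "ache" carries the illness radical
    "痛": "疒",  # "pain" carries the illness radical
    "肝": "月",  # "liver" carries the flesh radical
    "河": "氵",  # "river" carries the water radical
}

def match_radicals(sentence: str, unk: str = "[UNK]") -> list[str]:
    """Map each character c_i to its radical r_i, as in C -> R above."""
    return [RADICAL_DICT.get(ch, unk) for ch in sentence]

print(match_radicals("肝疼"))  # ['月', '疒']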
The named entity recognition model based on the Roberta radical enhanced adapter comprises the radical adapter, the radical enhanced Roberta model and a conditional random field.
Radical adapter:
Each element in a sentence carries two types of information: character features and radical features. To realize deeper feature interaction, a radical adapter is designed that feeds radical features into the bottom layers of Roberta so the information is fully fused. The radical adapter structure is shown in FIG. 2: its input is divided into two parts, characters and radicals; the radical vector is aligned with the character vector using bilinear attention, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation is finally output through a normalization layer.
For a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as the vector sequence E^r = {e_1^r, e_2^r, …, e_n^r}. To align these two vector representations, the radical vector is non-linearly transformed, taking the i-th element as an example:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer.
The transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

Finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
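A minimal PyTorch sketch of this adapter follows; tanh as the unspecified nonlinearity, the dropout rate, and d_r = 50 are assumptions for illustration, since the patent fixes only the matrix dimensions:

import torch
import torch.nn as nn

class RadicalAdapter(nn.Module):
    """Radical adapter sketch: eq. (3-1) aligns the radical embedding,
    eq. (3-2) adds it to the character vector, then dropout + LayerNorm."""
    def __init__(self, d_c: int = 768, d_r: int = 50, dropout: float = 0.1):
        super().__init__()
        self.w1 = nn.Linear(d_r, d_c)  # W_1 (d_c x d_r) with bias b_1
        self.w2 = nn.Linear(d_c, d_c)  # W_2 (d_c x d_c) with bias b_2
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_c)

    def forward(self, char_vecs, radical_vecs):
        # (3-1): nonlinear transform aligning radicals with the character space
        aligned = self.w2(torch.tanh(self.w1(radical_vecs)))
        # (3-2): add the aligned radical vector to the character vector
        fused = char_vecs + aligned
        # dropout and normalization layers produce the fused sequence F
        return self.norm(self.dropout(fused))

adapter = RadicalAdapter()
chars = torch.randn(2, 8, 768)    # e^c: [batch, n, d_c]
radicals = torch.randn(2, 8, 50)  # e^r: [batch, n, d_r]
print(adapter(chars, radicals).shape)  # torch.Size([2, 8, 768])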
Radical enhanced Roberta model:
the original Roberta model is an improved model based on BERT, not only inherits the advantages of the BERT model, utilizes a transform encoder as an intermediate layer to extract semantic information, but also improves some aspects of the BERT model in order to capture semantic features of more layers, the first improvement is to use a dynamic mask, the BERT model uses a static mask, namely, a masked token is not changed in the training process, and the Roberta adopts the dynamic mask, the masked position is continuously updated in each training, the randomness of model input data is improved, and the learning capability of the model is improved. The second improvement is to remove the Next Sequence Prediction (NSP) task, the NSP task in BERT is used to judge whether the two input sentences are continuous, and Roberta removes the NSP task, and instead uses continuous full-sense and doc-sense as input, so as to increase the length of the input Sentence to 512 characters at most, which is much higher than the maximum input 256 characters of BERT model. Subsequently, a Roberta model based on a full-word mask technology is provided by the Hagong-Daiffei combined laboratory, and is called as a Roberta-wwm-ext model, the model improves the original single-character mask and provides a full-word mask scheme, chinese word segmentation operation in natural language processing is fully considered, word is used as granularity for shielding, and the full-word mask scheme can help to capture semantic features at the Chinese word level, so that the performance of the Roberta model is further improved. An example of a comparison between the single-word mask and full-word mask schemes is shown in table 1.
Table 1. Examples of different masking schemes
[table reproduced as an image in the original; contents not recoverable]
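Since the table survives only as an image, a small sketch can stand in for it; the segmented words and the masking rate below are invented for illustration:

import random

def whole_word_mask(words, mask_rate=0.15):
    """Illustrative whole-word masking: when a segmented word is chosen,
    every character in it becomes [MASK], unlike the single-character
    scheme, which masks isolated characters independently."""
    out = []
    for word in words:
        if random.random() < mask_rate:
            out.extend(["[MASK]"] * len(word))  # mask the whole word
        else:
            out.extend(list(word))
    return out

# With the segmentation ["命名", "实体", "识别"], whole-word masking may
# yield ['命','名','[MASK]','[MASK]','识','别'] but never a lone
# '[MASK]' in the middle of a word.
print(whole_word_mask(["命名", "实体", "识别"]))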
Therefore, the invention uses the whole-word-masking Roberta model to extract semantic features; the model structure is shown in FIG. 3. A special identifier [CLS] is added to the beginning of the input, and sentences are separated by the [SEP] separator. The input sequence then passes through three-part embedding to obtain the sequence representation: each input character is formed by adding its token embedding, segment embedding, and position embedding, so a character E_t in the sequence is formed as:

E_t = E_token_emb + E_seg_emb + E_pos_emb #(3-3)
the most core part of the Roberta model is composed of 12 layers of transform encoders, semantic features can be fully extracted, character dependency is captured, and finally each character vector representation fused with full-text semantic information is output, wherein an output vector corresponding to [ CLS ] serves as semantic representation of the whole text.
The Radical Adapter (RA)-enhanced Roberta injects the radical adapter into a layer of Roberta; its structure is shown in FIG. 4. Specifically, the radical adapter is connected between Transformer layers inside Roberta, thereby injecting external radical knowledge into Roberta.
For a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}. The character sequence is then input to the Roberta embedding layer, and the resulting embedded representation is input to the Transformer encoder. To inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}. Each character-radical pair then passes through the radical adapter to obtain the character-radical representation; for example, the i-th character h_i^k and the i-th radical embedding e_i^r pass through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

The sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
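A schematic sketch of this injection follows; the wrapper class, the assumption that each Transformer block maps a [batch, n, d_c] tensor to a tensor of the same shape, and the choice k = 1 are simplifications, since extracting the 12 blocks from a concrete Roberta implementation is not shown here:

import torch.nn as nn

class RadicalEnhancedEncoder(nn.Module):
    """Run the first k Transformer layers, apply the radical adapter
    (eq. (3-4)), then run the remaining 12-k layers."""
    def __init__(self, layers: nn.ModuleList, adapter: nn.Module, k: int = 1):
        super().__init__()
        self.layers, self.adapter, self.k = layers, adapter, k

    def forward(self, char_embeds, radical_embeds):
        hidden = char_embeds
        for layer in self.layers[:self.k]:             # first k layers -> H^k
            hidden = layer(hidden)
        hidden = self.adapter(hidden, radical_embeds)  # f_i = RA(h_i^k, e_i^r)
        for layer in self.layers[self.k:]:             # remaining 12-k layers
            hidden = layer(hidden)
        return hidden                                  # T = {t_1, ..., t_n}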
Conditional random field:
Conditional Random Fields (CRFs) are conditional probability distribution models of a set of output random variables given a set of input random variables, and they are widely used in sequence labeling tasks. During labeling, the CRF model can make full use of rich internal and contextual feature information to find the tag sequence path with the maximum probability for the input sequence.
Given the output of the last layer of Roberta, T = {t_1, t_2, …, t_n}, the score of the predicted sequence is first calculated as:

O = W_o T + b_o #(3-5)

Then, for a tag sequence y = {y_1, y_2, …, y_n}, the probability is defined as:

p(y|T) = exp(Σ_i (O_{i,y_i} + Q_{y_{i-1},y_i})) / Σ_{ỹ∈Ỹ} exp(Σ_i (O_{i,ỹ_i} + Q_{ỹ_{i-1},ỹ_i})) #(3-6)

wherein Q is the transition matrix, Q_{y_{i-1},y_i} is the transition score from label y_{i-1} to label y_i, O_{i,y_i} is the score of character t_i being predicted as label y_i, and Ỹ is the set of all possible tag sequences; the numerator is the score of the current tag sequence being the correct sequence, and the denominator sums the scores of all sequences.
Given N labeled examples {(T_j, y_j)}_{j=1}^{N}, the model is trained by minimizing the sentence-level negative log-likelihood loss:

L = -Σ_{j=1}^{N} log p(y_j | T_j) #(3-7)

Finally, in the decoding process, the Viterbi algorithm is adopted to find the tag sequence with the highest score:

y* = argmax_{ỹ∈Ỹ} score(T, ỹ) #(3-8)

wherein y* is the sequence that maximizes the score function among all tag sequences.
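A minimal Viterbi decoder corresponding to eq. (3-8), written in plain PyTorch; the tensor shapes are assumptions consistent with the formulas above, and start/end transition scores are omitted for brevity:

import torch

def viterbi_decode(emissions, transitions):
    """Find y* maximizing sum_i (O[i, y_i] + Q[y_{i-1}, y_i]).
    emissions: O with shape [n, num_tags]; transitions: Q with shape
    [num_tags, num_tags]. Returns the best tag index sequence."""
    n, num_tags = emissions.shape
    score = emissions[0].clone()  # best score ending at each tag so far
    backpointers = []
    for i in range(1, n):
        # total[prev, curr] = score[prev] + Q[prev, curr] + O[i, curr]
        total = score.unsqueeze(1) + transitions + emissions[i].unsqueeze(0)
        score, best_prev = total.max(dim=0)  # best predecessor for each tag
        backpointers.append(best_prev)
    best_tag = int(score.argmax())
    path = [best_tag]
    for best_prev in reversed(backpointers):  # trace the path backwards
        best_tag = int(best_prev[best_tag])
        path.append(best_tag)
    return path[::-1]

print(viterbi_decode(torch.randn(5, 4), torch.randn(4, 4)))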
The invention discloses a named entity identification method based on a Roberta radical-enhanced adapter, which comprises the following steps:
Step 1: use the radical adapter designed by the invention to feed radical features into the bottom layers of Roberta so that the feature information is fully fused. The adapter input is divided into two parts, characters and radicals; the radical vector representation is aligned with the character vector through a nonlinear transformation, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation finally passes through a normalization layer to output the final result. The specific steps are as follows:
(1) First, for a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as E^r = {e_1^r, e_2^r, …, e_n^r}. To align these two vector representations, the radical vector is non-linearly transformed, taking the i-th element as an example:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer.
(2) The transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

(3) Finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
Step 2: radical-enhanced Roberta. The Radical Adapter (RA)-enhanced Roberta connects the radical adapter between Transformer layers inside Roberta, thereby injecting external radical knowledge into Roberta, as follows:
for a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}. The character sequence is then input to the Roberta embedding layer, and the resulting embedded representation is input to the Transformer encoder. To inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}. Each character-radical pair then passes through the radical adapter to obtain the character-radical representation; for example, the i-th character h_i^k and the i-th radical embedding e_i^r pass through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

The sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
And 3, finding a label sequence path with the maximum probability for the input sequence by using a Conditional Random Field (CRF). The method comprises the following specific steps:
given the output T = { T) of the last layer of Roberta 1 ,t 2 ,…,t n First, the score of the predicted sequence is calculated as follows:
O=W o T+b o #(3-5)
then y = { y) for the tag sequence 1 ,y 2 ,…,y n The probability is defined as shown below:
Figure BDA00039314856300000911
wherein Q is a transfer matrix, and Q is a transfer matrix,
Figure BDA00039314856300000912
indicating slave label y i-1 To the label y i Is selected, is selected>
Figure BDA00039314856300000913
Representing a character t i Is predicted as label y i In the score of (c), in the score of (c)>
Figure BDA00039314856300000914
Is all possible tag sequences, the numerator represents the score for the current tag sequence as the correct sequence, and the denominator represents the score for each sequence.
Given N tag data
Figure BDA0003931485630000101
The model is trained by minimizing sentence-level negative log-likelihood loss as follows:
Figure BDA0003931485630000102
finally, in the decoding process, a Viterbi algorithm is adopted to find out the label sequence with the highest score, and the calculation formula is as follows:
Figure BDA0003931485630000103
wherein y is * The sequence that maximizes the score function is taken among all the tag sequences.
Experimental procedure
1 Experimental datasets
The experiments use the Chinese medical dataset CCKS2017 and the Chinese resume dataset Resume. The CCKS2017 dataset is labeled with 5 entity types (examination, symptom sign, disease diagnosis, treatment, and body part) and is divided into training and test sets in the ratio of 5. The Resume dataset is labeled with 8 entity types (nationality, educational institution, address, name, organization name, specialty, ethnicity, and job title) and is divided into training, validation, and test sets in the proportion of 8.
Table 2. CCKS2017 dataset entity types and counts
[table reproduced as an image in the original; contents not recoverable]

Table 3. Resume dataset entity types and counts
[table reproduced as an image in the original; contents not recoverable]
2 Evaluation metrics
The experiments adopt precision (P), recall (R), and the F1 value as evaluation metrics to measure the effect of the named entity recognition model, calculated as follows:

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)

where TP is the number of positive samples judged positive, FP is the number of negative samples judged positive, and FN is the number of positive samples judged negative. P is the proportion of correct predictions among all predicted results, R is the proportion of correctly predicted entities among all entities in the data, and the F1 value is the harmonic mean of P and R.
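These formulas translate directly into a small helper; the example counts are invented:

def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from the counts defined above."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. 90 entities predicted correctly, 10 spurious, 20 missed:
print(prf1(90, 10, 20))  # (0.9, 0.8181..., 0.8571...)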
3 Experimental environment and parameter settings
The named entity recognition model in the experiments is based on the PyTorch framework; the specific experimental environment settings are shown in Table 4.
Table 4. Experimental environment
[table reproduced as an image in the original; contents not recoverable]
The detailed parameter settings of the experiments are as follows: the whole-word-masking Roberta model contains 12 Transformer layers, and the radical adapter is added between its first and second layers. The Roberta hidden layer dimension is 768, the maximum sequence length is 256, the initial learning rate is 1e-5 with the Adam optimizer, the batch size is 30, and the number of training epochs on all datasets is 30.
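The stated hyperparameters can be gathered in one place; in the sketch below the placeholder module merely stands in for the full Roberta-adapter-CRF network, which is not reproduced:

import torch

config = {
    "hidden_dim": 768,      # Roberta hidden layer dimension
    "max_seq_len": 256,     # maximum sequence length
    "learning_rate": 1e-5,  # initial learning rate, Adam optimizer
    "batch_size": 30,
    "epochs": 30,
    "adapter_layer_k": 1,   # adapter between the 1st and 2nd layers
}

model = torch.nn.Linear(config["hidden_dim"], 1)  # placeholder for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=config["learning_rate"])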
4 Experimental comparison and result analysis
(1) Pre-training model comparison
First, to demonstrate the effectiveness of the Roberta model, Roberta was compared with BERT on the two datasets; the experimental comparison results are shown in Table 5.
Table 5. Pre-training model comparison
[table reproduced as an image in the original; contents not recoverable]
As can be seen from Table 5, Roberta extracts entities better than BERT on both datasets; specifically, the F1 values of the Roberta model on CCKS2017 and Resume are 0.77% and 0.49% higher than those of the BERT model, respectively. The stronger Roberta model is therefore used as the baseline model for named entity recognition.
(2) Comparison with existing research methods
Many researchers have conducted named entity recognition studies on the CCKS2017 dataset. Li et al. use the BiLSTM-CRF model combined with professional-domain word embeddings, improving named entity recognition through additional lexical features. Wang et al. construct a domain-related dictionary and integrate dictionary features into the BiLSTM-CRF model to improve recognition. Qiu et al. input character and dictionary features into a residual dilated convolutional neural network (RDCNN) to capture context features, and then capture the dependencies between adjacent tags with a CRF. Tang et al. propose a deep learning method integrating a language model and an attention mechanism: semantic features are extracted with a bidirectional gated recurrent network (BiGRU) and a pre-trained language model, and then further captured with another BiGRU layer and an attention module. Subsequently, because the radical information in Chinese characters also carries semantic information, Yin et al. extract radical features with a convolutional neural network (CNN), combine them with the character representation, model the result with a BiLSTM, and then capture long-range dependencies between individual characters and their context through an attention mechanism. Wu et al. [22] capture radical features with a BiLSTM model, extract character representations rich in contextual semantic information with Roberta, concatenate the radical and character features, and obtain the optimal tag sequence through a CRF.
Table 6. Comparison of existing methods on the CCKS2017 dataset
[table reproduced as an image in the original; contents not recoverable]
The specific results in Table 6 show that the named entity recognition model proposed by the invention achieves the best result among all models. The model of Li et al. performs poorly because, at the chosen word level, the segmentation method inevitably produces segmentation errors, leading to subsequent incorrect recognition of word boundaries. Wang and Qiu et al. enhance entity extraction by adding dictionary features, and Tang et al. further capture the dependencies between tags by stacking multiple model layers. Yin et al. achieve a higher F1 value by incorporating radical information, demonstrating the effectiveness of radical features. Wu et al. combine radical features with the features extracted by the Roberta model, showing that a pre-trained language model helps further improve entity recognition performance. However, these models rely on module-level fusion, whereas the present model feeds radical features into the bottom layers of Roberta so the features are fully fused; the experimental results confirm that bottom-layer feature fusion can further improve model performance.
Many researchers have also studied named entity recognition on the Resume dataset. Considering that word-granularity models cannot exploit lexical information, Zhang et al. proposed Lattice-LSTM, a named entity recognition method that fuses word and character information. Addressing the problem that the lattice structure is too complex to exploit efficiently, Li et al. proposed the FLAT model, which converts the lattice structure into a flat structure composed of spans and adds positional encoding to improve parallelism. Li et al. also combined FLAT with the BERT pre-trained model to further improve recognition. Wei et al. integrate external dictionary knowledge into the BERT layers; the deep fusion of features at the bottom layers enhances model performance.
Table 7. Comparison of existing methods on the Resume dataset
[table reproduced as an image in the original; contents not recoverable]
The comparison results are shown in Table 7, from which the following can be observed. Combining features with a pre-trained model strengthens recognition: for example, Li et al. improve the F1 value by 0.93% after combining BERT with FLAT. Wei et al. fuse features into the bottom layers of the BERT pre-trained model, and their results show that bottom-layer feature fusion can further enhance entity recognition performance. The present invention, considering that Chinese radicals also carry deep semantic information, merges radical information into Roberta and achieves the best performance in the comparison experiments, demonstrating the effectiveness of radical features and the superiority of bottom-layer fusion.
(3) Ablation experiment
Using Roberta-CRF as the baseline model, results with the radical adapter added were compared against Roberta-CRF. The results on the CCKS2017 dataset are shown in Table 8, including a score for each entity category. Recognition of the "examination" and "symptom sign" categories is best: their baseline F1 values are 95.78% and 96.30%, while the method of the invention reaches 96.90% and 98.17%, respectively. Entity recognition for the "treatment" category is weaker, with F1 values of 63.63% at baseline and 71.17% for the method of the invention, possibly because differences in per-category sample sizes affect the learning ability of the neural network. The F1 values of the model of the invention are significantly higher than the baseline in the "examination", "symptom sign", and "treatment" categories, and results similar to the baseline are achieved in the "disease diagnosis" and "body part" categories. Overall, the model of the invention achieves better results across all categories, with an F1 value 2.20% higher than the baseline model, demonstrating the effectiveness of bottom-layer fused radical features.
Table 8. Comparison by entity category and overall on the CCKS2017 dataset
[table reproduced as an image in the original; contents not recoverable]
The invention further compares the baseline model with the proposed model on the Resume dataset, as shown in Table 9; since the entities of the categories in this dataset are unevenly distributed, only the overall scores are compared. The F1 value of the model of the invention is 0.62% higher than the baseline model, further verifying the effectiveness of bottom-layer fused radical features.
Table 9. Comparison of the baseline and proposed models on the Resume dataset
[table reproduced as an image in the original; contents not recoverable]
The invention provides a named entity recognition model and method based on a Roberta radical adapter. Addressing the lack of context information in short text and considering that radicals carry deep semantic information, the model provides a new feature-fusion scheme that combines radical features into the bottom layers of Roberta so the features are fully fused. Multiple groups of comparison experiments on the two datasets demonstrate the performance of the model and the superiority of bottom-layer radical-feature fusion.
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (8)

1. A named entity recognition model based on a Roberta radical-enhanced adapter, characterized by comprising a radical adapter, a radical-enhanced Roberta model, and a conditional random field; the radical adapter feeds radical features into the bottom layers of Roberta so that the information is fully fused; the radical-enhanced Roberta model extracts semantic features using a Roberta model with the whole-word masking scheme; the conditional random field models the conditional probability distribution of a set of output random variables given a set of input random variables.
2. The Roberta radical-enhanced adapter based named entity recognition model according to claim 1, characterized in that the radical adapter comprises means to perform the following steps:
the radical adapter input is divided into two parts, characters and radicals; the radical vector is aligned with the character vector using bilinear attention, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation finally passes through a normalization layer to output the final result;
for a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as the vector sequence E^r = {e_1^r, e_2^r, …, e_n^r}; to align these two vector representations, the radical vector is non-linearly transformed, with the i-th element:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer;
the transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
3. The Roberta-radical-enhanced adapter-based named entity recognition model according to claim 1, wherein the radical-enhanced Roberta model comprises means for performing the following steps:
a special identifier [CLS] is added to the beginning of the input, and sentences are separated by the [SEP] separator; the input sequence then passes through three-part embedding to obtain the sequence representation, each input character being formed by adding its token embedding, segment embedding, and position embedding, so a character E_t in the sequence is formed as:

E_t = E_token_emb + E_seg_emb + E_pos_emb #(3-3)

the core of the Roberta model consists of 12 layers of Transformer encoders, in which the output vector corresponding to [CLS] serves as the semantic representation of the whole text;
the radical-adapter-enhanced Roberta injects the radical adapter into a layer of Roberta, connecting the radical adapter between Transformer layers inside Roberta and thereby injecting external radical knowledge into Roberta;
for a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}; the character sequence is then input to the embedding layer of Roberta, and the resulting embedded representation is input to the Transformer encoder; to inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}; each character-radical pair then passes through the radical adapter to obtain the character-radical representation, the i-th character h_i^k and the i-th radical embedding e_i^r passing through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

the sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
4. The Roberta-radical-enhanced-adapter-based named entity recognition model of claim 1, wherein the conditional random field comprises means for performing the following steps:
given the output of the last layer of Roberta, T = {t_1, t_2, …, t_n}, the score of the predicted sequence is first calculated as:

O = W_o T + b_o #(3-5)

then, for a tag sequence y = {y_1, y_2, …, y_n}, the probability is defined as:

p(y|T) = exp(Σ_i (O_{i,y_i} + Q_{y_{i-1},y_i})) / Σ_{ỹ∈Ỹ} exp(Σ_i (O_{i,ỹ_i} + Q_{ỹ_{i-1},ỹ_i})) #(3-6)

wherein Q is the transition matrix, Q_{y_{i-1},y_i} is the transition score from label y_{i-1} to label y_i, O_{i,y_i} is the score of character t_i being predicted as label y_i, and Ỹ is the set of all possible tag sequences; the numerator is the score of the current tag sequence being the correct sequence, and the denominator sums the scores of all sequences;
given N labeled examples {(T_j, y_j)}_{j=1}^{N}, the model is trained by minimizing the sentence-level negative log-likelihood loss:

L = -Σ_{j=1}^{N} log p(y_j | T_j) #(3-7)

finally, in the decoding process, the Viterbi algorithm is adopted to find the tag sequence with the highest score:

y* = argmax_{ỹ∈Ỹ} score(T, ỹ) #(3-8)

wherein y* is the sequence that maximizes the score function among all tag sequences.
5. A method for named entity recognition using the Roberta radical-enhanced adapter based named entity recognition model according to any of claims 1-4, comprising the steps of:
Step 1: use the radical adapter to feed radical features into the bottom layers of Roberta so that the feature information is fully fused; the adapter input is divided into two parts, characters and radicals; the radical vector representation is aligned with the character vector through a nonlinear transformation, the aligned radical vector is then combined with the character vector to obtain the character-radical pair representation, and the combined vector representation finally passes through a normalization layer to output the final result;
Step 2: radical-enhanced Roberta: connect the radical adapter between Transformer layers inside Roberta, thereby injecting external radical knowledge into Roberta;
Step 3: use the conditional random field to find the tag sequence path with the maximum probability for the input sequence.
6. The method according to claim 5, wherein the step 1 comprises the following specific steps:
Step 1.1: first, for a text of n characters, the character sequence is represented by the output vectors of the Roberta encoding layer, E^c = {e_1^c, e_2^c, …, e_n^c}, and the radical information corresponding to the character sequence is encoded as E^r = {e_1^r, e_2^r, …, e_n^r}; to align these two vector representations, the radical vector is non-linearly transformed, with the i-th element:

ẽ_i^r = W_2 σ(W_1 e_i^r + b_1) + b_2 #(3-1)

wherein σ(·) is a nonlinear activation, W_1 is a matrix of dimension d_c × d_r, W_2 is a matrix of dimension d_c × d_c, b_1 and b_2 are bias terms, d_r is the dimension of the radical embedding, and d_c is the dimension of the Roberta hidden layer;
Step 1.2: the transformed radical vector and the character vector are then added to obtain the character-radical vector representation:

f_i = e_i^c + ẽ_i^r #(3-2)

Step 1.3: finally, the result is output through a dropout layer and a normalization layer, and the character sequence and the radical sequence are fused into the vector sequence F = {f_1, f_2, …, f_n}.
7. The method according to claim 5, wherein the step 2 comprises the following specific steps:
for a given text of n characters, the character sequence C = {c_1, c_2, …, c_n} is matched against the radical dictionary to obtain the corresponding radical sequence R = {r_1, r_2, …, r_n}; the character sequence is then input to the embedding layer of Roberta, and the resulting embedded representation is input to the Transformer encoder; to inject dictionary information between the k-th and (k+1)-th Transformer layers, the output of the first k Transformer layers is obtained first, H^k = {h_1^k, h_2^k, …, h_n^k}; each character-radical pair then passes through the radical adapter to obtain the character-radical representation, the i-th character h_i^k and the i-th radical embedding e_i^r passing through the radical adapter as:

f_i = RA(h_i^k, e_i^r) #(3-4)

the sequence F = {f_1, f_2, …, f_n} obtained from the radical adapter is then input to the remaining 12-k Transformer layers, finally obtaining the output T = {t_1, t_2, …, t_n}.
8. The method according to claim 5, wherein the step 3 comprises the following specific steps:
given the output of the last layer of Roberta, T = {t_1, t_2, …, t_n}, the score of the predicted sequence is first calculated as:

O = W_o T + b_o #(3-5)

then, for a tag sequence y = {y_1, y_2, …, y_n}, the probability is defined as:

p(y|T) = exp(Σ_i (O_{i,y_i} + Q_{y_{i-1},y_i})) / Σ_{ỹ∈Ỹ} exp(Σ_i (O_{i,ỹ_i} + Q_{ỹ_{i-1},ỹ_i})) #(3-6)

wherein Q is the transition matrix, Q_{y_{i-1},y_i} is the transition score from label y_{i-1} to label y_i, O_{i,y_i} is the score of character t_i being predicted as label y_i, and Ỹ is the set of all possible tag sequences; the numerator is the score of the current tag sequence being the correct sequence, and the denominator sums the scores of all sequences;
given N labeled examples {(T_j, y_j)}_{j=1}^{N}, the model is trained by minimizing the sentence-level negative log-likelihood loss:

L = -Σ_{j=1}^{N} log p(y_j | T_j) #(3-7)

finally, in the decoding process, the Viterbi algorithm is adopted to find the tag sequence with the highest score:

y* = argmax_{ỹ∈Ỹ} score(T, ỹ) #(3-8)

wherein y* is the sequence that maximizes the score function among all tag sequences.
CN202211389670.8A 2022-11-08 2022-11-08 Named entity recognition model and method based on Roberta radical enhanced adapter Pending CN115859978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211389670.8A CN115859978A (en) 2022-11-08 2022-11-08 Named entity recognition model and method based on Roberta radical enhanced adapter


Publications (1)

Publication Number Publication Date
CN115859978A true CN115859978A (en) 2023-03-28

Family

ID=85662712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211389670.8A Pending CN115859978A (en) 2022-11-08 2022-11-08 Named entity recognition model and method based on Roberta radical enhanced adapter

Country Status (1)

Country Link
CN (1) CN115859978A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341557A (en) * 2023-05-29 2023-06-27 华北理工大学 Diabetes medical text named entity recognition method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination