CN116432755A

CN116432755A - Weight network reasoning method based on dynamic entity prototype

Info

Publication number: CN116432755A
Application number: CN202310434601.2A
Authority: CN
Inventors: 王丹; 姚建超; 宋彬; 秦浩
Original assignee: Hangzhou Research Institute Of Xi'an University Of Electronic Science And Technology; Xidian University
Current assignee: Hangzhou Research Institute Of Xi'an University Of Electronic Science And Technology; Xidian University
Priority date: 2023-04-21
Filing date: 2023-04-21
Publication date: 2023-07-14

Abstract

The invention discloses a weight network reasoning method based on dynamic entity prototypes. It addresses several problems of word embedding in named entity recognition: there are few sources of expansion data, and existing methods focus on the high-dimensional features of entity words while neglecting their semantic information, the inaccuracy of context information, and model generalization. The method preprocesses the entity recognition dataset; designs a prototype extraction algorithm and trains a model capable of generating dynamic prototypes; initializes all parameters of the entity recognition algorithm; deploys the extracted dynamic entity prototypes into the entity recognition algorithm and learns iteratively from the text and the entity prototypes using a weight network and a multi-head attention mechanism, yielding a trained entity recognition model; and uses the trained model to predict entities on different datasets and test its performance. The technique addresses the ineffective use of semantic information and the poor generalization of entity recognition, thereby improving the accuracy of named entity recognition.

Description

Weight network reasoning method based on dynamic entity prototype

Technical Field

The invention relates to the field of natural language processing, in particular to a weight network reasoning method based on a dynamic entity prototype.

Background

Named entity recognition is an important task in natural language processing. Its goal is to identify entities with a particular meaning in text, such as person names, place names, organization names, dates and currencies. It is one of the basic tasks of text mining and information extraction, and is important for natural language understanding, machine translation and other applications. However, because text contains a huge number of entity words, ambiguous words are hard to resolve, and entity type boundaries are hard to determine, the accuracy of named entity recognition remains low.

Existing entity recognition techniques mainly expand the word embedding representation, use complex network structures, or fuse context information. Techniques that expand the word embedding representation, such as ace+fine-tune and ELMo, introduce a dictionary to enlarge the dimension of the word embedding and thereby increase the information carried by each word. Their drawback is that expanding the embedding requires a large amount of language data and adds new vector dimensions on top of the original ones, which raises computational complexity. Techniques that use complex network structures, such as BINDER and baseline +bs, learn entity features by mapping entities into a high-dimensional vector space. Their drawback is that the semantic information contained in the entity words themselves is not fully used, and the black-box nature of neural network models reduces interpretability. Techniques that fuse context information, such as ConNER and CL-L2, treat the relationship between an entity and the text as a feature and thereby fuse the information of both. Their drawback is that inaccurate context information reduces the accuracy and robustness of the model.

A prototype is typically a center point or representative element of data in a vector space, and can describe the nature of categories, models or concepts and the degree to which they differ. With prototype learning, a neural network can learn data features better.

How to make full use of prototype learning to provide a better-performing named entity recognition method that overcomes problems such as insufficient language data and insufficient use of semantic information is therefore a topic of active discussion among those skilled in the art.

Disclosure of Invention

In view of the above technical shortcomings of named entity recognition, the invention provides a weight network reasoning method based on dynamic entity prototypes that achieves higher recognition performance, requires less expansion data, fuses more accurate and richer semantic information, and generalizes better.

The technical solution of the invention is a weight network reasoning method based on dynamic entity prototypes, comprising the following steps: step 1, preprocessing the entity recognition dataset; step 2, designing a prototype extraction algorithm and training a model that generates dynamic prototypes; step 3, initializing all parameters of the entity recognition algorithm; step 4, deploying the extracted dynamic entity prototypes into the entity recognition algorithm, and learning iteratively from the text and the entity prototypes using a weight network and a multi-head attention mechanism to obtain a trained entity recognition model; and step 5, predicting entities on different datasets with the trained entity recognition model and testing the performance of the model.

Preferably, step 1 comprises the following sub-steps:

Step 1.1, performing a reconstruction operation on the entity recognition dataset: reading the file contents in a loop to extract the entity words in the entity dataset, the positions at which they appear and the types of their labels, and storing the extracted content in a system buffer to facilitate the subsequent reconstruction of the dataset;

Step 1.2, using the extracted entity word information, splitting each sentence that contains several entity words into several sentences that each contain only one entity word, thereby obtaining a prototype dataset; the prototype dataset exploits the relationship between sentences and entity words so that the embedded representation of each sentence contains the semantic information of only one entity word.

Preferably, step 2 comprises the following sub-steps:

Step 2.1, encoding the texts in the prototype dataset with an encoder to obtain the embedded representation vector of each sentence:

Seq_prototype = [CLS, e_1, e_2, e_3, ..., e_i, SEP]

where Seq_prototype is the overall vector representation of a sentence after the encoder, as used in computing the dynamic entity prototype; CLS and SEP are the sentence start and end marks that the encoder automatically adds while encoding the sentence into its embedded representation; e_i is the vector representation of the i-th word of the sentence, the final index i being the length of the sentence;

Step 2.2, extracting features from the overall vector representation of each sentence in the prototype dataset with a BiLSTM network:

[h_CLS, h_1, h_2, h_3, ..., h_i, h_SEP] = BiLSTM(Hidden, Seq_prototype)

where h_i is a vector in the BiLSTM output, namely the word embedding vector of a single word of the prototype dataset sentence after context feature extraction; h_CLS and h_SEP are the BiLSTM outputs for the sentence start mark CLS and the sentence end mark SEP after context feature extraction; and Hidden is the hidden layer state of the BiLSTM;

Step 2.3, after the sentence text of the prototype dataset has been encoded by the encoder, the first vector of the sentence's embedded representation is h_CLS; selecting h_CLS as the representation of the entity in the sentence, which allows the BiLSTM outputs to be processed in parallel;

Step 2.4, obtaining the representations of all entities in the prototype dataset and averaging all representations of entities of the same category to obtain the entity prototype of that category;

Step 2.5, setting a loss function for clustering and classifying the entity prototypes:

Loss = log(2 - sim(prototype_1, h_CLS1)) + log(1 + sim(prototype_1, h_CLS2))

where sim computes the similarity between an entity prototype and an entity representation; prototype_1 and h_CLS1 are an entity prototype and an entity representation of the same category; and prototype_1 and h_CLS2 are an entity prototype and an entity representation of different categories;

Step 2.6, setting the optimizer to AdamW with its parameters lr and weight_decay set to 0.0001 and 0.0005 respectively, optimizing the loss function, and training the model iteratively;

Step 2.7, saving the model that extracts entity prototypes.
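Steps 2.3 and 2.4 can be illustrated with a minimal Python sketch. It assumes the BiLSTM output of each sentence is already available as a list of vectors whose first element is h_CLS; the function names `cls_representation` and `class_prototypes`, the labels and the toy numbers are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]

def cls_representation(bilstm_output: Sequence[Vector]) -> Vector:
    """Take the first position (h_CLS) as the entity representation of a sentence."""
    return list(bilstm_output[0])

def class_prototypes(reps: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Average all entity representations that share a category label."""
    grouped = defaultdict(list)
    for rep, label in zip(reps, labels):
        grouped[label].append(rep)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

# Toy BiLSTM outputs (assumed values): each sentence is [h_CLS, h_1, h_SEP].
outputs = [
    [[1.0, 0.0], [0.3, 0.3], [0.1, 0.1]],
    [[3.0, 2.0], [0.2, 0.5], [0.0, 0.1]],
    [[0.0, 4.0], [0.9, 0.9], [0.4, 0.2]],
]
labels = ["PER", "PER", "LOC"]  # one entity category per single-entity sentence

reps = [cls_representation(o) for o in outputs]
prototypes = class_prototypes(reps, labels)
print(prototypes)
```

Because every sentence contributes exactly one vector at a fixed position, selecting h_CLS is a plain index operation, which is what makes batch-parallel processing straightforward.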

Preferably, step 3 comprises the following sub-steps:

Step 3.1, setting the training and test datasets to entity recognition datasets;

Step 3.2, initializing all parameters of the entity recognition network, including the total number of training rounds P of the entity recognition algorithm, the learning rate learning_rate, the number of heads of the multi-head attention mechanism attention_num, the embedding vector size of entities and relations, the hidden layer dimension hidden_size of the BiLSTM and BiGRU networks, the number of hidden layers hidden_num of the BiLSTM and BiGRU networks, and the batch size of training samples.

Preferably, step 4 comprises the following sub-steps:

Step 4.1, during entity recognition training, dividing the input data into batches; reconstructing the input entity dataset of each batch to obtain the prototype dataset of that batch; generating from the prototype dataset the dynamic entity prototypes corresponding to the input data; and deploying the entity prototypes into the entity recognition network as the key K and value V of the multi-head attention mechanism;

Step 4.2, encoding the text in the entity recognition dataset with an encoder to obtain the embedded representation vector:

Seq_entity = [CLS, e_1, e_2, e_3, ..., e_i, SEP]

where Seq_entity is the overall vector representation of a sentence encoded by the encoder when training the entity recognition network; CLS and SEP are the sentence start and end marks added by the encoder during encoding; e_i is the vector representation of the i-th word of the sentence, the final index i being the length of the sentence;

Step 4.3, extracting the global features of Seq_entity using IDCNN, with kernel_size and filters in IDCNN set to 3 and 64 respectively:

Seq_IDCNN = IDCNN(Seq_entity)

Step 4.4, concatenating Seq_IDCNN and Seq_entity:

Seq_joint = Seq_IDCNN ⊕ Seq_entity

where Seq_joint is the vector obtained by concatenating the two vectors, and ⊕ denotes the dimension concatenation operation;

Step 4.5, calculating the loss value of the BiLSTM network;

Step 4.5.1, inputting the Seq_joint vector into the BiLSTM network:

Seq_BiLSTM = BiLSTM(Hidden, Seq_joint)

where Hidden is the hidden state of the BiLSTM network and Seq_BiLSTM is the vector output by the BiLSTM;

Step 4.5.2, using Seq_BiLSTM as the query Q of the attention mechanism and the dynamic entity prototypes as K and V:

Seq_Attention = SoftMax(Q × K^T) × V

where Seq_Attention is the output of the multi-head attention, K^T is the transpose of K, and × denotes matrix multiplication;

Step 4.5.3, passing the output of the multi-head attention mechanism to the CRF layer and calculating the loss value of the BiLSTM-based model:

Loss_BiLSTM = CRF(Seq_Attention, tags, MASK)

where Loss_BiLSTM is the calculated BiLSTM-based loss value, tags are the true tags taken from the entity recognition dataset, and MASK is the mask applied to the data;

Step 4.6, calculating the loss value of the BiGRU network;

Step 4.6.1, inputting the Seq_joint vector into the BiGRU network:

Seq_BiGRU = BiGRU(Hidden, Seq_joint)

where Hidden is the hidden state of the BiGRU network and Seq_BiGRU is the vector output by the BiGRU;

Step 4.6.2, using Seq_BiGRU as the query Q of the attention mechanism and the dynamic entity prototypes as K and V:

Seq_Attention = SoftMax(Q × K^T) × V

Step 4.6.3, passing the output of the multi-head attention mechanism to the CRF layer and calculating the loss value of the BiGRU-based model:

Loss_BiGRU = CRF(Seq_Attention, tags, MASK)

where Loss_BiGRU is the calculated BiGRU-based loss value, tags are the true tags taken from the entity recognition dataset, and MASK is the mask applied to the data;

Step 4.7, calculating the proportion between the loss values of the BiLSTM and BiGRU networks and using it as the weight values of the two networks, the calculation formula is as follows:

where Weight_BiLSTM and Weight_BiGRU are the weight values of the BiLSTM and BiGRU networks, respectively;

Step 4.8, optimizing the network with the weight values of the BiLSTM and BiGRU networks:

Network_optimize = Weight_BiLSTM * Network_BiLSTM + Weight_BiGRU * Network_BiGRU

where Network_BiLSTM and Network_BiGRU are the BiLSTM and BiGRU network structures, Network_optimize is the optimized network structure, and * denotes multiplication; the weight values are calculated from the loss function values and used dynamically to optimize the whole network according to the formula, so that the network with the larger loss value receives the smaller weight, and the weights change dynamically with the loss function;

Step 4.9, feeding the input vector into the optimized network;

Step 4.10, feeding the output of the optimized network into the multi-head attention mechanism;

Step 4.11, passing the output of the multi-head attention mechanism to the CRF layer, calculating the F1 score of the model, and saving the model.
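The weight formula of step 4.7 is not reproduced in this text, so the following Python sketch is only one possible reading. It assumes that each network's weight is the other network's share of the total loss, which satisfies the stated property that the network with the larger loss receives the smaller weight. It also assumes that the weighted sum of step 4.8 is applied to the output vectors of the two networks. The names `loss_ratio_weights` and `combine` and all numbers are hypothetical.

```python
from typing import List, Tuple

def loss_ratio_weights(loss_bilstm: float, loss_bigru: float) -> Tuple[float, float]:
    """Assumed form: each weight is the other network's share of the total loss."""
    total = loss_bilstm + loss_bigru
    if total <= 0.0:
        return 0.5, 0.5  # degenerate case: fall back to equal weights
    return loss_bigru / total, loss_bilstm / total

def combine(out_bilstm: List[float], out_bigru: List[float],
            w_bilstm: float, w_bigru: float) -> List[float]:
    """Weighted sum of the two network outputs, following the step 4.8 formula."""
    return [w_bilstm * a + w_bigru * b for a, b in zip(out_bilstm, out_bigru)]

# Toy values: the BiLSTM branch currently has the larger loss.
w_bilstm, w_bigru = loss_ratio_weights(3.0, 1.0)
mixed = combine([4.0, 8.0], [0.0, 4.0], w_bilstm, w_bigru)
print(w_bilstm, w_bigru, mixed)
```

With these toy losses the BiLSTM branch gets weight 0.25 and the BiGRU branch 0.75; the weights sum to 1 and are recomputed whenever the losses change.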

Preferably, step 5 comprises the following sub-steps:

Step 5.1, reconstructing the different entity recognition datasets to generate prototype datasets;

Step 5.2, feeding the prototype dataset into the entity prototype extraction network and calculating the corresponding dynamic entity prototypes;

Step 5.3, feeding the entity recognition dataset and the corresponding dynamic entity prototypes into the trained entity recognition network to predict entities;

Step 5.4, evaluating the prediction performance of the entity recognition network with the evaluation indices precision, recall and F1 score.
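The evaluation in step 5.4 can be sketched as follows, assuming the three indices are the usual entity-level precision, recall and F1 score, and assuming an entity counts as correct only when its sentence, span and type all match. The tuple layout (sentence id, start, end, type) and the sample data are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, int, str]  # (sentence id, start, end, type), assumed layout

def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction is correct only on an exact match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = {(0, 2, 3, "PER"), (0, 4, 5, "LOC"), (1, 0, 2, "ORG"), (1, 5, 6, "PER")}
pred = {(0, 2, 3, "PER"), (0, 4, 5, "ORG"), (1, 0, 2, "ORG")}  # one wrong type, one miss

precision, recall, f1 = precision_recall_f1(pred, gold)
print(precision, recall, f1)
```

In this toy case two of the three predictions are correct and two of the four gold entities are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.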

Compared with existing named entity recognition techniques, the weight network reasoning method based on dynamic entity prototypes has the following advantages:

1. Entity recognition uses dynamic entity prototypes. An entity prototype is the vector representation of a type of entity in a high-dimensional space; the prototype extraction network maps entity words of different types to dissimilar vectors and entity words of the same type to similar vectors. During training, dynamic entity prototypes are generated by dividing the dataset into batches. This solves the problem that conventional entity recognition attends only to entity words and not to full-text information. Prototype extraction only requires reconstructing the original entity dataset, which greatly reduces the need for an additional dictionary; in addition, using prototypes for entity recognition improves the interpretability of the task.

2. Entity prototypes are generated dynamically. In the entity recognition network, the prototype dataset is reconstructed from the batch of the entity recognition dataset used in each training step, so the generated entity prototypes change across training batches. Dynamic prototypes make effective use of local context information without introducing information that the model has not yet learned and that would act as noise; they also cause the entity recognition network parameters to change along with the prototypes, which reduces the risk of overfitting.

3. A weight neural network. The final loss function values of the BiLSTM and BiGRU networks are calculated and weight values are derived from them; the weights change dynamically with the input data and model parameters in each training step, yielding an optimized network that performs better than either the BiLSTM or the BiGRU network alone. Unlike methods based on reinforcement learning, this method obtains an optimized network model without consuming large computational resources.

Drawings

FIG. 1 is a schematic flow diagram of an implementation of the present invention;

FIG. 2 is a schematic diagram of iterative learning of a weight network interacting with a dynamic prototype in accordance with the present invention;

FIG. 3 is a graph of experimental results of the present invention;

FIG. 4 is a table of the models referenced by the present invention.

Detailed Description

In order that those skilled in the art may better understand the invention, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

The invention is further described below with reference to the drawings and the detailed description. As shown in the figures, the main technical idea of this embodiment is as follows. The entity dataset is reconstructed into a prototype dataset in which each sentence contains only one entity. A prototype clustering algorithm is designed so that entity prototypes of different types have low vector similarity in the high-dimensional space and entity prototypes of the same type have high vector similarity. The model is trained with the prototype clustering algorithm so that it outputs entity prototypes that cluster well. The data fed into the entity recognition network is partitioned into batches according to batch_size, so that the corresponding entity prototypes can be generated dynamically. The entity recognition dataset and the generated dynamic entity prototypes are fed into the weight network, which learns richer semantic information from the dynamic prototypes through the multi-head attention mechanism, calculates the weight values of the networks and optimizes the parameters of the whole network, yielding the trained entity recognition network. The performance of the trained network is evaluated with the indices precision, recall and F1 score.

As shown in FIG. 1, the invention is implemented in the following steps.

Step one: preprocessing the entity recognition dataset;

Step 1.1, in existing entity recognition datasets a sentence often contains several entity words, so the embedded representation produced by the encoder contains the semantic information of several entity words. Especially when those entity words belong to different categories, the sentence embedding easily carries complex and invalid semantic information, which makes the extracted prototype semantically inaccurate and noisy and hinders the extraction of entity prototypes. The original dataset is therefore first reconstructed: the file contents are read in a loop to extract the entity words in the entity dataset, the positions at which they appear and the types of their labels, and the extracted content is stored in a system buffer to facilitate the subsequent reconstruction of the dataset. Reconstruction converts the entity dataset into a prototype dataset in which every sentence has only one entity word, which keeps the sentence semantic information in the prototype dataset free of noise interference;

Step 1.2, using the dataset reconstruction method, each sentence containing several entity words is split, according to the extracted entity word information, into several sentences that each contain only one entity word. This fully exploits the relationship between sentences and entity words so that the embedded representation of each sentence contains the semantic information of only one entity word, which helps the sentence serve as the prototype representation of its entity word. Suppose a seven-word sentence containing several entities in the entity dataset is represented as [o, o, entity_1, o, entity_2, o, o], where o is a word that is not an entity and entity_1 and entity_2 are the entity words of the sentence. When splitting, the sentence for entity_1 runs from the first word up to the word before entity_2 appears. The sentence [o, o, entity_1, o, entity_2, o, o] is therefore reconstructed into the two sentences [o, o, entity_1, o] and [o, entity_2, o, o]. Splitting every sentence that contains several entity words in this way retains as much sentence information as possible while preventing several entity words from appearing in the same sentence, and finally yields the prototype dataset. Compared with the conventional approach of representing a prototype by the mean of the entity words' embedded representations, reconstructing the entity dataset into a prototype dataset avoids mixing the semantic information of several entity words at the dataset level.
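The splitting rule of step 1.2 can be sketched in Python as follows. The sketch assumes each entity is given as a (start, end) token span with an exclusive end, sorted by position; the function name `split_single_entity` and the span format are hypothetical. Each piece runs from just after the previous entity to just before the next one, which reproduces the seven-word example above.

```python
from typing import List, Tuple

def split_single_entity(tokens: List[str],
                        entity_spans: List[Tuple[int, int]]) -> List[List[str]]:
    """Split a sentence so that every piece contains exactly one entity.

    entity_spans holds (start, end) token indices, end exclusive, sorted by start.
    """
    pieces = []
    for k, _span in enumerate(entity_spans):
        # Left edge: the end of the previous entity, or the sentence start.
        left = entity_spans[k - 1][1] if k > 0 else 0
        # Right edge: the start of the next entity, or the sentence end.
        right = entity_spans[k + 1][0] if k + 1 < len(entity_spans) else len(tokens)
        pieces.append(tokens[left:right])
    return pieces

sentence = ["o", "o", "entity_1", "o", "entity_2", "o", "o"]
spans = [(2, 3), (4, 5)]
pieces = split_single_entity(sentence, spans)
print(pieces)
```

Note that the non-entity word between the two entities appears in both pieces, which is how the rule keeps as much context as possible for each entity.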

Step two: designing a prototype extraction algorithm and training a model capable of generating dynamic prototypes;

Step 2.1, generating dynamic prototypes requires first training a network model that can generate entity prototypes, and then generating the dynamic entity prototypes according to the batch division used when training on the entity dataset. To train the network model that generates entity prototypes, the sentences in the prototype dataset are encoded with an encoder to obtain their embedded representations:

Seq_prototype = [CLS, e_1, e_2, e_3, ..., e_i, SEP]

where Seq_prototype is the overall vector representation of a single sentence of the prototype dataset after the encoder; CLS and SEP are the sentence start and end marks that the encoder automatically adds while encoding the sentence into its embedded representation, and by which the encoder distinguishes different sentences so as to encode each sentence in a more targeted way; e_i is the vector representation of the i-th word of the sentence, the final index i being the length of the sentence;

Step 2.2, the BiLSTM network extracts features from the overall vector representation of the sentence:

[h_CLS, h_1, h_2, h_3, ..., h_i, h_SEP] = BiLSTM(Hidden, Seq_prototype)

where h_i is a BiLSTM output vector, namely the word embedding vector of a single word of the prototype dataset sentence after context feature extraction; h_CLS and h_SEP are the BiLSTM outputs for the sentence start mark CLS and the sentence end mark SEP after context feature extraction; Hidden is the hidden layer state of the BiLSTM, which is randomly initialized when model training starts;

Step 2.3, after BiLSTM feature extraction of Seq_prototype, the semantic information of the entity word is fused into the BiLSTM output, and once the model network is fully trained this information is concentrated in h_CLS. The invention therefore selects h_CLS as the entity representation of the sentence: h_CLS contains the semantic information of the entity word and can serve as the input vector for prototype training. Moreover, because the prototype dataset is obtained by reconstructing the entity dataset and each sentence has only one entity, using h_CLS as the entity representation of the sentence introduces no noise from other entity words. Since the encoder adds the CLS mark at the beginning of every sentence, the first vector of the BiLSTM output is h_CLS. Selecting h_CLS as the vector representation used for the entity prototype lets the computer process the BiLSTM outputs in parallel and quickly obtain the entity representation of each sentence; compared with the conventional approach of averaging the embedding vectors of the entity words, this is faster to compute and less noisy;

Step 2.5, the loss function is set as:

Loss = log(2 - sim(prototype_1, h_CLS1)) + log(1 + sim(prototype_1, h_CLS2))

where sim is the function that calculates the similarity between an entity prototype and an entity representation. The method uses a linearly rescaled cosine similarity: the value range of the cosine function is compressed linearly from [-1, 1] to [0, 1]. Combined with the logarithmic loss, this accelerates model convergence and prevents the argument of the loss function from leaving its domain during back propagation. prototype_1 and h_CLS1 are an entity prototype and an entity representation of the same category, and prototype_1 and h_CLS2 are an entity prototype and an entity representation of different categories. log is the natural logarithm (base e); the derivative of the log term is larger when the distance between the predicted value and the true value is small and smaller when the distance is large, so the model can perform supervised clustering of the prototypes more quickly and effectively while avoiding vanishing or exploding gradients. The invention trains the prototypes with this logarithmic loss so that entity prototypes of different categories are dissimilar in the vector space and entity prototypes of the same category are similar. After iterative training, the entity prototypes calculated from the resulting entity representations are clearly distinguished from one another;

Step 2.7, saving the model that extracts entity prototypes.
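The prototype loss can be sketched in plain Python as follows. The sketch assumes that the linear rescaling of the cosine similarity is sim = (cos + 1) / 2, which maps [-1, 1] onto [0, 1] as described; the function names and toy vectors are hypothetical, and the vectors are assumed to be non-zero.

```python
import math
from typing import Sequence

def scaled_cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity rescaled linearly from [-1, 1] to [0, 1] (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return (dot / (norm_a * norm_b) + 1.0) / 2.0

def prototype_loss(prototype: Sequence[float],
                   same_class_rep: Sequence[float],
                   other_class_rep: Sequence[float]) -> float:
    """Loss = log(2 - sim(p, h_same)) + log(1 + sim(p, h_other))."""
    pull = math.log(2.0 - scaled_cosine(prototype, same_class_rep))
    push = math.log(1.0 + scaled_cosine(prototype, other_class_rep))
    return pull + push

prototype = [1.0, 0.0]
best = prototype_loss(prototype, [2.0, 0.0], [-1.0, 0.0])   # same aligned, other opposite
worst = prototype_loss(prototype, [-1.0, 0.0], [1.0, 0.0])  # same opposite, other aligned
print(best, worst)
```

Because sim stays within [0, 1], both logarithm arguments stay within [1, 2], so the loss is always defined and ranges from 0 to 2·log 2.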

Step three: initializing all parameters of the entity recognition algorithm;

Step 3.1, setting the training and test datasets to entity recognition datasets;

Step 3.2, initializing the algorithm parameters, including the total number of training rounds P of the entity recognition algorithm, the learning rate learning_rate, the number of heads of the multi-head attention mechanism attention_num, the embedding vector size of entities and relations, the hidden layer dimension hidden_size of the BiLSTM and BiGRU networks, the number of hidden layers hidden_num of the BiLSTM and BiGRU networks, and the batch size of training samples. The BiLSTM and BiGRU networks use the same parameters, and the specific values are shown in Table 1.

Step four: deploying the extracted dynamic entity prototypes into the entity recognition algorithm, and learning iteratively from the text and the entity prototypes using the weight network and the multi-head attention mechanism to obtain a trained entity recognition model; the specific steps by which the weight network interacts with the dynamic prototypes to realize iterative learning are shown in FIG. 2;

Step 4.1, during entity recognition training, the input data is divided into batches, and the input entity dataset of each batch is reconstructed to obtain a prototype dataset. For example, if the data is divided into 64 batches, the entity dataset is split into 64 parts that are reconstructed separately, giving 64 corresponding prototype datasets. Entity prototypes that change with the input data are generated according to the correspondence between the different prototype datasets and the input entity datasets, which makes the entity prototypes dynamic.

Existing techniques that use prototypes for entity recognition train prototypes on the whole entity dataset, so the entity prototypes do not follow the data as it changes, which introduces a great deal of noise. In the present method, the prototype dataset and the entity dataset are trained together batch by batch according to their correspondence, giving dynamically changing entity prototypes. Dynamic prototypes reduce the noise introduced by the entity prototypes and let the entity recognition model handle different kinds of data in the training and test stages, which improves the generalization ability of the model, increases the diversity of the dataset, lowers the risk of overfitting and gives better results.
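The difference between prototypes computed once over the whole dataset and prototypes recomputed per batch can be illustrated with a small Python sketch. It assumes that the entity representations have already been produced by the prototype extraction model and are available as (vector, label) pairs; `batch_size` here counts samples per batch, which is an assumption about how batches are formed. All names and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Sample = Tuple[List[float], str]  # (entity representation, category label)

def batch_prototypes(batch: List[Sample]) -> Dict[str, List[float]]:
    """Average the entity representations of each category within one batch."""
    grouped = defaultdict(list)
    for rep, label in batch:
        grouped[label].append(rep)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}

def dynamic_prototypes(samples: List[Sample],
                       batch_size: int) -> Iterator[Dict[str, List[float]]]:
    """Yield a fresh set of prototypes for every batch of the input data."""
    for start in range(0, len(samples), batch_size):
        yield batch_prototypes(samples[start:start + batch_size])

samples = [
    ([1.0, 1.0], "PER"), ([3.0, 1.0], "PER"),   # first batch
    ([0.0, 2.0], "PER"), ([4.0, 4.0], "LOC"),   # second batch
]
per_batch = list(dynamic_prototypes(samples, batch_size=2))
print(per_batch)
```

In this toy run the PER prototype moves from [2.0, 1.0] in the first batch to [0.0, 2.0] in the second, and a LOC prototype only exists for the batch that actually contains a LOC entity.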

The dynamic entity prototypes are deployed into the entity recognition network as the key K and value V of the multi-head attention mechanism;

Step 4.2, the text in the entity recognition dataset is encoded with an encoder to obtain the embedded representation vector:

Seq_entity = [CLS, e_1, e_2, e_3, ..., e_i, SEP]

where Seq_entity is the overall vector representation of a sentence encoded by the encoder when training the entity recognition network; CLS and SEP are the sentence start and end marks added by the encoder during encoding; e_i is the vector representation of the i-th word of the sentence, the final index i being the length of the sentence;

Step 4.3, extracting the global features of $\mathrm{Seq}_{entity}$ using IDCNN. The invention adopts dilated convolution (IDCNN), which enlarges the receptive field without increasing the number of model parameters. This improves the coverage and recognition efficiency of the model, makes better use of the local and global features of the data so that the input carries richer information, suppresses noise and redundant information in the input data, and improves the robustness of the model. With kernel_size and filters in the IDCNN set to 3 and 64 respectively, the IDCNN is calculated as follows:

$$\mathrm{Seq}_{IDCNN} = \mathrm{IDCNN}(\mathrm{Seq}_{entity})$$
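The claim that dilation enlarges the receptive field without adding parameters can be checked with a small single-channel sketch. The kernel size of 3 comes from the text; the dilation schedule `[1, 2, 4]`, the all-ones kernel and the zero padding are assumptions made only for this illustration, since the text does not state them.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Single-channel 1-D dilated convolution, zero-padded so that the
    output has the same length as the input."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that one output position can see
    after stacking the given dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


kernel = [1.0, 1.0, 1.0]   # kernel_size = 3, as in the text
dilations = [1, 2, 4]      # assumed schedule, not given in the text

# Push a unit impulse through the stack and count how far it spreads.
signal = [0.0] * 15
signal[7] = 1.0
y = signal
for d in dilations:
    y = dilated_conv1d(y, kernel, d)
spread = sum(1 for v in y if v != 0.0)
```

Three dilated layers and three undilated layers each use the same nine kernel weights per channel, yet `receptive_field(3, [1, 2, 4])` covers 15 positions against 7 for `receptive_field(3, [1, 1, 1])`.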

Step 4.4, concatenating $\mathrm{Seq}_{IDCNN}$ and $\mathrm{Seq}_{entity}$, so that the local feature information is fused while the global features are preserved; the concatenation formula is as follows:

$$\mathrm{Seq}_{joint} = \mathrm{Seq}_{IDCNN} \oplus \mathrm{Seq}_{entity}$$

where $\mathrm{Seq}_{joint}$ is the vector obtained by concatenating the two vectors, and $\oplus$ denotes the dimension concatenation operation;
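The splicing in step 4.4 can be pictured as a per-token concatenation along the feature dimension. The sketch below assumes that layout and the order IDCNN features first, entity features second; neither is stated explicitly in the text. The widths 64 and 768 are the filters and embedding size quoted elsewhere in the document, and the token count is invented.

```python
from typing import List

Matrix = List[List[float]]  # one feature vector per token


def concat_features(seq_idcnn: Matrix, seq_entity: Matrix) -> Matrix:
    """Concatenate two per-token feature matrices along the feature axis."""
    if len(seq_idcnn) != len(seq_entity):
        raise ValueError("both sequences must cover the same tokens")
    return [a + b for a, b in zip(seq_idcnn, seq_entity)]


tokens = 5                                           # invented sentence length
seq_idcnn = [[0.0] * 64 for _ in range(tokens)]      # 64 IDCNN filters
seq_entity = [[1.0] * 768 for _ in range(tokens)]    # 768-dim embeddings
seq_joint = concat_features(seq_idcnn, seq_entity)
```

Under these assumptions each token ends up with 64 + 768 = 832 features, and the sequence length is unchanged.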

Step 4.5, calculating the loss value of the BiLSTM network;

$$\mathrm{Seq}_{BiLSTM} = \mathrm{BiLSTM}(\mathrm{Hidden}, \mathrm{Seq}_{joint})$$

where Hidden is the hidden-layer state of the BiLSTM, which is randomly initialized when model training starts, and $\mathrm{Seq}_{BiLSTM}$ is the vector output by the BiLSTM;

Step 4.5.2, using $\mathrm{Seq}_{BiLSTM}$ as Q in the attention mechanism and the dynamic entity prototype as K and V in the attention mechanism, with the following calculation formulas:

$$\mathrm{Seq}_{Attention} = \mathrm{SoftMax}(Q \times K^{T}) \times V$$

$$\mathrm{Loss}_{BiLSTM} = \mathrm{CRF}(\mathrm{Seq}_{Attention}, \mathrm{tags}, \mathrm{MASK})$$
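The attention step above, with the recurrent output as Q and the dynamic entity prototypes as both K and V, can be sketched in plain Python. The sketch follows the formula as written, so it has a single head and no scaling factor; how the patent splits the computation across heads is not shown here, and the toy vectors are invented.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute SoftMax(Q x K^T) x V row by row."""
    output = []
    for q_row in q:
        # Similarity of this query to every prototype key.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted mixture of the prototype values.
        output.append(
            [sum(w * v_row[d] for w, v_row in zip(weights, v)) for d in range(len(v[0]))]
        )
    return output


seq_out = [[1.0, 0.0], [0.0, 1.0]]         # invented recurrent outputs (Q)
prototypes = [[10.0, 0.0], [0.0, 10.0]]    # invented entity prototypes (K and V)
attended = prototype_attention(seq_out, prototypes, prototypes)
```

Each output row is a convex combination of the prototype vectors, so a token that aligns with one prototype is pulled almost entirely toward it.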

Step 4.6, calculating the loss value of the BiGRU network;

$$\mathrm{Seq}_{BiGRU} = \mathrm{BiGRU}(\mathrm{Hidden}, \mathrm{Seq}_{joint})$$

where Hidden is the hidden-layer state of the BiGRU, which is randomly initialized when model training starts, and $\mathrm{Seq}_{BiGRU}$ is the vector output by the BiGRU;

Step 4.6.2, using $\mathrm{Seq}_{BiGRU}$ as Q in the attention mechanism and the dynamic entity prototype as K and V in the attention mechanism, with the following calculation formulas:

$$\mathrm{Seq}_{Attention} = \mathrm{SoftMax}(Q \times K^{T}) \times V$$

$$\mathrm{Loss}_{BiGRU} = \mathrm{CRF}(\mathrm{Seq}_{Attention}, \mathrm{tags}, \mathrm{MASK})$$

where $\mathrm{Loss}_{BiGRU}$ is the calculated BiGRU-based loss value, tags denotes the ground-truth labels obtained from the entity recognition data set, and MASK is the mask applied to the data;
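Both branches end in a CRF loss that takes the attended sequence, the tags and a mask. The text does not spell out the CRF itself, so the sketch below assumes a standard linear-chain CRF negative log-likelihood for a single sequence, without start or end transition scores and with right padding marked by zeros in the mask. The function name `crf_nll`, the transition matrix and all numbers are illustrative.

```python
import math
from itertools import product
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def crf_nll(
    emissions: List[List[float]],    # [time][tag] scores, e.g. the attended sequence
    transitions: List[List[float]],  # [previous tag][next tag] scores
    tags: Sequence[int],
    mask: Sequence[int],             # 1 for real tokens, 0 for right padding
) -> float:
    """Negative log-likelihood of `tags` under a linear-chain CRF."""
    length = sum(mask)  # assumes the ones form a prefix and length >= 1
    n_tags = len(transitions)

    # Score of the gold path over the unmasked positions.
    gold = emissions[0][tags[0]]
    for t in range(1, length):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]

    # Forward algorithm: log-sum of the scores of all paths.
    alpha = list(emissions[0])
    for t in range(1, length):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return logsumexp(alpha) - gold


emissions = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.5], [9.0, -9.0]]
transitions = [[0.4, -0.2], [0.1, 0.3]]
mask = [1, 1, 1, 0]  # the last position is padding and must be ignored
loss = crf_nll(emissions, transitions, [0, 1, 1, 0], mask)

# Sanity check: the probabilities of all 2^3 unmasked paths sum to one.
total_probability = sum(
    math.exp(-crf_nll(emissions, transitions, list(path) + [0], mask))
    for path in product(range(2), repeat=3)
)
```

The mask matters because padded positions would otherwise contribute arbitrary emission scores, as the deliberately large values in the last row show.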

Step 4.7, conventional methods extract the high-dimensional features of the data with a complex neural network structure, so the trained model overfits easily and consumes substantial computing resources. In the invention, different neural networks are connected in parallel according to the ratio of their loss functions, so that each network contributes to the result in a more balanced way and overall performance improves. The loss functions of the different neural networks are combined by weighted averaging, with the weights determined by the ratio of the respective loss functions. This balances the contribution of each network to the overall output and prevents the output of any single network from dominating the overall result, which improves the robustness and stability of the model. It also further improves generalization: because different neural networks have different characteristics and performance, combining them improves the model's ability to adapt to different data. The proportion between the loss values of the BiLSTM and BiGRU networks is calculated and used as the weight values of the BiLSTM and BiGRU networks, with the following calculation formula:

where $\mathrm{Network}_{BiLSTM}$ and $\mathrm{Network}_{BiGRU}$ denote the BiLSTM and BiGRU network structures respectively, $\mathrm{Network}_{optimize}$ denotes the optimized network structure, and the operator in the formula denotes multiplication. The weight values are calculated from the loss function values; according to the calculation formula, the network with the larger loss function value receives the smaller weight. The weights change dynamically with the loss function, and the dynamic weight values optimize the whole network;
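The weighting formula itself is not reproduced in this text, so the following is only one form consistent with the description: the weights come from the ratio of the two loss values, they sum to one, and the network with the larger loss receives the smaller weight. Giving each network the other network's share of the total loss is an assumption, as are the function names and the example numbers.

```python
from typing import List, Sequence, Tuple


def loss_ratio_weights(loss_bilstm: float, loss_bigru: float) -> Tuple[float, float]:
    """Return (weight_bilstm, weight_bigru), assuming each network is
    weighted by the other network's share of the total loss."""
    total = loss_bilstm + loss_bigru
    if total <= 0.0:
        raise ValueError("loss values must be positive")
    return loss_bigru / total, loss_bilstm / total


def combine(
    out_bilstm: Sequence[float],
    out_bigru: Sequence[float],
    weights: Tuple[float, float],
) -> List[float]:
    """Weighted sum of the two parallel network outputs."""
    w_bilstm, w_bigru = weights
    return [w_bilstm * a + w_bigru * b for a, b in zip(out_bilstm, out_bigru)]


# Invented loss values: the BiLSTM branch is currently the worse one.
weights = loss_ratio_weights(loss_bilstm=3.0, loss_bigru=1.0)
merged = combine([1.0, 2.0], [3.0, 4.0], weights)
```

Because the weights are recomputed from the current losses, they shift toward whichever branch is performing better at that point in training.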

step 4.9, inputting the input vector into the optimized network;

Step 5, predicting the entities of different data sets by using the trained entity recognition model, and testing the performance of the model;

The invention is further described below with reference to experiments:

1. Simulation experiment conditions:

Hardware environment of the simulation experiment: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, 64 GB memory, RTX 3090 GPU. Software environment: Ubuntu 20.04 operating system, Python 3.8, PyTorch 1.10.0.

2. Experimental content and results analysis:

The simulation experiment performs entity recognition on two public entity recognition data sets, People's Daily 1998 and CCKS2017, and evaluates the performance of the entity recognition algorithm of the invention with the evaluation indexes Precision, Recall and F1. Experimental results from several papers are also cited and compared with the named entity recognition algorithm of the invention to demonstrate its advantages. The cited papers are shown in fig. 4, and the main experimental parameters of the simulation experiment are shown in table 1.

Table 1 Simulation experiment parameters

Parameter name	Value
Total number of training rounds P	300
Learning rate	0.0001
Number of heads of the multi-head attention mechanism attention_num	32
Entity embedding vector size embedding_size	768
BiLSTM network hidden layer dimension hidden_size	500
Number of BiLSTM network hidden layers hidden_num	2
Batch size of training samples batch_size	64
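For reference, the parameters in Table 1 can be gathered into a single configuration object. The class name `TrainingConfig` and the field names `total_rounds` and `learning_rate` are hypothetical; the remaining field names follow the identifiers given in the table, and the values are those of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Simulation experiment parameters from Table 1."""
    total_rounds: int = 300        # total number of training rounds P
    learning_rate: float = 0.0001
    attention_num: int = 32        # heads of the multi-head attention mechanism
    embedding_size: int = 768      # entity embedding vector size
    hidden_size: int = 500         # BiLSTM hidden layer dimension
    hidden_num: int = 2            # number of BiLSTM hidden layers
    batch_size: int = 64           # training samples per batch


config = TrainingConfig()
```

Freezing the dataclass keeps the reported settings from being changed by accident during a run.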

On the People's Daily 1998 and CCKS2017 data sets, the experimental results of entity recognition using the weight network reasoning method based on a dynamic entity prototype are shown in fig. 3. To demonstrate the advantages of the algorithm, the results reported in the cited papers on the same two data sets are compared with those of the invention in fig. 3. As fig. 3 shows, the invention outperforms current conventional entity recognition methods on all three evaluation indexes, Precision, Recall and F1, on both data sets. The proposed algorithm therefore performs better than traditional named entity recognition algorithms, which cannot effectively use context information, cannot effectively extract entity semantic information, and lack interpretability. The proposed algorithm performs entity recognition by computing dynamic prototypes of entity words: it uses the dynamic entity prototypes to supplement semantic information in the vector space and uses dynamic weights to optimize the network parameters. This addresses the difficulty traditional models have in using global information and in recognizing ambiguous words, so its named entity recognition performance is better than that of the algorithms in the cited papers.
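The three evaluation indexes used in the comparison can be computed as in the sketch below. It assumes entity-level scoring in which a prediction counts as correct only when both its span and its type match the gold annotation exactly; the text does not state the matching criterion, and the example entities are invented.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start index, end index, entity type)


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span and type match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 2, "PER"), (5, 7, "LOC"), (9, 12, "ORG")}
predicted = {(0, 2, "PER"), (5, 7, "ORG"), (9, 12, "ORG"), (14, 15, "LOC")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here two of the four predictions are correct and two of the three gold entities are found, so precision is 1/2, recall is 2/3 and F1 is 4/7.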

The experimental result verifies that the invention can complete the task of identifying the named entity with better performance.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of the present invention.

Claims

1. A weight network reasoning method based on a dynamic entity prototype, characterized by comprising:

Step 1, preprocessing an entity identification data set;

Step 2, designing a prototype extraction algorithm, and training a model capable of generating a dynamic prototype;

Step 3, initializing all parameters of an entity identification algorithm;

Step 4, deploying the extracted dynamic entity prototype into an entity recognition algorithm, and performing iterative learning from the text and the entity prototype by using a weight network and a multi-head attention mechanism to obtain a trained entity recognition model;

Step 5, predicting the entities of different data sets by using the trained entity identification model, and testing the performance of the model.

2. The weight network reasoning method based on dynamic entity prototypes of claim 1, wherein: the step 1 comprises the following sub-steps:

3. The weight network reasoning method based on dynamic entity prototypes of claim 1, wherein: the step 2 comprises the following sub-steps:

$$\mathrm{Seq}_{prototype} = [\mathrm{CLS}, e_1, e_2, e_3, \ldots, e_i, \mathrm{SEP}]$$

where $\mathrm{Seq}_{prototype}$ is the overall vector representation of a sentence after it passes through the encoder in the dynamic entity prototype calculation, CLS and SEP are the sentence start and end markers that the encoder automatically adds when encoding the sentence into its embedded representation, $e_i$ is the vector representation of the i-th word of the sentence, and i denotes the length of the sentence;

$$[h_{CLS}, h_1, h_2, h_3, \ldots, h_i, h_{SEP}] = \mathrm{BiLSTM}(\mathrm{Hidden}, \mathrm{Seq}_{prototype})$$

$$\mathrm{Loss} = \log\bigl(2 - \mathrm{sim}(\mathrm{prototype}_1, h_{CLS1})\bigr) + \log\bigl(1 + \mathrm{sim}(\mathrm{prototype}_1, h_{CLS2})\bigr)$$

step 2.6, setting the optimizer to AdamW, setting the optimizer parameters lr and weight decay to 0.0001 and 0.0005 respectively, optimizing the loss function, and iteratively training the model;

step 2.7, saving the model used to extract the entity prototype.
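The prototype loss in claim 3 can be evaluated with a short sketch. It assumes that sim is cosine similarity, and it treats h_CLS1 as the representation the prototype should be close to and h_CLS2 as the one it should be far from; neither point is stated in this part of the text, and the vectors are invented. Under that reading the first term shrinks as the first similarity rises, and the second term shrinks as the second similarity falls.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, assumed here to be the `sim` of the loss formula."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def prototype_loss(
    prototype: Sequence[float],
    h_cls1: Sequence[float],
    h_cls2: Sequence[float],
) -> float:
    """log(2 - sim(prototype, h_cls1)) + log(1 + sim(prototype, h_cls2)).

    The second term is undefined when the similarity equals -1."""
    return math.log(2.0 - cosine(prototype, h_cls1)) + math.log(
        1.0 + cosine(prototype, h_cls2)
    )


prototype = [1.0, 0.0]
aligned = prototype_loss(prototype, h_cls1=[1.0, 0.0], h_cls2=[0.0, 1.0])
swapped = prototype_loss(prototype, h_cls1=[0.0, 1.0], h_cls2=[1.0, 0.0])
```

With these toy vectors the loss is 0 when the prototype matches h_CLS1 and is orthogonal to h_CLS2, and 2·log 2 when the roles are swapped.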

4. The weight network reasoning method based on dynamic entity prototypes of claim 1, wherein: said step 3 comprises the following sub-steps:

step 3.2, initializing all parameters of the entity identification network, including the total number of training rounds P of the entity identification algorithm, the learning rate learning_rate, the number of heads attention_num of the multi-head attention mechanism, the embedding vector size embedding_size of entities and relations, the BiLSTM and BiGRU network hidden layer dimension hidden_size, the number of BiLSTM and BiGRU network hidden layers hidden_num, and the batch size batch_size of the training samples.

5. The weight network reasoning method based on dynamic entity prototypes of claim 1, wherein: the step 4 comprises the following sub-steps:

$$\mathrm{Seq}_{entity} = [\mathrm{CLS}, e_1, e_2, e_3, \ldots, e_i, \mathrm{SEP}]$$

$$\mathrm{Seq}_{IDCNN} = \mathrm{IDCNN}(\mathrm{Seq}_{entity})$$

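$$\mathrm{Seq}_{joint} = \mathrm{Seq}_{IDCNN} \oplus \mathrm{Seq}_{entity}$$

where $\mathrm{Seq}_{joint}$ is the vector obtained by concatenating the two vectors, and $\oplus$ denotes the dimension concatenation operation;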

step 4.5, calculating a loss value of the BiLSTM network;

$$\mathrm{Seq}_{BiLSTM} = \mathrm{BiLSTM}(\mathrm{Hidden}, \mathrm{Seq}_{joint})$$

step 4.5.2, using $\mathrm{Seq}_{BiLSTM}$ as the query Q in the attention mechanism and the dynamic entity prototype as K and V in the attention mechanism, with the following calculation formulas:

$$\mathrm{Seq}_{Attention} = \mathrm{SoftMax}(Q \times K^{T}) \times V$$

$$\mathrm{Loss}_{BiLSTM} = \mathrm{CRF}(\mathrm{Seq}_{Attention}, \mathrm{tags}, \mathrm{MASK})$$

where $\mathrm{Loss}_{BiLSTM}$ is the calculated BiLSTM-based loss value, tags denotes the ground-truth labels obtained from the entity identification data set, and MASK is the mask applied to the data;

step 4.6, calculating a loss value of the BiGRU network;

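$$\mathrm{Seq}_{BiGRU} = \mathrm{BiGRU}(\mathrm{Hidden}, \mathrm{Seq}_{joint})$$

$$\mathrm{Seq}_{Attention} = \mathrm{SoftMax}(Q \times K^{T}) \times V$$

$$\mathrm{Loss}_{BiGRU} = \mathrm{CRF}(\mathrm{Seq}_{Attention}, \mathrm{tags}, \mathrm{MASK})$$

where $\mathrm{Loss}_{BiGRU}$ is the calculated BiGRU-based loss value, tags denotes the ground-truth labels obtained from the entity identification data set, and MASK is the mask applied to the data;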

step 4.9, inputting the input vector into the optimized network;

6. The weight network reasoning method based on dynamic entity prototypes of claim 1, wherein: said step 5 comprises the following sub-steps: