CN114169330A - Chinese named entity identification method fusing time sequence convolution and Transformer encoder - Google Patents

Chinese named entity identification method fusing time sequence convolution and Transformer encoder

Info

Publication number
CN114169330A
Authority
CN
China
Prior art keywords
model
text
layer
convolution
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111399845.9A
Other languages
Chinese (zh)
Other versions
CN114169330B (en)
Inventor
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunentropy Education Technology Wuxi Co ltd
Original Assignee
Yunentropy Education Technology Wuxi Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunentropy Education Technology Wuxi Co ltd filed Critical Yunentropy Education Technology Wuxi Co ltd
Priority to CN202111399845.9A priority Critical patent/CN114169330B/en
Publication of CN114169330A publication Critical patent/CN114169330A/en
Application granted granted Critical
Publication of CN114169330B publication Critical patent/CN114169330B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

A Chinese named entity recognition method integrating temporal convolution and a Transformer encoder, belonging to the field of character recognition. First, character and word features are modeled using the flat lattice structure proposed in prior work, and the absolute position encoding in the Transformer encoder is replaced with relative position encoding to avoid the loss of directional information. Second, a TCN is used to strengthen the network model's capture of position information and to acquire more local contextual semantic relations. Finally, an R-Drop strategy is adopted to apply a regularization constraint on the output distribution of the model, preventing overfitting and improving the generalization capability of the model. Experimental results show that the F1 values of the model on the Weibo and MSRA datasets reach 61.18% and 94.48% respectively, outperforming the traditional model and the baseline model and verifying the superiority of the model for Chinese named entity recognition.

Description

Chinese named entity identification method fusing time sequence convolution and Transformer encoder
Technical Field
The invention belongs to the technical field of named entity recognition and provides a Chinese named entity recognition method that fuses temporal convolution and a Transformer encoder.
Background
Named entity recognition, as a basic task underlying many natural language processing applications, plays an important role in complex fields such as event extraction, machine question answering, information retrieval and knowledge graph construction, and has received increasing attention in recent years. Common categories of named entities include person names, place names, organization names, times, values, currencies and some proper nouns. Early traditional named entity recognition algorithms were dictionary- and rule-based methods that used machine learning to jointly predict entity boundaries and class labels, but such approaches are often tied to a particular domain and are inefficient and inflexible. With the improvement of computer performance and the development of deep learning, methods based on deep learning have been widely applied in natural language processing and have gradually become mainstream in named entity recognition.
A named entity recognition model based on deep learning automatically extracts feature information from text using pre-trained word vectors, without manual feature engineering, thereby improving feature expression and data fitting. The Recurrent Neural Network (RNN) is widely applied to named entity tasks because of its advantage in processing sequential, time-stream data; in particular, the Bidirectional Long Short-Term Memory network (BiLSTM) has bidirectional characterization capability and can model text through the context of the current word. The combination of BiLSTM and CRF (Conditional Random Field) proposed by Huang et al. [HUANG Z, XU W, YU K, et al. Bidirectional LSTM-CRF models for sequence tagging [EB/OL]. [2015-08-09]. https://arxiv.org/abs/1508.01991.pdf] is currently the most popular model. The Lattice-LSTM model proposed by Zhang et al. [YUE ZHANG, JIE YANG. Chinese NER using Lattice LSTM [C] // In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018, Vol(1): 1554-1564] further improved on the BiLSTM model: it first used a lattice structure to combine character-sequence labeling with word-sequence labeling and achieved a significant effect. However, the loop structure of the RNN cannot capture long-term dependencies well and cannot be computed in parallel, which limits computational efficiency, so many learning models have abandoned the loop structure and turned to the parallelizable Convolutional Neural Network (CNN) or to attention mechanisms. Two representative models of this kind are the Temporal Convolutional Network (TCN) and the Transformer model proposed by Vaswani et al., which perform on various sequence learning tasks as well as or even better than RNN models.
The TCN model is a variant of the CNN. Compared with the Transformer model, its ability to acquire context information from text of arbitrary length is weaker and its structure is unidirectional, so it is used less often in named entity recognition tasks. However, the TCN has stable gradients, a low memory footprint, and a receptive field that can be customized flexibly for different tasks. The Transformer model uses an attention mechanism to construct an encoder-decoder framework; in the field of named entity recognition, typically only the Transformer's encoder is used, with a CRF model as the decoder. However, the Transformer encoder performs only moderately on named entity recognition, mainly because a pure self-attention mechanism cannot distinguish position and direction information, which is very important for the Chinese named entity recognition task. The Transformer encoder therefore incorporates position information into the input as an absolute position code, but this cannot distinguish the relative order of words at different positions, i.e. there is no direction information. Yan et al. [YAN H, DENG B, LI X N, et al. TENER: adapting Transformer encoder for named entity recognition [EB/OL]. [2019-12-10]. https://arxiv.org/abs/1911.04474.pdf] proposed the TENER model in 2019; by adding direction and distance awareness to the attention mechanism and using an unscaled multiplication, the absolute position coding of the original Transformer encoder is changed into relative position coding, making the Transformer encoder effective for Chinese named entities. Li et al. [XIAONAN LI, HANG YAN, XIPENG QIU, et al. FLAT: Chinese NER using flat-lattice Transformer [C] // In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2020: 6836-6842] proposed the FLAT model (Flat-Lattice Transformer), which builds on the TENER and Lattice-LSTM models by using relative position coding and introducing word information and lattice character information through a flat, lossless structure, and has achieved great success in Chinese named entity recognition. With relative position coding introduced, the Transformer encoder can capture local information between words in addition to acquiring global features of a sentence through the self-attention mechanism. But this relies heavily on position coding, whose influence is limited and which requires careful manual design, without changing the inherent drawbacks of the Transformer model structure.
The basis of deep learning models is the deep neural network. When training large-scale deep neural network models, regularization techniques such as layer normalization, batch normalization and Dropout are indispensable for preventing overfitting and improving generalization. Among them, Dropout is a widely used regularization technique because it only needs to randomly discard a portion of the neurons during training. However, randomly discarding some neurons each time means that a different submodel is generated after each discard, so to a certain extent Dropout makes the trained model a combined constraint of multiple submodels. Building on the randomness that this special mode of Dropout brings to the network, the R-Drop proposed by Liang et al. [LIANG X, WU L, LI J, et al. R-Drop: Regularized Dropout for neural networks [EB/OL]. [2021-06-28]. https://arxiv.org/abs/2106.14448.pdf] acts on the output layer of the model and further applies a regularization constraint on the output predictions of the multiple submodels, so that the submodel outputs stay consistent.
In summary, models based on the Transformer encoder need to acquire the direction information of the sequence by introducing relative position coding, but they lack the network structure needed to model local features within a sentence sequence. Meanwhile, as a deep learning model, the Transformer encoder has a large number of parameters and is prone to overfitting during training, so the model cannot perform well on general datasets.
Disclosure of Invention
In view of the above problems, the present invention is directed to a Chinese named entity recognition method that fuses temporal convolution and a Transformer encoder.
The technical scheme of the invention is as follows:
a Chinese named entity recognition method fusing a time sequence convolution and a Transformer encoder comprises the following steps:
step one, establishing a Transformer-TCN-R-Drop model
The Transformer-TCN-R-Drop model consists of an input layer, an encoding layer and an output layer. The input layer comprises an embedding layer and position coding; the embedding layer adopts a flat lattice structure and, when generating character vectors, simultaneously generates the word vectors corresponding to the characters by consulting a dictionary, while the position coding applies relative position coding to the different character or word text segments. The encoding layer obtains local and global features of the text through the Transformer encoder and the TCN model and fuses the feature information captured by the two models with an ADD operation, finally obtaining a new vector sequence. The output layer decodes the fused feature vectors with a CRF model to obtain a globally optimal label sequence; meanwhile, an R-Drop regularization strategy is adopted throughout training to improve the generalization capability of the model.
The Transformer encoder is formed by stacking several encoders; the structure of each encoder layer comprises a multi-head self-attention layer, a feedforward network layer, residual connections and layer normalization.
The TCN model consists of a causal convolution suited to sequence structures, together with a dilated convolution and a residual module used to memorize historical information.
Step two, utilizing the established Transformer-TCN-R-Drop model to identify the Chinese named entity
(1) The potential words corresponding to each character of a sentence are obtained from a lexicon and added to the text sequence;
(2) the final text sequence is converted into a text sequence X = {x_1, ..., x_T} with the flat lattice structure adopted by the FLAT model, where T is the length of the text sequence after the words are included;
(3) a text sequence is defined as a collection of text segments span, wherein one text segment consists of a text token, a head and a tail. The text represents a character or a word, and the head and the tail represent the position index in the original text sequence of the first character and the last character of the text, respectively, wherein the head and the tail are the same for a single character.
(4) The embedding layer vectorizes each text segment into a matrix E_X ∈ R^{T×d_model}, where d_model is the dimension of the text vector.
(5) Encoding the interaction between the text segments defined in step (3) using relative position encoding.
(6) The text vector matrix from step (4) is processed by the multi-head attention mechanism in the Transformer encoder, and the position coding obtained in step (5) is added during the calculation.
(7) The output text vector matrix of the multi-head attention calculation is added to its input to form a residual connection; layer normalization is then applied.
(8) The normalized result of step (7) is fed into the feedforward network layer, where a nonlinear transformation is applied using ReLU activation.
(9) Similarly, the output after the ReLU activation in step (8) is residually connected with the normalized output of step (7), layer normalization is applied again, and the text feature A of the Transformer encoder is output.
(10) Local feature information between text vector matrices is obtained using causal convolution and dilated convolution of the TCN.
(11) The TCN regularizes the text vector matrix with residual modules; each residual module comprises two layers of convolution and nonlinear mapping, and the resulting text feature B is output.
(12) Using the ADD operation, the two features A and B are fused.
(13) Dropout is added in each layer of the Transformer encoder and the TCN network to regularize the network, and an R-Drop regularization strategy is adopted to avoid inconsistent output distributions of the training model caused by Dropout.
(14) The label sequence is output using a Softmax function and the CRF layer to obtain the final recognition result.
The invention has the beneficial effects that:
1. Without sacrificing the parallelism of the model or its ability to capture long-distance dependencies, a TCN and a Transformer encoder are fused at the encoding layer. The TCN's strengths are used to learn implicit position information and to capture more local information between characters and words, so that it complements the global and partial local information captured through relative position coding, and the finally extracted vocabulary features are more complete.
2. Under the condition of not influencing network neurons or model parameters, after the CRF model of the output layer outputs a prediction result, the R-Drop regularization technology is used for reducing the parameter freedom of the model and improving the robustness of the model. Experimental results prove that on the two types of general data sets, the accuracy, the recall rate and the F1 value of the model provided by the invention are improved under the condition of not depending on external resources and pre-training language models.
Drawings
FIG. 1 is a Transformer-TCN-R-Drop model framework.
Fig. 2 is a text sequence of a flat lattice structure.
FIG. 3 is the Transformer encoder architecture.
FIG. 4 shows the TCN model structure.
FIG. 5 shows the R-Drop process.
Detailed Description
1. Transformer-TCN-R-Drop model
The overall model framework proposed by the invention is shown in fig. 1 and mainly comprises an input layer, an encoding layer and an output layer.
The input layer comprises an embedding layer and position coding; the embedding layer adopts a flat lattice structure and, when generating character vectors, simultaneously generates the word vectors corresponding to the characters by consulting a dictionary, while the position coding applies relative position coding to the different character or word text segments. The encoding layer obtains local and global features of the text through the Transformer encoder and the TCN model and fuses the feature information captured by the two models with an ADD operation, finally obtaining a new vector sequence. The output layer decodes the fused feature vectors with a CRF model to obtain a globally optimal label sequence; meanwhile, an R-Drop regularization strategy is adopted throughout training to improve the generalization capability of the model.
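As an illustration of the data flow described above, a minimal PyTorch-style sketch of the forward pass is given below; the submodule names and signatures (embed, transformer_encoder, tcn, and the downstream CRF decoder) are placeholders assumed for this sketch rather than an actual released implementation.

import torch.nn as nn

class TransformerTCNRDrop(nn.Module):
    """Data-flow sketch of the Transformer-TCN-R-Drop model described above."""

    def __init__(self, embed, transformer_encoder, tcn, d_model, num_labels):
        super().__init__()
        self.embed = embed                              # flat-lattice embedding (characters + matched words)
        self.transformer_encoder = transformer_encoder  # Transformer encoder with relative position coding
        self.tcn = tcn                                  # temporal convolutional network
        self.emission = nn.Linear(d_model, num_labels)  # per-token label scores fed to the CRF layer

    def forward(self, chars, words, heads, tails):
        e_x, rel_pos = self.embed(chars, words, heads, tails)  # (B, T, d_model) and relative positions
        z = self.transformer_encoder(e_x, rel_pos)             # global features (feature A)
        b = self.tcn(e_x)                                      # local features (feature B)
        h = z + b                                              # ADD fusion
        return self.emission(h)                                # decoded by a CRF layer downstream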
1.1 input layer
1.1.1 Embedded layers
Before the text is fed into the Transformer-TCN-R-Drop model, it needs a vectorized representation, i.e. each token in the text is mapped to a vector of fixed dimension; this is the function of the embedding layer. In the named entity recognition task, dictionary information can well avoid word-segmentation errors. For example, the words "重庆" (Chongqing) and "人和药店" (Renhe Pharmacy) in "重庆人和药店" eliminate the potentially erroneous entity "重庆人" (Chongqing person). Therefore, before the text is vectorized, all potential words corresponding to each character in the text are obtained from a lexicon, and the final text sequence is converted into a text sequence X = {x_1, ..., x_T} with the flat lattice structure adopted by the FLAT model, where T is the length of the text sequence after the words are included, as shown in fig. 2. The text sequence X is defined as a set of text segments (spans), where one text segment consists of a token, a head and a tail; the token represents a character or a word, and the head and tail are the position indices of the token's first and last characters in the original character sequence (for a single character, head and tail are the same). Finally, each text segment is vectorized into a matrix E_X ∈ R^{T×d_model}, where d_model is the dimension of the text vector.
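A minimal sketch of how such a flat-lattice sequence of spans could be constructed from a sentence and a word lexicon is given below; the brute-force substring lookup is an illustrative assumption, not the exact matching procedure of the invention.

def build_flat_lattice(sentence, lexicon):
    """Return a list of (token, head, tail) spans: every character plus every
    lexicon word found in the sentence. head/tail are character indices; for a
    single character head == tail."""
    spans = [(ch, i, i) for i, ch in enumerate(sentence)]
    for start in range(len(sentence)):
        for end in range(start + 2, len(sentence) + 1):   # candidate words of length >= 2
            word = sentence[start:end]
            if word in lexicon:
                spans.append((word, start, end - 1))
    return spans

# Example with the sentence from the text and a small illustrative lexicon
lexicon = {"重庆", "人和药店", "药店", "重庆人"}
for token, head, tail in build_flat_lattice("重庆人和药店", lexicon):
    print(token, head, tail)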
1.1.2 position coding
To prevent the subsequent processing from losing the position information of the text, before the character-word vectors are input to the encoder the Transformer encoder adds an additional position code to represent the absolute position of each token in the text sequence. However, the input text sequence of the invention is flat and consists of text segments of different lengths. Following the FLAT model, relative position coding is therefore introduced to encode the interaction between text segments. For two text segments x_i and x_j in a text sequence X there are three possible relations: intersecting, containing and separated. To better represent the relation between different text segments and the distance between characters and words, dense vectors are used to model their relation:

d_ij^(hh) = head[i] - head[j]   (1)
d_ij^(ht) = head[i] - tail[j]   (2)
d_ij^(th) = tail[i] - head[j]   (3)
d_ij^(tt) = tail[i] - tail[j]   (4)

where head[i] and tail[i] denote the head and tail positions of x_i respectively, d_ij^(hh) denotes the distance between the head position of x_i and the head position of x_j, and the other three distances have similar meanings. For example, for the character "重" ("Chong") and the word "重庆" (Chongqing), the four relative distances show that the character "重" lies inside the word "重庆", i.e. an inclusion relation, so that in the encoder the character "重" pays more attention to the word "重庆" and the entity boundary can be identified better. The final relative position coding of the text segments is a simple nonlinear transformation of the four concatenated distances:
R_ij = ReLU(W_r (P_{d_ij^(hh)} ⊕ P_{d_ij^(ht)} ⊕ P_{d_ij^(th)} ⊕ P_{d_ij^(tt)}))   (5)

where W_r is a learnable parameter, ⊕ denotes the concatenation operation, and P_d is the sinusoidal position encoding used by Vaswani et al. and the TENER model [YAN H, DENG B, LI X N, et al. TENER: adapting Transformer encoder for named entity recognition [EB/OL]. [2019-12-10]. https://arxiv.org/abs/1911.04474.pdf]:

P_d^(2k) = sin(d / 10000^(2k/d_model))   (6)
P_d^(2k+1) = cos(d / 10000^(2k/d_model))   (7)

where d denotes any of d_ij^(hh), d_ij^(ht), d_ij^(th), d_ij^(tt), and k is the dimension index of the character-word vector, with value range [0, d_model/2].
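For illustration, a minimal sketch of equations (1)-(7) in PyTorch follows; the tensor layout, the assumption of an even d_model, and the form of the learnable transformation w_r (a linear map over the concatenated encodings, e.g. torch.nn.Linear(4 * d_model, d_model)) are assumptions made for this sketch.

import torch

def sinusoidal_pd(d, d_model):
    # P_d of equations (6)-(7) for a (possibly negative) relative distance d; d_model assumed even
    k = torch.arange(d_model // 2, dtype=torch.float)
    angle = d / torch.pow(10000.0, 2 * k / d_model)
    pe = torch.zeros(d_model)
    pe[0::2] = torch.sin(angle)
    pe[1::2] = torch.cos(angle)
    return pe

def relative_position_encoding(heads, tails, d_model, w_r):
    # R_ij of equation (5) for every pair of spans; heads/tails are lists of span indices
    T = len(heads)
    rows = []
    for i in range(T):
        row = []
        for j in range(T):
            dists = (heads[i] - heads[j], heads[i] - tails[j],
                     tails[i] - heads[j], tails[i] - tails[j])      # eq. (1)-(4)
            concat = torch.cat([sinusoidal_pd(d, d_model) for d in dists])
            row.append(torch.relu(w_r(concat)))                     # eq. (5)
        rows.append(torch.stack(row))
    return torch.stack(rows)                                        # shape (T, T, d_model)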
1.2 coding layer
1.2.1 Transformer encoder
The main purpose of the Transformer encoder is to compute the relations between the characters and words in the text, so that the model can learn the relations between tokens and the importance of each character and word, thereby acquiring global and local feature information. The Transformer encoder is formed by stacking several encoders; each encoder layer consists mainly of a multi-head self-attention layer, a feed-forward network layer, residual connections and layer normalization, as shown in fig. 3, and its core is the multi-head attention mechanism.
In the original Transformer encoder, the position information is added to the character-word vectors and the sum is fed directly into the self-attention layer. Because the position information used here is relative position coding, adding the same position code to text segments at relative positions would make their positional relation indistinguishable; the invention therefore abandons absolute position coding and feeds the relative position coding and the text vectors separately into the self-attention calculation of the encoder. In other words, when the vector at the current position is computed, the relative positional relations of the text segments it depends on are taken into account. Concretely, the input vector matrix E_X is multiplied by three different weight matrices W_q, W_k, W_v to obtain three vectors of the same dimension, namely the Query vector (Q), the Key vector (K) and the Value vector (V):

[Q, K, V] = E_X [W_q, W_k, W_v]   (8)
When the attention score is computed, only the relative positional relation between the Query vector and the Key vector is considered; this relative position information is added to the self-attention calculation of every layer of the Transformer encoder. The attention scores are then normalized with a Softmax function, multiplied by the Value vector, and finally the weighted sum over all text vectors is output. Each character vector thus contains not only information about the other characters but also word information, position information and distance information:

A_ij = (Q_i + u)^T K_j + (Q_i + v)^T R_ij W_R   (9)
Att(A, V) = softmax(A) V   (10)

where u, v and W_R are learnable parameters. The Transformer encoder performs the attention calculation on the text sequence with several attention heads that do not share weight matrices and then concatenates the results of the heads:

Multihead(A) = [head_1, head_2, ..., head_n] W   (11)
head_i = Att(A_i, V_i)   (12)

where i indexes the attention heads, i ∈ [1, n], and A_i and V_i are computed with the i-th head's own projection matrices. The output is then processed by the feedforward network layer FFN, a position-wise multi-layer perceptron with a nonlinear transformation that increases the nonlinear expression capability of the model. Meanwhile, to alleviate the degradation problem of deep network training, residual connections and layer normalization are added after the multi-head self-attention layer and the feedforward network layer, finally yielding a new sentence matrix Z = [z_1, z_2, ..., z_T], Z ∈ R^{T×d_model}.
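A simplified single-head version of the attention of equations (8)-(10), assuming R has been computed as in the sketch above and that all weights are plain tensors, could look like the following; it is an illustrative sketch, not the exact implementation.

import torch

def relative_attention(E_x, R, W_q, W_k, W_v, W_R, u, v):
    """Single-head attention with relative position coding, eq. (8)-(10).
    E_x: (T, d), R: (T, T, d), W_q/W_k/W_v/W_R: (d, d), u and v: (d,)."""
    Q, K, V = E_x @ W_q, E_x @ W_k, E_x @ W_v            # eq. (8)
    content = (Q + u) @ K.T                              # (Q_i + u)^T K_j, shape (T, T)
    pos = torch.einsum('id,ijd->ij', Q + v, R @ W_R)     # (Q_i + v)^T R_ij W_R, shape (T, T)
    A = content + pos                                    # eq. (9)
    return torch.softmax(A, dim=-1) @ V                  # eq. (10), weighted sum of Value vectors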
1.2.2 TCN model
Introducing relative position coding helps the Transformer encoder extract local feature information from the text sequence, but the local information obtained from position coding is limited and depends heavily on external dictionary information, so local information about the vocabulary cannot be learned through the model structure itself. The convolution-based TCN model preserves the relative positions between tokens through its convolution kernels, needs no additional hand-designed position coding, and retains the ability to capture long-term dependencies of text sequences and to compute in parallel. Adding the TCN model to the encoder therefore allows the local features of the text to be extracted more flexibly and supplements the vector information captured by the Transformer encoder.
The structure of the TCN model is shown in fig. 4. It is mainly composed of a causal convolution suited to sequence structures, together with a dilated convolution and a residual module used to memorize historical information. The causal convolution is characterized by an output length equal to the input length and by not considering future information.
Given the input text matrix E_X = {x_1, x_2, ..., x_T} and a filter F = (f_1, f_2, ..., f_O), where O is the size of the convolution kernel, the causal convolution at x_i is:

F(x_i) = Σ_{o=1}^{O} f_o · x_{i-(O-o)}   (13)
To make the output generated by the network the same length as the input, the TCN adopts a one-dimensional fully convolutional network (FCN) structure, in which each hidden layer has the same length as the input layer and zero padding keeps subsequent layers the same length as the previous ones. The combination of causal convolution and FCN has the disadvantage that a very deep network or very large filters are needed to obtain a long enough history of textual information, so the TCN introduces dilated convolution to obtain long-range historical information.
The dilated convolution exponentially increases the receptive field by inserting gaps between the convolution kernel taps, so that the output of each convolution contains a larger range of information; the output can therefore represent a larger range of input features and capture longer-range dependencies. Given the input text matrix E_X = {x_1, x_2, ..., x_T} and the filter F = (f_1, f_2, ..., f_O), the dilated convolution at x_i is:

F(x_i) = Σ_{o=1}^{O} f_o · x_{i-(O-o)·d}   (14)

where d is the dilation factor, O is the size of the convolution kernel, and i-(O-o)·d points in the direction of the history. The dilated convolution is thus equivalent to introducing a fixed step between every two adjacent filter taps. By controlling the size of d, the receptive field is widened while the amount of computation stays unchanged; when d = 1, the dilated convolution degenerates into an ordinary convolution. Finally, to improve accuracy, the TCN adds the identity mapping of cross-layer links from residual networks. The output sentence matrix of the model is B = [b_1, b_2, ..., b_T], B ∈ R^{T×d_model}.
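A minimal dilated causal convolution block corresponding to equations (13)-(14) and the residual module might be sketched as follows; the channel sizes, the left-padding trick used to enforce causality, and the dropout placement are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated causal convolution: output at position i only sees x_{i-(O-o)d}, o = 1..O."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # left padding keeps length and causality
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (B, channels, T)
        x = F.pad(x, (self.pad, 0))                      # pad on the left only
        return self.conv(x)

class TCNResidualBlock(nn.Module):
    """Residual module: two dilated causal convolutions with ReLU and Dropout,
    plus the identity (cross-layer) connection."""
    def __init__(self, channels, kernel_size, dilation, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(channels, kernel_size, dilation), nn.ReLU(), nn.Dropout(dropout),
            CausalConv1d(channels, kernel_size, dilation), nn.ReLU(), nn.Dropout(dropout),
        )

    def forward(self, x):
        return torch.relu(self.net(x) + x)               # residual connection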
The TCN model is used mainly to obtain more local features of the input text sequence in a flexible way through its receptive field. In Chinese named entity recognition, the label of each character in a sentence depends not only on the global features of the whole sentence but also on the local features of the surrounding characters and words. Fusing the text feature vectors output by the TCN model therefore gives the text features output by the model richer contextual semantic information. To avoid extra computation, the invention adopts the ADD feature fusion strategy to fuse the text features output by the Transformer encoder and the TCN model, obtaining the final text representation matrix H = [(z_1+b_1), (z_2+b_2), ..., (z_T+b_T)], H ∈ R^{T×d_model}.
1.3 output layer
1.3.1 CRF layer
The coding layer only yields character-word vectors that contain context information; even with relative position coding added, the dependency between the final predicted labels is not taken into account. The model therefore adopts a CRF layer, which considers the adjacency relations between labels to obtain a globally optimal label sequence. The CRF is a discriminative model based on conditional probabilities. Let the output of the model, i.e. the input sequence of the CRF, be X = (x_1, x_2, ..., x_T), and let one possible predicted tag sequence be Y = (y_1, y_2, ..., y_T). The evaluation score s is defined as:

s(X, Y) = Σ_{t=1}^{T} (A_{y_{t-1}, y_t} + P_{t, y_t})   (15)

where A_{i,j} is the transition probability from label i to label j and P_{t, y_t} is the score of the y_t-th tag for the t-th character. The probability P of the sequence Y among all possible predicted sequences is:

P(Y|X) = exp(s(X, Y)) / Σ_{Ỹ ∈ Y_X} exp(s(X, Ỹ))   (16)

where Ỹ is a possible predicted tag sequence and Y_X is the set of all possible tag sequences for the input sequence X. During CRF training, maximum likelihood estimation is used to define the loss function:

L_CRF = -log P(Y|X) = log Σ_{Ỹ ∈ Y_X} exp(s(X, Ỹ)) - s(X, Y)   (17)

At prediction time, the CRF selects, according to the trained parameters, the candidate label sequence with the maximum probability as the final result.
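For clarity, the score and loss of equations (15)-(17) can be written naively by enumerating all tag sequences (practical implementations use the forward algorithm and Viterbi decoding instead); the use of a dedicated start tag for the first transition is an assumption of this sketch.

import itertools
import torch

def crf_score(emissions, transitions, tags, start_tag):
    """s(X, Y): emissions (T, L) plays the role of P, transitions (L, L) of A."""
    score, prev = 0.0, start_tag
    for t, y in enumerate(tags):
        score = score + transitions[prev, y] + emissions[t, y]
        prev = y
    return score

def crf_neg_log_likelihood(emissions, transitions, tags, start_tag=0):
    """L_CRF = log sum_{Y'} exp(s(X, Y')) - s(X, Y), enumerated exhaustively (small T, L only)."""
    T, L = emissions.shape
    all_scores = torch.stack([
        crf_score(emissions, transitions, list(seq), start_tag)
        for seq in itertools.product(range(L), repeat=T)
    ])
    return torch.logsumexp(all_scores, dim=0) - crf_score(emissions, transitions, tags, start_tag)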
1.3.2 R-Drop
Dropout is used in both the Transformer encoder and the TCN model, and Dropout makes the output distribution of the model different at each forward pass; the invention therefore adds R-Drop to the model to constrain the outputs to remain consistent.
The R-Drop process is illustrated in FIG. 5. The Dropout method randomly discards some neurons in each layer of a neural network; in the encoder of the overall model of the invention, Dropout randomly selects neurons to discard in the multi-head self-attention layer and the feedforward network layer of the Transformer encoder and in the causal and dilated convolutions of the TCN model. Because the discarded neurons differ each time, this operation effectively makes the trained overall model a combined constraint of multiple submodels (all drawn from the same overall model), and such a model combination can improve the performance of the overall model.
Specifically, for the training data D = {(x_i, y_i)}, i = 1, ..., n, where n is the number of training samples, the input x_i of each training step is passed through the network twice, and two different output predictions are obtained from the CRF layer: P_θ(y_i|x_i) and P_θ'(y_i|x_i). Since Dropout randomly discards some neurons each time, as shown in FIG. 5 the neurons discarded in each layer for the left output prediction P_θ differ from those discarded for the right output prediction P_θ'. For the same input data x_i, two different prediction probabilities are therefore obtained from two different submodels (of the same overall model). R-Drop regularizes the output distribution of the training model by minimizing the symmetric Kullback-Leibler (KL) divergence between the two prediction probabilities:

L_KL = (1/2) [ KL(P_θ(y_i|x_i) || P_θ'(y_i|x_i)) + KL(P_θ'(y_i|x_i) || P_θ(y_i|x_i)) ]   (18)

In addition, the cross-entropy loss function of the model itself is:

L_CE = -log P_θ(y_i|x_i) - log P_θ'(y_i|x_i)   (19)

With the CRF loss function added, the final R-Drop training loss function is:

L = L_CE + L_CRF + α · L_KL   (20)

where α is the coefficient weight controlling L_KL. In this way, R-Drop regularizes the model space beyond Dropout and improves the generalization capability of the model. When Dropout is used, the training phase and the prediction phase of the model differ: during training, Dropout makes the output of each submodel approach the true distribution (model averaging), whereas at test time Dropout is turned off and the model is averaged only in parameter space, so training and testing are inconsistent. By constraining the outputs of the submodels during training, R-Drop keeps the outputs of the different submodels consistent, reduces the inconsistency between training and testing, and improves the performance of the model when Dropout is closed in the testing phase. The procedure by which R-Drop computes the loss during the training phase is sketched below.
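The algorithm listing is given as an image in the original publication. A minimal sketch of one training-step loss under the stated assumptions (a model returning per-token label logits and a separate CRF negative log-likelihood, both hypothetical interfaces here) is:

import torch.nn.functional as F

def r_drop_loss(model, crf_nll, x, y, alpha=3.0):
    """One R-Drop training step: two forward passes with independent Dropout masks,
    CRF loss for both passes plus the symmetric KL term of eq. (18)."""
    logits1 = model(x)                      # first forward pass (Dropout mask 1)
    logits2 = model(x)                      # second forward pass (Dropout mask 2)
    p1, p2 = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    # symmetric KL divergence between the two output distributions
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction='batchmean')
                + F.kl_div(p2, p1, log_target=True, reduction='batchmean'))
    # negative log-likelihood (CRF loss) of both passes
    nll = crf_nll(logits1, y) + crf_nll(logits2, y)
    return nll + alpha * kl

The model must be in training mode so that the two forward passes use independent Dropout masks; otherwise the KL term collapses to zero.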
2. experiment and result analysis
To verify the effectiveness of the proposed Transformer-TCN-R-Drop Chinese named entity recognition model, experiments were carried out on two general Chinese named entity recognition datasets of different types, the Weibo dataset and the MSRA dataset. Precision, recall and the F1 value are used as the main evaluation indexes to ensure the correctness and consistency of the experimental results and thereby verify the effect of the model.
2.1 evaluation index
The invention adopts the indexes commonly used in named entity recognition tasks, namely precision (P), recall (R) and the F1 value (F), as the evaluation indexes of model performance. The calculation formula of each index is as follows:

P = TP / (TP + FP) × 100%   (21)
R = TP / (TP + FN) × 100%   (22)
F1 = 2 × P × R / (P + R) × 100%   (23)
where TP denotes the number of entities correctly predicted by the model, FP denotes the number of entities predicted by the model that are wrong, and FN denotes the number of true entities that the model failed to predict.
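For illustration, the three indexes can be computed directly from entity-level counts as follows; the (start, end, type) representation of entities is an assumption of this sketch.

def precision_recall_f1(pred_entities, gold_entities):
    """Entity-level P, R, F1 (in %); entities are sets of (start, end, type) tuples."""
    tp = len(pred_entities & gold_entities)        # correctly predicted entities
    fp = len(pred_entities - gold_entities)        # predicted but wrong
    fn = len(gold_entities - pred_entities)        # gold entities missed by the model
    p = tp / (tp + fp) * 100 if tp + fp else 0.0
    r = tp / (tp + fn) * 100 if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1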
2.2 Experimental data
The Weibo dataset and the MSRA dataset used in the experiments are public general-purpose Chinese named entity recognition datasets. The Weibo dataset is a social media dataset that contains four classes of entities: geo-political, person, place and organization names. The MSRA dataset is a dataset published by Microsoft that contains three types of entities: person names, place names and organization names. Detailed statistics of the two datasets are shown in Table 1.
Table 1 data set statistics
2.3 Experimental Environment and parameter settings
The experimental model is built on the open-source natural language processing framework FastNLP provided by Fudan University. The specific experimental environment is shown in Table 2, and the hyper-parameters adopted by the model in the experiments are shown in Table 3.
TABLE 2 Experimental Environment
TABLE 3 model hyper-parameter settings
The performance of the model is sensitive to the learning rate and the value of α. The hidden layer dimensions are set differently for the two datasets: 128 for the Weibo dataset and 160 for MSRA. Through repeated training, a set of hyper-parameters with good results was obtained: a learning rate of 0.003 is chosen to keep training stable, Dropout is set for each module (0.5 for the character-word combination part and 0.3 for the CRF output layer), and the hyper-parameter coefficient α of R-Drop is set to 3.
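For reference, the hyper-parameters reported above could be collected in a configuration such as the following; the dictionary structure and field names are illustrative only.

# Hyper-parameters reported in the text; the structure of this dict is illustrative.
HYPER_PARAMS = {
    "learning_rate": 0.003,          # chosen for training stability
    "hidden_size": {"Weibo": 128, "MSRA": 160},
    "dropout_char_word": 0.5,        # character-word combination part
    "dropout_crf_output": 0.3,       # CRF output layer
    "r_drop_alpha": 3,               # coefficient weight of the KL term
}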
2.4 results and analysis of the experiment
2.4.1 comparison results and analysis of the model itself
To verify the effectiveness of the TCN model and R-Drop, ablation experiments were performed on the Weibo and MSRA datasets and compared with the traditional model (BiLSTM) and the baseline model (FLAT), using the F1 value as the evaluation index. The baseline model uses the relative position coding adopted by the Transformer encoder, without the TCN model or R-Drop. As shown in Table 4, on both the Weibo and MSRA datasets the FLAT model is less effective than the model with the TCN structure added. Whether R-Drop is introduced alone or TCN and R-Drop are added together, the F1 values improve to different degrees on the two different types of datasets and on models of different depths.
Table 4 model ablation experimental results
After the two modules are introduced, the Transformer-TCN-R-Drop model improves significantly over the baseline: on the two datasets its F1 values are 4.43% and 2.61% higher than the traditional model and 0.86% and 0.36% higher than the baseline model, respectively. This shows that the proposed model is effective: the local information acquired by the TCN makes the context information acquired by the Transformer more effective, while the ability to capture long-distance dependencies and to compute in parallel is retained. R-Drop applies a regularization constraint, via the KL divergence, on the outputs of the submodels generated by the randomness of Dropout, reducing the degrees of freedom of the parameters and thereby strengthening the generalization capability of the model. Combining the advantages of the Transformer, the TCN and R-Drop therefore improves the overall performance of the model.
2.4.2 comparison results and analysis of different models
To verify the effectiveness of the Transformer model fusing the TCN and R-Drop, comparison experiments between the proposed model and current mainstream models were carried out on the two datasets; the comparison indexes are precision P, recall R and the F1 value. Table 5 compares the different models on the Weibo dataset and Table 6 compares them on the MSRA dataset, listing the experimental results of each model on the two types of datasets separately. The models selected for comparison are as follows:
1) the Lattice LSTM model improves the performance of chinese named entity recognition by encoding and matching words in a lexicon. But cannot capture long-distance dependency and there is a certain loss of information.
2) The CAN-NER model, the LR-CNN model and the ID-CNN model are the best methods for enhancing word information by using a convolution model so as to improve the model identification performance at present, but have information loss to a certain degree like the Lattice LSTM.
3) The CGN model is used for enhancing the recognition effect of the Chinese named entity by capturing dictionary word information in an all-round way based on a cooperative graph neural network, but needs an RNN as a bottom encoder to capture the orderliness of sentences.
4) The Transformer + relative position + CRF model, the PLT model and the FLAT model all enhance character and word information on the basis of a Transformer encoder. The first model and the FLAT model both change the position coding of the Transformer model from absolute to relative positions and modify the attention computation; the PLT model introduces a porous mechanism to enhance local modeling while maintaining the ability to capture long-term dependencies.
TABLE 5 comparison of different models on Weibo data set
Table 6 comparison of different models on MSRA dataset
The comprehensive comparisons in Tables 5 and 6 show that the Transformer model fused with TCN and R-Drop improves precision, recall and F1 value compared with the other models.
1) Comparing the CAN-NER, LR-CNN and ID-CNN-CRF models with the Lattice LSTM model, the models with convolution modules outperform the Lattice LSTM model, which shows that convolution enhances local vocabulary information and learns implicit position information, and that more contextual semantic information and relations between characters and words can be obtained.
2) Comparing the Transformer + relative position + CRF model, the PLT model and the FLAT model with the other models shows that a Transformer encoder with relative position coding extracts features more strongly than LSTM, convolutional and graph networks. The improved multi-head self-attention mechanism can capture the dependencies between core entities and selectively focus on high-value information among words, so that more highly relevant vocabulary information from the input is taken into account in the output.
3) Comparing the Transformer-TCN-R-Drop model with all the other models, after the TCN model and R-Drop are added, all three indexes of the model are higher than those of the compared models. Compared with the best competing model, on the Weibo dataset the precision improves by 0.86%, the recall by 4% and the F1 value by 0.86%; on the MSRA dataset the precision improves by 0.76%, the recall by 1.65% and the F1 value by 0.36%. This fully demonstrates that the local features dynamically captured by the TCN through its convolutional receptive field, and the new features produced by fusing them with the features captured by the Transformer encoder, extract the input vocabulary features more completely and are more conducive to label classification. Meanwhile, the R-Drop regularization strategy controls the degrees of freedom of the model parameters and complements the Dropout method, better preventing overfitting, improving generalization, and improving the recognition capability of the model on different datasets.
The invention provides a Chinese named entity recognition model that fuses a TCN with a Transformer encoder. It overcomes the loss of position and direction information and the structural shortcomings of the original Transformer encoder, and makes full use of the directionality of relative position coding, the ability of the self-attention mechanism to capture character-word features, and the ability of the TCN model to extract local information from sentences. Meanwhile, the introduced R-Drop strategy improves the robustness of the model. Experimental results show that the proposed method performs better on the two Chinese datasets than previous models, verifying the effectiveness of the changes. In subsequent work, other external information such as character structure and sememes will be considered to further optimize the model and improve its recognition capability.

Claims (6)

1. The method for identifying the Chinese named entity by fusing the time sequence convolution and the Transformer encoder is characterized by comprising the following steps of:
step one, establishing a Transformer-TCN-R-Drop model
The Transformer-TCN-R-Drop model consists of an input layer, an encoding layer and an output layer; the input layer comprises an embedding layer and position codes, the embedding layer adopts a flat lattice structure, and when character vectors are generated, word vectors corresponding to characters are generated simultaneously by combining a dictionary; the position coding adopts a mode of carrying out relative position coding on different character or word texts; the encoding layer obtains local and global characteristics of the text through a Transformer encoder and a TCN model, and fuses characteristic information captured by the two models by adopting ADD operation to finally obtain a new vector sequence; the output layer decodes the fused feature vectors by adopting a CRF model to obtain a globally optimal label sequence, and meanwhile, the generalization capability of the model is improved by adopting an R-Drop regularization strategy in the whole training process;
the Transformer encoder is formed by stacking several encoders, and the structure of each encoder layer comprises a multi-head self-attention layer, a feedforward network layer, residual connections and layer normalization;
the TCN model consists of a causal convolution suited to sequence structures, together with a dilated convolution and a residual module used to memorize historical information;
step two, utilizing the established Transformer-TCN-R-Drop model to identify the Chinese named entity
(1) the potential words corresponding to each character of a sentence are obtained from a lexicon and added to the text sequence;
(2) the final text sequence is converted into a text sequence X = {x_1, ..., x_T} with the flat lattice structure adopted by the FLAT model, where T is the length of the text sequence after the words are included;
(3) defining a text sequence as a set of a plurality of text segments span, wherein one text segment is composed of a text token, a head and a tail; the text represents a character or a word, and the head and the tail represent the position indexes of the first character and the last character of the text in the original text sequence respectively, wherein the head and the tail are the same for a single character;
(4) the embedding layer vectorizes each text segment into a matrix E_X ∈ R^{T×d_model}, where d_model is the dimension of the text vector;
(5) encoding interactions between the text segments defined in step (3) using relative position encoding;
(6) the text vector matrix from step (4) is processed by the multi-head attention mechanism in the Transformer encoder, and the position coding obtained in step (5) is added during the calculation;
(7) the output text vector matrix of the multi-head attention calculation is added to its input to form a residual connection; layer normalization is then applied;
(8) the normalized result of step (7) is fed into the feedforward network layer, where a nonlinear transformation is applied using ReLU activation;
(9) similarly, the output after the ReLU activation in step (8) is residually connected with the normalized output of step (7), layer normalization is applied again, and the text feature A of the Transformer encoder is output;
(10) obtaining local characteristic information between text vector matrixes by using a causal convolution and an expansion convolution of the TCN;
(11) the TCN regularizes the text vector matrix with residual modules; each residual module comprises two layers of convolution and nonlinear mapping, and the resulting text feature B is output;
(12) the two features A and B are fused using the ADD operation;
(13) Dropout is added to each layer of the Transformer encoder and the TCN network to regularize the network, and an R-Drop regularization strategy is adopted to avoid inconsistent output distributions of the training model caused by Dropout;
(14) the label sequence is output using a Softmax function and the CRF layer to obtain the final recognition result.
2. The method for Chinese named entity recognition fusing temporal convolution and a Transformer encoder as claimed in claim 1, wherein the specific process from step (6) to step (9) is as follows:
the vector matrix E_X vectorized in step (4) is multiplied by three different weight matrices W_q, W_k, W_v to obtain three vectors of the same dimension, namely the Query vector Q, the Key vector K and the Value vector V:

[Q, K, V] = E_X [W_q, W_k, W_v]

when the attention score is computed, only the relative positional relation between the Query vector and the Key vector is considered; this relative position information is added to the self-attention calculation of every layer of the Transformer encoder, the attention scores are then normalized with a Softmax function and multiplied by the Value vector, and finally the weighted sum over all text vectors is output; each character vector thus contains not only information about the other characters but also word information, position information and distance information:

A_ij = (Q_i + u)^T K_j + (Q_i + v)^T R_ij W_R
Att(A, V) = softmax(A) V

where u, v and W_R are learnable parameters, A_ij is the attention score of a single text segment, and R_ij is the relative position coding matrix;
the Transformer encoder performs the attention calculation on the text sequence with several attention heads that do not share weight matrices and then concatenates the results of the heads:

Multihead(A) = [head_1, head_2, ..., head_n] W
head_i = Att(A_i, V_i)

where i indexes the attention heads, i ∈ [1, n], and A_i and V_i are computed with the i-th head's own projection matrices; the output is then processed by the feedforward network layer FFN, a position-wise multi-layer perceptron with a nonlinear transformation that increases the nonlinear expression capability of the model; meanwhile, to alleviate the degradation problem of deep network training, residual connections and layer normalization are added after the multi-head self-attention layer and the feedforward network layer, finally yielding a new sentence matrix Z = [z_1, z_2, ..., z_T], Z ∈ R^{T×d_model}.
3. The method for Chinese named entity recognition by fused time series convolution and Transformer encoder as claimed in claim 1, wherein the specific process from step (10) to step (11) is as follows:
the causal convolution is characterized by an output length equal to the input length and by not considering future information; given the input text matrix E_X = {x_1, x_2, ..., x_T} and a filter F = (f_1, f_2, ..., f_O), where O is the size of the convolution kernel, the causal convolution at x_i is:

F(x_i) = Σ_{o=1}^{O} f_o · x_{i-(O-o)}

to make the output generated by the network the same length as the input, the TCN adopts a one-dimensional fully convolutional network (FCN) structure, in which each hidden layer has the same length as the input layer and zero padding keeps subsequent layers the same length as the previous ones;
the dilated convolution exponentially increases the receptive field by inserting gaps between the convolution kernel taps, so that the output of each convolution contains a larger range of information; the output can therefore represent a larger range of input features and capture longer-range dependencies; given the input text matrix E_X = {x_1, x_2, ..., x_T} and the filter F = (f_1, f_2, ..., f_O), the dilated convolution at x_i is:

F(x_i) = Σ_{o=1}^{O} f_o · x_{i-(O-o)·d}

where d is the dilation factor, O is the size of the convolution kernel, and i-(O-o)·d points in the direction of the history; the dilated convolution is thus equivalent to introducing a fixed step between every two adjacent filter taps; by controlling the size of d, the receptive field is widened while the amount of computation stays unchanged, and when d = 1 the dilated convolution degenerates into an ordinary convolution; finally, to improve accuracy, the TCN adds the identity mapping of cross-layer links from residual networks; the output sentence matrix of the model is B = [b_1, b_2, ..., b_T], B ∈ R^{T×d_model}.
4. The method for Chinese named entity recognition fusing temporal convolution and a Transformer encoder as claimed in claim 1, wherein in step (12) an ADD feature fusion strategy is used to fuse the text features output by the Transformer encoder and the TCN model, obtaining the final text representation matrix H = [(z_1+b_1), (z_2+b_2), ..., (z_T+b_T)], H ∈ R^{T×d_model}.
5. The method for Chinese named entity recognition by merging time series convolution and Transformer encoder as claimed in claim 1, wherein the specific process of step (13) is:
for the training data D = {(x_i, y_i)}, i = 1, ..., n, where n is the number of training samples, the input x_i of each training step is passed through the network twice, and two different output predictions are obtained from the CRF layer: P_θ(y_i|x_i) and P_θ'(y_i|x_i); since Dropout randomly discards some neurons each time and the discarded neurons differ between the two passes, the same input data x_i yields two different prediction probabilities through two different submodels of the same overall model; R-Drop regularizes the output distribution of the training model by minimizing the symmetric KL divergence between the two prediction probabilities:

L_KL = (1/2) [ KL(P_θ(y_i|x_i) || P_θ'(y_i|x_i)) + KL(P_θ'(y_i|x_i) || P_θ(y_i|x_i)) ]

in addition, the cross-entropy loss function of the model itself is:

L_CE = -log P_θ(y_i|x_i) - log P_θ'(y_i|x_i)

with the CRF loss function added, the final R-Drop training loss function is:

L = L_CE + L_CRF + α · L_KL

where α is the coefficient weight controlling L_KL.
6. The method for Chinese named entity recognition by fused time series convolution and Transformer encoder as claimed in claim 1, wherein the specific process of step (14) is:
the CRF is a discriminative model based on conditional probabilities; let the output of the model, i.e. the input sequence of the CRF, be X = (x_1, x_2, ..., x_T), and let one possible predicted tag sequence be Y = (y_1, y_2, ..., y_T); the evaluation score s is defined as:

s(X, Y) = Σ_{t=1}^{T} (A_{y_{t-1}, y_t} + P_{t, y_t})

where A_{i,j} is the transition probability from label i to label j and P_{t, y_t} is the score of the y_t-th tag for the t-th character; the probability P of the sequence Y among all possible predicted sequences is:

P(Y|X) = exp(s(X, Y)) / Σ_{Ỹ ∈ Y_X} exp(s(X, Ỹ))

where Ỹ is a possible predicted tag sequence and Y_X is the set of all possible tag sequences for the input sequence X; during CRF training, maximum likelihood estimation is used to define the loss function:

L_CRF = -log P(Y|X) = log Σ_{Ỹ ∈ Y_X} exp(s(X, Ỹ)) - s(X, Y)

at prediction time, the CRF selects, according to the trained parameters, the candidate label sequence with the maximum probability as the final result.
CN202111399845.9A 2021-11-24 2021-11-24 Chinese named entity recognition method integrating time sequence convolution and transform encoder Active CN114169330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111399845.9A CN114169330B (en) 2021-11-24 2021-11-24 Chinese named entity recognition method integrating time sequence convolution and transform encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111399845.9A CN114169330B (en) 2021-11-24 2021-11-24 Chinese named entity recognition method integrating time sequence convolution and transform encoder

Publications (2)

Publication Number Publication Date
CN114169330A true CN114169330A (en) 2022-03-11
CN114169330B CN114169330B (en) 2023-07-14

Family

ID=80480130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111399845.9A Active CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and Transformer encoder

Country Status (1)

Country Link
CN (1) CN114169330B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020103654A4 (en) * 2019-10-28 2021-01-14 Nanjing Normal University Method for intelligent construction of place name annotated corpus based on interactive and iterative learning
CN111477221A (en) * 2020-05-28 2020-07-31 中国科学技术大学 Speech recognition system using bidirectional time sequence convolution and self-attention mechanism network
CN111783462A (en) * 2020-06-30 2020-10-16 大连民族大学 Chinese named entity recognition model and method based on dual neural network fusion
CN113269277A (en) * 2020-07-27 2021-08-17 西北工业大学 Continuous dimension emotion recognition method based on Transformer encoder and multi-head multi-modal attention
CN112989834A (en) * 2021-04-15 2021-06-18 杭州一知智能科技有限公司 Named entity identification method and system based on flat grid enhanced linear converter

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANZHUO et al.: "Research on Named Entity Recognition in Chinese EMR Based on Semi-Supervised Learning with Dual Selected Strategy", ACAI 2020, pages 1-10 *
LIN SUN et al.: "Joint Learning of Token Context and Span Feature for Span-Based Nested NER", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pages 2720-2730, XP011814457, DOI: 10.1109/TASLP.2020.3024944 *
JIANG Wei: "Research on Chinese Named Entity Recognition Based on Temporal Convolutional Networks", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, pages 138-2568 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943229A (en) * 2022-04-15 2022-08-26 西北工业大学 Software defect named entity identification method based on multi-level feature fusion
CN114943229B (en) * 2022-04-15 2024-03-12 西北工业大学 Multi-level feature fusion-based software defect named entity identification method
CN114580424A (en) * 2022-04-24 2022-06-03 之江实验室 Labeling method and device for named entity identification of legal document
CN114611336A (en) * 2022-05-11 2022-06-10 中国农业大学 Circulating water aquaculture dissolved oxygen prediction control method, device, equipment and medium
CN115019143A (en) * 2022-06-16 2022-09-06 湖南大学 Text detection method based on CNN and Transformer mixed model
CN115081439A (en) * 2022-07-01 2022-09-20 淮阴工学院 Chemical medicine classification method and system based on multi-feature adaptive enhancement
CN115081439B (en) * 2022-07-01 2024-02-27 淮阴工学院 Multi-feature self-adaptive enhancement-based chemical classification method and system
CN115545269A (en) * 2022-08-09 2022-12-30 南京信息工程大学 Power grid parameter identification method based on convolution self-attention Transformer model
CN116186574B (en) * 2022-09-09 2023-12-12 武汉中数医疗科技有限公司 Thyroid sampling data identification method based on artificial intelligence
CN116186574A (en) * 2022-09-09 2023-05-30 武汉中数医疗科技有限公司 Thyroid sampling data identification method based on artificial intelligence
CN115344504A (en) * 2022-10-19 2022-11-15 广州软件应用技术研究院 Software test case automatic generation method and tool based on requirement specification
CN115344504B (en) * 2022-10-19 2023-03-24 广州软件应用技术研究院 Software test case automatic generation method and tool based on requirement specification
CN115879473B (en) * 2022-12-26 2023-12-01 淮阴工学院 Chinese medical named entity recognition method based on improved graph attention network
CN115879473A (en) * 2022-12-26 2023-03-31 淮阴工学院 Chinese medical named entity recognition method based on improved graph attention network
CN116341557A (en) * 2023-05-29 2023-06-27 华北理工大学 Diabetes medical text named entity recognition method
CN117077672A (en) * 2023-07-05 2023-11-17 哈尔滨理工大学 Chinese naming entity recognition method based on vocabulary enhancement and TCN-BILSTM model
CN117077672B (en) * 2023-07-05 2024-04-26 哈尔滨理工大学 Chinese naming entity recognition method based on vocabulary enhancement and TCN-BILSTM model
CN117236323A (en) * 2023-10-09 2023-12-15 青岛中企英才集团商业管理有限公司 Information processing method and system based on big data
CN117236323B (en) * 2023-10-09 2024-03-29 京闽数科(北京)有限公司 Information processing method and system based on big data
CN117807603A (en) * 2024-02-29 2024-04-02 浙江鹏信信息科技股份有限公司 Software supply chain auditing method, system and computer readable storage medium
CN117807603B (en) * 2024-02-29 2024-04-30 浙江鹏信信息科技股份有限公司 Software supply chain auditing method, system and computer readable storage medium

Also Published As

Publication number Publication date
CN114169330B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and Transformer encoder
CN109284506B (en) User comment emotion analysis system and method based on attention convolution neural network
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN109299273B (en) Multi-source multi-label text classification method and system based on improved seq2seq model
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN112613308B (en) User intention recognition method, device, terminal equipment and storage medium
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN116432655B (en) Method and device for identifying named entities with few samples based on language knowledge learning
CN112287672A (en) Text intention recognition method and device, electronic equipment and storage medium
CN115831102A (en) Speech recognition method and device based on pre-training feature representation and electronic equipment
CN113806494A (en) Named entity recognition method based on pre-training language model
CN109446326A (en) Biomedical event based on replicanism combines abstracting method
CN114429132A (en) Named entity identification method and device based on mixed lattice self-attention network
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN110298046B (en) Translation model training method, text translation method and related device
CN113887836B (en) Descriptive event prediction method integrating event environment information
CN111259673A (en) Feedback sequence multi-task learning-based law decision prediction method and system
CN116629361A (en) Knowledge reasoning method based on ontology learning and attention mechanism
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
CN114298052B (en) Entity joint annotation relation extraction method and system based on probability graph
CN115906857A (en) Chinese medicine text named entity recognition method based on vocabulary enhancement
CN114580423A (en) Bert and Scat-based shale gas field named entity identification method
CN114756679A (en) Chinese medical text entity relation combined extraction method based on conversation attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
  Address after: Room 1603-12, No. 8, Financial Second Street, Wuxi Economic Development Zone, Jiangsu Province, 214000
  Applicant after: Uni-Entropy Intelligent Technology (Wuxi) Co.,Ltd.
  Address before: 214072 room 1603-12, No. 8, financial Second Street, economic development zone, Wuxi City, Jiangsu Province
  Applicant before: Yunentropy Education Technology (Wuxi) Co.,Ltd.
GR01 Patent grant