CN115600597A - Named entity recognition method, device and system based on attention mechanism and intra-word semantic fusion, and storage medium


Publication number
CN115600597A
Authority: CN (China)
Prior art keywords: word, sub, semantic, words, features
Legal status: Pending
Application number: CN202211271734.4A
Other languages: Chinese (zh)
Inventors
王媛媛
胡荣林
董甜甜
邱军林
曹昆
郭俊莹
张海艳
冯万利
王忆雯
Current Assignee: Huaiyin Institute of Technology
Original Assignee: Huaiyin Institute of Technology
Filing date: 2022-10-18
Publication date: 2023-01-13
Application filed by Huaiyin Institute of Technology
Priority to CN202211271734.4A
Publication of CN115600597A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a named entity recognition method, device, system and storage medium based on an attention mechanism and intra-word semantic fusion. The method comprises the following steps: S1, inputting a sentence sequence into a sub-word adapter to match sub-word embedding information; S2, inputting the matched sub-word embedding information into a CNN semantic network to extract the internal semantic features of the sub-words; S3, obtaining a character-level text representation with the CHINESE-BERT model and feeding it into a Bi-LSTM network to learn the global context features of the sentence; S4, inputting the obtained sub-word internal semantic features and the global context features into a WordFusionAttention module to extract the key context features after fusing the internal features of words; and S5, inputting the fused key context features into a CRF decoder to predict the entity labels. Compared with the prior art, the method effectively improves named entity recognition accuracy and alleviates the difficulty of recognizing out-of-vocabulary words.

Description

Named entity recognition method, device and system based on attention mechanism and intra-word semantic fusion, and storage medium
Technical Field
The invention relates to Chinese named entity recognition in computer natural language processing, and in particular to a named entity recognition method, device, system and storage medium based on an attention mechanism and intra-word semantic fusion.
Background
Named entity recognition is a popular research direction in the field of natural language processing. The technology has achieved competitive results in the general domain, but its application to Chinese text still faces significant problems and challenges.
With the wide application of deep learning, models based on deep neural networks have been widely adopted for the named entity recognition task. These methods take the character or word vectors of a sentence as input and use a deep neural network to extract text features, such as the Bi-LSTM + CRF model proposed by Huang et al. (Huang Z., Xu W., Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging. Computer Science, 2015). Word ambiguity, which occurs frequently in Chinese text, degraded recognition performance until BERT pre-trained models appeared; BERT represents the semantic information of a sentence well, effectively improves model accuracy, and thus alleviates the ambiguity problem. However, these methods are trained on large corpora and large amounts of labeled data. At present there is little research, at home or abroad, on recognizing Chinese text entities in specialized fields such as chemistry, and it is difficult to improve recognition performance when only a small number of samples are available, mainly for the following two reasons:
First, existing general-purpose named entity recognition based on pre-trained models uses large-scale training corpora containing knowledge from many domains, so the models pay little attention to the specialized vocabulary of a specific domain, and the imbalance of samples across domains leads to poor recognition of entity types by general models in specialized domains.
Second, entities in specialized domains differ from general entities: specialized vocabularies have complex naming rules and frequently produce new words, and existing sequence-labeling models ignore the internal composition of important words in a sentence and struggle to recognize out-of-vocabulary words.
Disclosure of Invention
Purpose of the invention: aiming at the problems identified in the background art, the invention discloses a named entity recognition method, device, system and storage medium based on an attention mechanism and intra-word semantic fusion.
The technical scheme is as follows: the named entity recognition method based on an attention mechanism and intra-word semantic fusion of the invention comprises the following steps:
S1, segmenting the text data to obtain a text sequence in units of sentences;
S2, inputting the text sequence obtained in step S1 into a sub-word adapter to obtain sub-word representations of the words in the text;
S3, learning the local semantic information in the sub-word representations obtained in step S2 with a CNN semantic network and extracting the internal features of the sub-words;
S4, extracting a character-level text representation of the text sequence of step S1 with the CHINESE-BERT model;
S5, learning long-distance context information from the character-level text representation of step S4 with a Bi-LSTM and extracting global context semantic features;
S6, inputting the sub-word internal features of step S3 and the global context semantic features of step S5 into a WordFusionAttention module to obtain key context features;
and S7, inputting the key context features obtained in step S6 into a CRF decoder, learning the internal feature constraints of the text, and obtaining the entity recognition labels.
Further, the sub-word adapter in step S2 matches words present in a lexicon against the text and concatenates the sub-word representations beginning at the same character to form the sub-word representations of the text, specifically comprising the following steps:
S2.1, building the lexicon into a dictionary tree T, where each node of the dictionary tree stores one Chinese character of a word: the root node stores the first character of a word, the back pointer of a node points to the next character of the word, and the front pointer of a node points to the previous character;
S2.2, traversing each character of the text sequence obtained in step S1 and searching the dictionary tree T of step S2.1, taking each character of the input sequence S as the start of a word, to obtain the word set W_{i,j}, i ∈ [1, n], j ∈ [1, l], corresponding to each character, where W_{i,j} denotes the j-th matched word beginning with the i-th character of the sentence and l denotes the number of matched words beginning with the i-th character; a none value is written into the set of characters with no matched words;
S2.3, inputting the sub-words of the word set obtained in step S2.2 into the CHINESE-BERT model to obtain a sub-word matrix CW_i^j based on character embedding; padding and zero-matrix splicing are applied to the sub-word matrix of each character to obtain the three-dimensional sub-word embedding spatial information W_i, whose first dimension is the number of sub-words corresponding to each character, second dimension is the length of each sub-word, and third dimension is the vector dimension of the characters in the sub-word:

CW_i^j = e(W_{i,j})
W_i = Ex(CW_i^j)

where e(·) denotes the loaded CHINESE-BERT model and Ex(·) denotes the padding and splicing operation.
Further, the CNN semantic network in step S3 comprises two convolutional layers and one pooling layer: the convolution kernel size of the first convolutional layer is 9 × 9; the convolution kernel size of the second convolutional layer is 3 × 3; a max-pooling operation with a window of 1 × 3 follows the first convolutional layer. The sub-word embedding vector is input into the first convolutional layer to obtain the shallow semantic features inside words; the shallow semantic feature vector is down-sampled by the pooling layer to obtain a semantic feature vector; and the deep semantic features inside words are extracted from the semantic feature vector by the second convolutional layer.
Further, in step S6 the WordFusionAttention module computes the similarity between the sentence context and the sub-word features by a dot-product operation to weight the sentence dynamically, specifically comprising the following steps:
Step 4.1, the global context feature X is passed through two linear transformations to obtain two feature matrices K and V:

K(X), V(X) = x^T·E_k, x^T·W_v

Step 4.2, the similarity is computed by a dot product of the feature matrix K and the sub-word feature q, scaled by the factor μ and normalized with the tanh() function:

H(K, q) = tanh(μ·K·q)

Step 4.3, the transformed global context features are re-weighted by the similarity:

Att = softmax(H(K, q))·V
V̂ = Att

where W_v and E_k are the weight matrices to be learned, μ is the scaling factor, x denotes the input global context features, q denotes the sub-word internal feature vector, and V̂ denotes the context feature vector after fusing the sub-word internal features.
Further, in step S7 the CRF decoder predicts the entity label type by computing the label transition features and extracting the relationship features between entity combinations and labels.
The invention also discloses a named entity recognition system based on an attention mechanism and intra-word semantic fusion, comprising the following modules:
a data preprocessing module, which segments the text to obtain the sentence sequence required as model input;
an embedding module, which obtains the character-embedded sentence sequence, matches the words in the sentences, and obtains the sub-word embedding vectors;
an encoding module, which extracts context features from the character-embedded sentence sequence and extracts the internal semantic features of the sub-words;
a WordFusionAttention module, which dynamically fuses the sub-word internal semantic features into the context features of the character-embedded sentence sequence through an improved attention mechanism, enriching the sentence features and helping the model understand the semantic information of words;
and a decoding module, which extracts the label transition features and predicts the labels.
The invention also discloses a named entity recognition device based on an attention mechanism and intra-word semantic fusion, comprising a memory and a processor, wherein:
the memory stores a computer program capable of running on the processor;
and the processor, when running the computer program, executes the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion described above.
The invention also discloses a storage medium on which a computer program is stored; when the computer program is executed by at least one processor, the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion described above are implemented.
Beneficial effects:
The method matches the specialized-word information of an input sentence against a specialized lexicon, obtains character-vector-based sub-word representations through the large-scale pre-trained CHINESE-BERT model, obtains high-dimensional sub-word embedding spatial information through splicing and padding, learns the intra-word semantic features of the multiple sub-words corresponding to each character of the sentence through a CNN semantic network, and fuses the intra-word semantic features into the character-vector-based context features through an attention mechanism.
Drawings
FIG. 1 is a flow chart of the entity recognition task of the invention;
FIG. 2 is a block diagram of a CNN semantic network;
FIG. 3 is a block diagram of a WordFusionAttention module;
FIG. 4 is a model diagram of the named entity recognition method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The following embodiments merely illustrate the technical concepts and features of the invention; they are intended to enable those skilled in the art to understand and implement the invention, not to limit its scope. All equivalents and modifications made according to the spirit of the invention are intended to fall within its scope.
As shown in FIG. 1, the named entity recognition method based on an attention mechanism and intra-word semantic fusion of the invention comprises the following steps:
S1, inputting the text data into a preprocessing module to obtain an input sequence in units of sentences, S = {x_1, x_2, ..., x_n}, where x_i denotes the i-th character of the sentence.
In this embodiment, so that the model input has a fixed dimension, the input sequence S is tail-padded with 0 to a sentence length of 50.
S2, inputting the input sequence S of step S1 into the sub-word adapter for dynamic lookup to obtain the sub-word embedding vectors.
In the embodiment of the invention, the sub-word adapter operates in the following steps:
S2.1, building a dictionary tree T from the specialized dictionary of step S1.
S2.2, searching the dictionary tree T of step S2.1 for the words corresponding to each character of the input sequence S to obtain the word set W_{i,j}, i ∈ [1, n], j ∈ [1, l], corresponding to each character, where W_{i,j} denotes the j-th matched word beginning with the i-th character of the sentence and l denotes the number of matched words beginning with the i-th character; a none value is written into the set of characters with no matched words.
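The dictionary-tree lookup of steps S2.1 and S2.2 can be illustrated with a minimal Python sketch. The lexicon words and the sentence below are illustrative placeholders, and the trie uses a plain child map rather than the front/back pointers described above; this is a sketch of the matching idea, not the patented implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # marks the end of a lexicon word

class Trie:
    def __init__(self, lexicon):
        self.root = TrieNode()
        for word in lexicon:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def match_from(self, sentence, i):
        """Return every lexicon word that starts at character i of the sentence."""
        matches, node = [], self.root
        for j in range(i, len(sentence)):
            node = node.children.get(sentence[j])
            if node is None:
                break
            if node.is_word:
                matches.append(sentence[i:j + 1])
        return matches

trie = Trie(["聚乙烯", "乙烯", "乙烯基"])   # toy specialized lexicon
sentence = "聚乙烯是一种材料"
# word set W_{i,j} per character; a none value where nothing matches
word_sets = [trie.match_from(sentence, i) or [None] for i in range(len(sentence))]
print(word_sets)  # [['聚乙烯'], ['乙烯'], [None], ...]
```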
S2.3, inputting the sub-words of the word set obtained in step S2.2 into the CHINESE-BERT model to obtain the embedding vector CW_i^j of each sub-word; then tail-padding with 0 and zero-matrix splicing are applied to the embedding vector of each sub-word to obtain the sub-word embedding vector W_i corresponding to the final character:

CW_i^j = e(W_{i,j})
W_i = Ex(CW_i^j)

where e(·) denotes the loaded CHINESE-BERT model; CW_i^j has dimension l_{i,j} × 768, with l_{i,j} the length of the j-th sub-word corresponding to the i-th character; Ex(·) denotes the splicing operation; and W_i has dimension 16 × 32 × 768.
In the embodiment of the invention, so that the sub-word embedding vectors W_i of all characters have the same dimensions, the maximum length of a single word is set to 32 and the maximum number of sub-words per character is set to 16.
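The padding and zero-matrix splicing of step S2.3 can be sketched as follows, assuming the per-sub-word CHINESE-BERT embeddings have already been computed; the limits 16 and 32 and the hidden size 768 follow the embodiment above.

```python
import torch

MAX_SUBWORDS, MAX_LEN, HIDDEN = 16, 32, 768  # limits from the embodiment

def build_subword_tensor(subword_embeddings):
    """subword_embeddings: list of tensors CW_i^j, each of shape (sub-word length, 768)."""
    W_i = torch.zeros(MAX_SUBWORDS, MAX_LEN, HIDDEN)   # zero matrices fill unused slots
    for j, cw in enumerate(subword_embeddings[:MAX_SUBWORDS]):
        length = min(cw.size(0), MAX_LEN)
        W_i[j, :length] = cw[:length]                  # tail-pad each sub-word with 0
    return W_i

# toy input: two matched sub-words of lengths 3 and 2
W_i = build_subword_tensor([torch.randn(3, HIDDEN), torch.randn(2, HIDDEN)])
print(W_i.shape)  # torch.Size([16, 32, 768])
```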
S3, inputting the word sequence obtained in step S2 into the CNN semantic network to obtain the sub-word internal feature vector V_W.
Further, the CNN module in step S3 comprises two convolutional layers and one pooling layer. The first convolutional layer has 7 convolution kernels of size 16 × 9 × 9, with padding 0 and stride 2; the second convolutional layer has 7 convolution kernels of size 1 × 3 × 3, with padding 1 and stride 2; a max-pooling operation with window 1 × 3 and stride 1 follows the first convolutional layer. The sub-word embedding vector is input into the first convolutional layer to obtain the shallow semantic features F1 inside words; the shallow semantic feature vector is down-sampled by the pooling layer to obtain the semantic feature vector F2; and the deep semantic features V_W, of dimension 1 × 6 × 32, are extracted from F2 by the second convolutional layer. The specific steps are:
S3.1, the sub-word embedding is first input into the 9 × 9 convolutional layer to extract the shallow semantic features F1 of the relations between the characters within a sub-word:

F1 = k_i · x

where k_i is the i-th convolution kernel parameter and x denotes the input value.
S3.2, the key semantic features F2 of the characters within the sub-words are extracted from the shallow semantic features F1 by a pooling operation with window 1 × 3.
S3.3, the deep semantic features F3 within the sub-words are extracted from the key semantic features F2 by the 3 × 3 convolutional layer and then passed through an activation function, whose formula appears in the original only as an image, with x the input feature and ε the impact factor.
FIG. 2 is a block diagram of the intra-word (CNN) semantic network, which comprises two convolutional layers and one max-pooling layer. The sub-word embedding vector of dimension 16 × 32 × 768 first passes through the 9 × 9 convolutional layer, extracting 7 × 12 × 130 shallow semantic features of the specialized words; the dimension is then reduced by the max-pooling operation to obtain 7 × 12 × 64 low-dimensional features; finally, a standard convolution with one 3 × 3 kernel yields the 1 × 6 × 32 deep semantic features within the specialized words.
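A minimal PyTorch sketch of this two-convolution network is given below. The kernel sizes (9 × 9 convolution, 1 × 3 max pooling, 3 × 3 convolution) follow the text; the strides, paddings and the activation are assumptions, since the quoted feature-map sizes cannot all be reproduced from the stated hyperparameters and the activation formula appears only as an image, so the printed shape is illustrative rather than the 1 × 6 × 32 of the embodiment.

```python
import torch
import torch.nn as nn

class CNNSemanticNet(nn.Module):
    def __init__(self, in_channels=16):  # 16 sub-word slots treated as input channels
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 7, kernel_size=9, stride=2)   # shallow features F1
        self.pool = nn.MaxPool2d(kernel_size=(1, 3), stride=1)            # key features F2
        self.conv2 = nn.Conv2d(7, 1, kernel_size=3, stride=2, padding=1)  # deep features F3
        self.act = nn.ReLU()  # stand-in: the patent's activation is given only as an image

    def forward(self, w):     # w: (batch, 16, 32, 768) sub-word embedding W_i
        f1 = self.conv1(w)
        f2 = self.pool(f1)
        return self.act(self.conv2(f2))

net = CNNSemanticNet()
v_w = net(torch.randn(1, 16, 32, 768))
print(v_w.shape)  # deep intra-word semantic feature map
```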
S4, inputting the input sequence S obtained in step S1 into the CHINESE-BERT model to obtain the character-level sentence sequence vector S_c, of dimension 1 × 50 × 768.
S5, inputting the sentence sequence vector S_c of step S4 into the Bi-LSTM network to obtain the character-level global context features V_C.
In the embodiment of the invention, the Bi-LSTM network consists of a forward LSTM and a backward LSTM, capturing context features from left to right and from right to left respectively, so as to better capture the global context information of the sentence. The LSTM network comprises input-gate, forget-gate and output-gate mechanisms.
The input gate is defined as:

i_t = σ([h_{t-1}, s_t]·W_i + b_i)
C̃_t = tanh([h_{t-1}, s_t]·W_c + b_c)

The forget gate is defined as:

f_t = σ([h_{t-1}, s_t]·W_f + b_f)

The output gate is defined as:

O_t = σ([h_{t-1}, s_t]·W_o + b_o)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
h_t = O_t ⊙ tanh(C_t)

where ⊙ denotes element-wise multiplication of vectors.

The sentence sequence vector S_c is fed in both the forward and the reverse direction; the two different hidden-layer representations obtained are spliced as the output of the hidden layer:

h_t^fw = LSTM_fw(s_t, h_{t-1}^fw)
h_t^bw = LSTM_bw(s_t, h_{t+1}^bw)
h_t = [h_t^fw ; h_t^bw]

Through the last hidden layer, the global context features V_C = {h_1, h_2, ..., h_n} are obtained, with dimension 1 × 6 × 32.
S6, inputting the sub-word internal feature vector V_W obtained in step S3 and the character-level global context features V_C obtained in step S5 into the WordFusionAttention module to obtain the final context feature vector V_S of the text sentence.
Further, the WordFusionAttention module in step S6 consists of an improved dot-product attention mechanism that dynamically adjusts the global context features V_C of step S5 in combination with the semantic information inside the sub-words, as follows:
Step 6.1, the global context feature X is passed through two linear transformations to obtain two feature matrices K and V:

K(X), V(X) = x^T·E_k, x^T·W_v

Step 6.2, the similarity is computed by a dot product of the feature matrix K and the sub-word feature q, scaled by the factor μ and normalized with the tanh() function:

H(K, q) = tanh(μ·K·q)

Step 6.3, the transformed global context features are re-weighted by the similarity:

Att = softmax(H(K, q))·V
V̂ = Att

where W_v and E_k are the weight matrices to be learned, μ is the scaling factor, x denotes the input global context features, q denotes the sub-word internal feature vector, and V̂ denotes the context feature vector after fusing the sub-word internal features.
FIG. 3 is a block diagram of the WordFusionAttention module. The E_k and W_v matrices spatially transform the global context features, and the similarity between the sentence's global context features and the sub-word internal features is then computed by dot product; this enhances the local features of specialized words and fuses their boundary information.
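A sketch of the WordFusionAttention computation under the formulas above is given below. The exact shapes of the context features x and the sub-word feature q are not fully specified in the text, so the dimensions here, and the constant value chosen for μ, are assumptions.

```python
import torch
import torch.nn as nn

class WordFusionAttention(nn.Module):
    def __init__(self, dim, mu=0.1):
        super().__init__()
        self.E_k = nn.Linear(dim, dim, bias=False)  # learned weight E_k
        self.W_v = nn.Linear(dim, dim, bias=False)  # learned weight W_v
        self.mu = mu                                # scaling factor (assumed constant)

    def forward(self, x, q):
        # x: (batch, seq_len, dim) global context; q: (batch, dim) sub-word feature
        k, v = self.E_k(x), self.W_v(x)                    # two linear transformations
        h = torch.tanh(self.mu * (k @ q.unsqueeze(-1)))    # H(K, q): (batch, seq_len, 1)
        att = torch.softmax(h, dim=1)                      # similarity weights
        return att * v                                     # re-weighted context features

wfa = WordFusionAttention(dim=256)
fused = wfa(torch.randn(2, 50, 256), torch.randn(2, 256))
print(fused.shape)  # (2, 50, 256) fused context feature vector
```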
S7, inputting the final context feature vector obtained in step S6 into the CRF decoder to learn the feature constraints inside the sentence and output the entity label sequence.
Further, the CRF decoder in step S7 comprises a label transition matrix, a scoring function, and a loss function. The label transition matrix M is a trained weight, and the score of a predicted label sequence l for the input sentence S is:

score(S, l) = Σ_i M_{l_{i-1}, l_i} + Σ_i P_{i, l_i}

Normalizing over all possible label sequences gives the probability of a predicted sequence l; the objective function is:

p(l | S) = exp(score(S, l)) / Σ_{l'} exp(score(S, l'))

The loss function is:

log p(l | S) = score(S, l) - log Σ_{l'} exp(score(S, l'))

The final prediction is:

l* = argmax_{l'} score(S, l')
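The CRF scoring and loss above can be sketched as follows: score(S, l) sums the emission scores P and the label-transition scores M, and the negative log-likelihood is computed with the forward algorithm. Tag counts and inputs are illustrative; the start/end transitions and the Viterbi decoding of the final prediction are omitted for brevity.

```python
import torch

def crf_score(emissions, transitions, tags):
    """score(S, l): emissions (T, num_tags), transitions M (num_tags, num_tags), tags (T,)."""
    score = emissions[0, tags[0]]
    for t in range(1, tags.size(0)):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score

def crf_log_partition(emissions, transitions):
    """log of the sum over all tag sequences, via the forward algorithm."""
    alpha = emissions[0]  # (num_tags,)
    for t in range(1, emissions.size(0)):
        # alpha_i + M_{i,j}, summed out over the previous tag i in log space
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)

T, num_tags = 50, 9                     # e.g. BIO tags over 4 entity types (assumed)
emissions = torch.randn(T, num_tags)    # scores derived from the fused features
transitions = torch.randn(num_tags, num_tags)
tags = torch.randint(num_tags, (T,))
nll = crf_log_partition(emissions, transitions) - crf_score(emissions, transitions, tags)
print(nll)  # negative log-likelihood, i.e. -log p(l|S)
```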
the invention also discloses a named entity recognition system based on attention mechanism and in-word semantic fusion, which comprises the following modules:
and the data preprocessing module is used for dividing the text to obtain a sentence sequence required by the model input.
And the embedding module comprises a sentence sequence for acquiring embedded characters, matching words in the sentences and acquiring sub-word embedded vectors.
And the coding module is used for extracting context characteristics based on the word embedded sentence sequence and extracting the internal semantic characteristics of the sub-words.
The WordFusionAttention module dynamically fuses the semantic features inside the sub-words through an improved attention mechanism for the context features based on the word-embedded sentence sequence, enriches the sentence features, and facilitates the model to understand the semantic information of the professional words.
And the decoding module is used for extracting the transfer characteristics of the label and predicting the label.
The implementation functions of the modules of the named entity recognition system are implemented by a named entity recognition method based on attention mechanism and intra-word semantic fusion, which are not described herein again.
The invention also discloses a named entity recognition device based on an attention mechanism and intra-word semantic fusion, comprising a memory and a processor, wherein:
the memory stores a computer program capable of running on the processor;
and the processor, when running the computer program, executes the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion described above.
The invention further discloses a storage medium on which a computer program is stored; when the computer program is executed by at least one processor, the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion described above are implemented.
The above embodiments merely illustrate the technical concepts and features of the invention; they are intended to enable those skilled in the art to understand and implement the invention, not to limit its scope of protection. All equivalent changes and modifications made according to the spirit of the invention shall fall within its scope of protection.

Claims (8)

1. A named entity recognition method based on an attention mechanism and intra-word semantic fusion, characterized by comprising the following steps:
S1, segmenting the text data to obtain a text sequence in units of sentences;
S2, inputting the text sequence obtained in step S1 into a sub-word adapter to obtain sub-word representations of the words in the text;
S3, learning the local semantic information in the sub-word representations obtained in step S2 with a CNN semantic network and extracting the internal features of the sub-words;
S4, extracting a character-level text representation of the text sequence of step S1 with the CHINESE-BERT model;
S5, learning long-distance context information from the character-level text representation of step S4 with a Bi-LSTM and extracting global context semantic features;
S6, inputting the sub-word internal features of step S3 and the global context semantic features of step S5 into a WordFusionAttention module to obtain key context features;
and S7, inputting the key context features obtained in step S6 into a CRF decoder, learning the internal feature constraints of the text, and obtaining the entity recognition labels.
2. The named entity recognition method based on an attention mechanism and intra-word semantic fusion according to claim 1, characterized in that the sub-word adapter in step S2 matches words present in a lexicon against the text and concatenates the sub-word representations beginning at the same character to form the sub-word representations of the text, specifically comprising the following steps:
S2.1, building the lexicon into a dictionary tree T, where each node of the dictionary tree stores one Chinese character of a word: the root node stores the first character of a word, the back pointer of a node points to the next character of the word, and the front pointer of a node points to the previous character;
S2.2, traversing each character of the text sequence obtained in step S1 and searching the dictionary tree T of step S2.1, taking each character of the input sequence S as the start of a word, to obtain the word set W_{i,j}, i ∈ [1, n], j ∈ [1, l], corresponding to each character, where W_{i,j} denotes the j-th matched word beginning with the i-th character of the sentence and l denotes the number of matched words beginning with the i-th character; a none value is written into the set of characters with no matched words;
S2.3, inputting the sub-words of the word set obtained in step S2.2 into the CHINESE-BERT model to obtain a sub-word matrix CW_i^j based on character embedding; padding and zero-matrix splicing are applied to the sub-word matrix of each character to obtain the three-dimensional sub-word embedding spatial information W_i, whose first dimension is the number of sub-words corresponding to each character, second dimension is the length of each sub-word, and third dimension is the vector dimension of the characters in the sub-word:

CW_i^j = e(W_{i,j})
W_i = Ex(CW_i^j)

where e(·) denotes the loaded CHINESE-BERT model and Ex(·) denotes the padding and splicing operation.
3. The named entity recognition method based on an attention mechanism and intra-word semantic fusion according to claim 1, characterized in that the CNN semantic network in step S3 comprises two convolutional layers and one pooling layer: the convolution kernel size of the first convolutional layer is 9 × 9; the convolution kernel size of the second convolutional layer is 3 × 3; a max-pooling operation with a window of 1 × 3 follows the first convolutional layer; the sub-word embedding vector is input into the first convolutional layer to obtain the shallow semantic features inside words; the shallow semantic feature vector is down-sampled by the pooling layer to obtain a semantic feature vector; and the deep semantic features inside words are extracted from the semantic feature vector by the second convolutional layer.
4. The named entity recognition method based on an attention mechanism and intra-word semantic fusion according to claim 1, characterized in that in step S6 the WordFusionAttention module computes the similarity between the sentence context and the sub-word features by a dot-product operation to weight the sentence dynamically, specifically comprising the following steps:
Step 4.1, the global context feature X is passed through two linear transformations to obtain two feature matrices K and V:

K(X), V(X) = x^T·E_k, x^T·W_v

Step 4.2, the similarity is computed by a dot product of the feature matrix K and the sub-word feature q, scaled by the factor μ and normalized with the tanh() function:

H(K, q) = tanh(μ·K·q)

Step 4.3, the transformed global context features are re-weighted by the similarity:

Att = softmax(H(K, q))·V
V̂ = Att

where W_v and E_k are the weight matrices to be learned, μ is the scaling factor, x denotes the input global context features, q denotes the sub-word internal feature vector, and V̂ denotes the context feature vector after fusing the sub-word internal features.
5. The named entity recognition method based on an attention mechanism and intra-word semantic fusion according to claim 1, characterized in that in step S7 the CRF decoder predicts the entity label type by computing the label transition features and extracting the relationship features between entity combinations and labels.
6. A named entity recognition system based on an attention mechanism and intra-word semantic fusion, characterized by comprising the following modules:
a data preprocessing module, which segments the text to obtain the sentence sequence required as model input;
an embedding module, which obtains the character-embedded sentence sequence, matches the words in the sentences, and obtains the sub-word embedding vectors;
an encoding module, which extracts context features from the character-embedded sentence sequence and extracts the internal semantic features of the sub-words;
a WordFusionAttention module, which dynamically fuses the sub-word internal semantic features into the context features of the character-embedded sentence sequence through an improved attention mechanism, enriching the sentence features and helping the model understand the semantic information of words;
and a decoding module, which extracts the label transition features and predicts the labels.
7. A named entity recognition device based on an attention mechanism and intra-word semantic fusion, characterized by comprising a memory and a processor;
the memory storing a computer program capable of running on the processor;
and the processor performing, when running the computer program, the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion according to any one of claims 1 to 5.
8. A storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by at least one processor, the steps of the named entity recognition method based on an attention mechanism and intra-word semantic fusion according to any one of claims 1 to 5 are implemented.
CN202211271734.4A (filed 2022-10-18, priority 2022-10-18) Named entity recognition method, device and system based on attention mechanism and intra-word semantic fusion and storage medium. Pending. CN115600597A.

Priority Applications (1)

Application Number: CN202211271734.4A; Priority/Filing Date: 2022-10-18; Title: Named entity recognition method, device and system based on attention mechanism and intra-word semantic fusion and storage medium

Publications (1)

Publication Number: CN115600597A; Publication Date: 2023-01-13

Family

ID=84846080

Family Applications (1)

Application Number: CN202211271734.4A; Title: Named entity recognition method, device and system based on attention mechanism and intra-word semantic fusion and storage medium

Country Status (1)

Country: CN; Document: CN115600597A

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number: CN116486420A; Priority date: 2023-04-12; Publication date: 2023-07-25; Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.; Title: Entity extraction method, device and storage medium of document image
Publication number: CN116486420B; Priority date: 2023-04-12; Publication date: 2024-01-12; Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.; Title: Entity extraction method, device and storage medium of document image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination