CN114239584A - Named entity identification method based on self-supervision learning - Google Patents
- Publication number
- CN114239584A (application CN202111539122.4A)
- Authority
- CN
- China
- Prior art keywords
- entity
- output
- embedding
- vector
- main network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/295—Named entity recognition (G06F40/00 Handling natural language data › G06F40/20 Natural language analysis › G06F40/279 Recognition of textual entities › G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking)
- G06F16/35—Clustering; Classification (G06F16/00 Information retrieval; Database structures therefor › G06F16/30 Information retrieval of unstructured textual data)
- G06F16/367—Ontology (G06F16/36 Creation of semantic tools, e.g. ontology or thesauri)
- G—PHYSICS › G06—COMPUTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent (G06N3/08 Learning methods)
Abstract
The invention discloses a named entity recognition method based on self-supervised learning, comprising the following steps: preprocess a data set, construct positive example and negative example sentence pairs from the processed data set, and encode the sentences of the positive and negative example pairs with an embedding encoder; a named entity recognition model based on self-supervised learning then learns the different definitions an entity takes in different contexts from the entity features and a similarity matrix, fully learns its feature vectors from the similarity of the positive and negative example sentence pairs, and accommodates the language differences of different corpora. The invention improves the accuracy of named entity recognition and uses a knowledge graph to correct entity-type errors caused by word abbreviations in the output result, so that entities and entity types are predicted more accurately and the embedding vector of an ambiguous word better represents the meaning of the word in its current context.
Description
Technical Field
The invention relates to a named entity identification method, in particular to a named entity identification method based on self-supervision learning.
Background
With the arrival of the big data era, research on named entity recognition has gradually become one of the emerging interdisciplinary frontiers of cognitive science, information science and intelligence science. In recent years, developed western countries have attached increasing importance to named entity recognition, and open-source information extraction has become one of the important bases for national defense policy, strategic decision-making and command operations. Named entity recognition is also rapidly becoming one of the leading international hot spots in the academic field of informatics.
Most existing named entity recognition methods extract entities and entity types from text. The main task of named entity recognition is to recognize and classify proper nouns, such as person and place names, and meaningful numerical phrases, such as times and dates, in text. There are three main approaches to named entity recognition: rule-based methods, statistics-based methods, and supervised-learning-based methods.
Rule-based methods extract entities from text by pre-constructing special rules. They achieve high accuracy in certain specific domains, but for the same reason suffer from severe limitations, such as poor cross-domain portability. Statistics-based methods compute statistics over text and mine word features from a corpus; they place high demands on the corpus, and since few general-purpose corpora suitable for evaluating large named entity recognition tasks currently exist, their development is limited to some extent. Supervised-learning-based methods train a classifier on training data and apply it to new entity recognition; they partly overcome the domain limitation of rule-based methods and the corpus requirements of statistics-based methods, but in the word-embedding stage they do not learn well how an ambiguous word should be represented in its current context.
The invention further learns ambiguous words by means of self-supervised learning, proposes a named entity recognition method based on self-supervised learning, and constructs a complete named entity recognition model.
Disclosure of Invention
The invention aims to solve the problem that existing named entity recognition technology fails to learn, in the word-embedding stage, the meaning of an ambiguous word in its current context, and provides a named entity recognition method based on self-supervised learning.
The technical scheme adopted by the invention is as follows:
step 1: preprocessing the data set;
1-1. Form the words annotated with entity types in the data set, together with the connecting words, into sentences s_i;
1-2. Translate the sentence s_i of step 1-1 into a sentence a_i in an arbitrary intermediate language, then translate a_i back into a positive example sentence s_i+ in the same language as s_i;
Step 2: Construct the positive and negative example sentence pair sets from the sentences processed in step 1. The positive example sentence pair set consists of pairs (s_i, s_i+) of an original sentence and its own back-translation; a negative example sentence pair consists of an original sentence and the back-translation of another sentence in the corpus, i.e. pairs (s_i, s_j+) with j ≠ i;
Step 3: Encode the sentences in the positive example and negative example sentence pairs with embedding encoders;
Step 4: Input the embedding word vectors into the deep neural network layer DNN;
Step 5: Compute the similarity of the output vectors of the positive and negative example sentence pairs obtained in step 4 and concatenate the results row-wise into a new similarity matrix M_sim; then optimize the parameters of the embedding encoder f_k of step 3 with a contrastive loss function l through back-propagation and gradient descent;
step 6: obtaining sentences formed by words of the marked entity types, constructing a data set, and further dividing the data set into a training set and a testing set;
Step 7: Build the named entity recognition model based on self-supervised learning, which comprises a main network and a correction module cascaded in sequence; train the main network with the training set, test the trained main network with the testing set, and finally correct the output result of the tested main network with the correction module;
The main network comprises the embedding encoder f_k optimized in step 5, a bidirectional LSTM layer and a CRF layer;
The correction module comprises a phrase retrieval module and an entity type modification module. The phrase retrieval module obtains the potential entities of the main network's input, screens out those that exist in a public knowledge graph, and combines each retained potential entity with its entity type into a potential entity set PE; the potential entity set comprises words, phrases formed from several words, and the entity types corresponding to these words and phrases. The entity type modification module receives the potential entity set PE output by the phrase retrieval module and the entity type labels output by the main network, then compares each entity type output by the main network with the entity type that PE records for the corresponding potential entity of the main network's input; if they are consistent, no modification is needed, and if they are inconsistent, the entity type modification module modifies the output result of the main network;
Step 8: Perform named entity recognition on text with the tested named entity recognition model based on self-supervised learning.
It is a further object of the present invention to provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the above-mentioned method.
It is a further object of the present invention to provide a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method described above.
The technical scheme provided by the invention has the following beneficial effects:
according to the invention, positive example sentence pairs and negative example sentence pairs are constructed by translating sentences in multiple intermediate languages, the negative example sentence pairs are stored in a queue in practical use, the data of the current mini-batch enters the queue, the oldest mini-batch data in the past is shifted out of the queue, and the queue is used for decoupling the size of the queue and the batch size, namely the size of the queue is not limited by the constraint of the batch size, so that the problem that a large amount of negative example mini-batch data is needed in self-supervision learning is well solved;
the similarity of word embedding vectors in sentences in a vector representation space is measured by using the similarity function, and the parameters of the embedding encoder are slowly updated in a momentum moving average mode, so that the loss of feature consistency caused by the drastic change of the parameters of the embedding encoder can be avoided, the embedding encoder can be kept in an updated state all the time, and the embedding encoder can better accord with the definition of ambiguous words in the current context when the ambiguous words are encoded in a word embedding encoding stage by using the similarity function and the momentum moving average mode;
the invention solves the problem of entity type recognition error caused by word abbreviation in the output result by disclosing the knowledge map, and further improves the accuracy rate of entity recognition.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of an embedding encoder optimization for self-supervised learning;
FIG. 3 is a diagram of a named entity recognition model architecture based on self-supervised learning according to the present invention;
FIG. 4 is a diagram of a correction module in the named entity recognition model based on self-supervised learning according to the present invention;
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings. The specific flow description is shown in fig. 1, wherein:
step 1: preprocessing the data set;
1-1. Form the words annotated with entity types in the data set, together with the connecting words, into sentences s_i;
Entities are proper nouns in the text, such as person names, place names and organization names;
1-2. Translate the sentence s_i of step 1-1 into a sentence a_i in an arbitrary intermediate language, then translate a_i back into a positive example sentence s_i+ in the same language as s_i;
Step 2: Construct the positive and negative example sentence pair sets from the sentences processed in step 1. The positive example sentence pair set consists of pairs (s_i, s_i+) of an original sentence and its own back-translation; a negative example sentence pair consists of an original sentence and the back-translation of another sentence in the corpus, i.e. pairs (s_i, s_j+) with j ≠ i;
Step 3: Use embedding encoders to encode the sentences in the positive and negative example sentence pairs, specifically:
Input the sentence s_i into the embedding encoder f_q (query encoder) for word embedding encoding and obtain the encoded result q_i; at the same time, input the positive and negative example sentences s_i+ and s_j+ corresponding to s_i into the embedding encoder f_k (key encoder) and obtain the encoded results k_i+ (for the positive example) and k_j- (for the negative examples);
The embedding encoders f_q and f_k have the same initialization parameters θ_q and θ_k;
Step 4: Input the embedding word vectors into the deep neural network layer (DNN);
the deep neural network layer includes a first fully-connected layer, a Relu layer, and a second fully-connected layer.
(1) First fully-connected layer: convert the embedding vector output by the not-yet-optimized embedding encoder into an output vector of the same dimension through one linear transformation:
o_dense1 = W·x_input + b
where o_dense1 is the output vector, x_input is the embedding vector output by the unoptimized embedding encoder, W is a weight matrix, and b is a bias vector;
(2) Relu layer: passing the output vector of the first fully-connected layer through the Relu activation function keeps the convergence speed of the model stable:
o_dense2 = max(o_dense1, 0)
where o_dense2 is the output vector of the Relu layer;
(3) Second fully-connected layer: convert the output vector of the Relu layer into an output vector whose dimension equals the number of predicted entity types;
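The three-layer DNN described above (fully-connected, Relu, fully-connected) can be sketched in NumPy as follows; the weight shapes here are illustrative assumptions, not values from the patent:

```python
import numpy as np

def dnn_forward(x, W1, b1, W2, b2):
    """First fully-connected layer, Relu, second fully-connected layer.
    W1 is assumed to keep the embedding dimension; W2 maps to the number
    of predicted entity types."""
    o_dense1 = W1 @ x + b1             # first fully-connected layer
    o_relu = np.maximum(o_dense1, 0)   # Relu layer: max(o_dense1, 0)
    o_dense2 = W2 @ o_relu + b2        # second fully-connected layer
    return o_dense2
```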
Step 5: Compute the similarity of the output vectors of the positive and negative example sentence pairs obtained in step 4 and concatenate the results row-wise into a new similarity matrix M_sim; then optimize the parameters of the embedding encoder f_k of step 3 with a contrastive loss function l through back-propagation and gradient descent. The specific operations are as follows:
5-1. Compute the similarity of the output vectors of the DNN with a similarity function sim(·), obtaining the positive example similarity r+ = sim(q_i, k_i+) of similar sentences and the negative example similarities r- = sim(q_i, k_j-) of dissimilar sentences; then aggregate r+ and r- by rows into the similarity matrix M_sim:
5-2. The similarity of the positive and negative example sentence pairs in the vector representation space is measured with the following contrastive loss function l:
l = −log( exp(r+/τ) / sum(exp(M_sim/τ)) )
where τ is a hyper-parameter whose role is to scale the similarities to a magnitude suitable for input to the exponential, exp(·) is the exponential function with base e, and sum(·) adds the matrix elements by rows;
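A minimal NumPy sketch of a contrastive loss of this InfoNCE-style form for a single query, reconstructed from the description above (the patent's exact formula is not fully legible in this extraction, so this is a hedged reconstruction; the default τ = 0.07 is a conventional choice, not a value from the patent):

```python
import numpy as np

def contrastive_loss(r_pos, r_negs, tau=0.07):
    """-log( exp(r+/tau) / sum over positive and negative similarities of
    exp(./tau) ), i.e. softmax cross-entropy with the positive example as
    the target class."""
    logits = np.concatenate(([r_pos], r_negs)) / tau
    logits = logits - logits.max()       # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

With one negative of equal similarity, the loss is log 2, as expected for a two-way uniform softmax.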
5-3. Optimize the parameters of the embedding encoder f_k with the contrastive loss function l through back-propagation and gradient descent;
Here f_k updates θ_k with a momentum moving average, and a queue stores the mini-batch data (keys) encoded by f_k; the keys stored in the queue serve as the negative examples of the current sentence s_i. The data of the current mini-batch enters the queue and the earliest past mini-batch is removed from it. The momentum moving average is:
θ_k ← m·θ_k + (1 − m)·θ_q
where m is the momentum coefficient;
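The momentum moving average and the negative-example queue can be sketched as follows; m = 0.9 in the test and the queue length of 3 are illustrative values, not taken from the patent:

```python
import numpy as np
from collections import deque

def momentum_update(theta_k, theta_q, m=0.999):
    """theta_k <- m*theta_k + (1 - m)*theta_q: the key encoder parameters
    drift slowly toward the query encoder parameters."""
    return m * theta_k + (1 - m) * theta_q

# FIFO queue of encoded negatives: the newest mini-batch of keys enters,
# the oldest one leaves, so the queue size is decoupled from batch size.
queue = deque(maxlen=3)
for keys in ["k0", "k1", "k2", "k3"]:
    queue.append(keys)
```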
FIG. 2 is a flow chart of an embedding encoder optimization for self-supervised learning;
step 6: obtaining sentences formed by words of the marked entity types, constructing a data set, and further dividing the data set into a training set and a testing set;
Step 7: Build the named entity recognition model based on self-supervised learning, which comprises a main network and a correction module cascaded in sequence, as shown in FIG. 3; then train the main network with the training set, test the trained main network with the testing set, and finally correct the output result of the tested main network with the correction module;
a) The main network comprises the embedding encoder f_k optimized in step 5, a bidirectional LSTM layer and a CRF layer;
1) The optimized embedding encoder f_k encodes each word of a sentence into a word embedding vector; its input is a complete sentence and its output is the word embedding vector of each word in the sentence;
2) The bidirectional LSTM layer learns the dependency information among words; its input is word embedding vectors and its output is word embedding vectors containing inter-word dependency information;
Compared with the RNN, which has only a hidden state h_t, the LSTM additionally has a cell state c_t; together the hidden state and the cell state store all valid information at and before time t. The LSTM protects and controls information through three gate units: an input gate, a forgetting gate and an output gate. The first step of the LSTM is to discard some of the information retained during long-sequence training. This is done by the forgetting gate, which reads h_{t−1} and x_t and outputs a value between 0 and 1 through the sigmoid activation function, where 0 means complete discarding and 1 means complete retention. The forgetting gate is computed as:
f_t = σ(W_f·h_{t−1} + U_f·x_t + b_f)
where x_t is the word embedding vector output by the embedding encoder f_k, h_{t−1} is the hidden state of the LSTM at time t−1, W_f and U_f are the weight matrices of h_{t−1} and x_t in the forgetting gate, b_f is the bias vector of the forgetting gate, σ(·) is the sigmoid activation function, and f_t is the output of the forgetting gate;
the second step is to update the cell state ct. In updating ctPreviously it was necessary to determine which information needs to be updated by means of the input gate and to determine the candidate update content (candidate value vector z) by means of a tanh layer. The calculation mode through the input gate and the tanh layer is similar to that of the forgetting gate, and the calculation mode is as follows:
it=σ(Wiht-1+Uixt+bi)
z=tanh(Wzht-1+Uzxt+bz)
wherein WiAnd UiAre respectively an input gate ht-1And xtWeight matrix of biIs the offset vector of the input gate,itis the output of the input gate; wzAnd UzRespectively h in the candidate value vectort-1And xtWeight matrix of bzAn offset vector that is a vector of candidate values;
The cell state c_t is then updated with element-wise products:
c_t = f_t ⊙ c_{t−1} + i_t ⊙ z
where ⊙ denotes the element-wise (Hadamard) product;
the last step through LSTM is to update the hidden state ht. Update htThe cell state c needs to be determinedtThe state h is updated by processing the tanh layer to obtain a value between-1 and 1, and multiplying the value by the output point of the output gatetThe output gate is similar to the forgetting gate and the input gate in calculation mode.
ot=σ(Woht-1+Uoxt+bo)
ht=ot⊙tanh(ct)
Wherein WoAnd UoRespectively in the output gatet-1And xtWeight matrix of boIs an offset vector of the output gate, otIs the output of the output gate;
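One time step of the LSTM cell defined by the equations above can be sketched in NumPy; parameter names follow the W, U, b notation of the text, and the dictionary layout is an assumption for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. `p` holds the weight matrices W*, U* and bias
    vectors b* for the forget (f), input (i), candidate (z) and
    output (o) parts."""
    f_t = sigmoid(p["Wf"] @ h_prev + p["Uf"] @ x_t + p["bf"])  # forget gate
    i_t = sigmoid(p["Wi"] @ h_prev + p["Ui"] @ x_t + p["bi"])  # input gate
    z = np.tanh(p["Wz"] @ h_prev + p["Uz"] @ x_t + p["bz"])    # candidates
    c_t = f_t * c_prev + i_t * z                               # cell state
    o_t = sigmoid(p["Wo"] @ h_prev + p["Uo"] @ x_t + p["bo"])  # output gate
    h_t = o_t * np.tanh(c_t)                                   # hidden state
    return h_t, c_t
```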
for many sequence tagging tasks, it makes sense to access both past and future information, whereas the hidden state of the one-way LSTM can only obtain information from the past. In order to obtain both past and future information, a bi-directional LSTM is used. The output of the bi-directional LSTM is the score of each token belonging to each class of tags, which needs to be normalized by softmax after calculation:
wherein gamma isiNormalized result of the label score representing the ith token, xiA label score vector representing the ith token, n being the size of the label category;
3) The CRF layer further corrects the recognition result; its input is the output vectors of the bidirectional LSTM layer and its output is the entity label of each word;
The CRF is a typical discriminative model whose function is to further correct the recognition result. In the named entity recognition task, the output may contain meaningless labels because the model does not consider the dependencies between labels. The CRF reasonably combines context information to extract the dependencies between labels, so that the recognized entities conform to the labeling rules;
in CRF, there are two very important scores, namely an emulsion score and a Transition score. Wherein the emision score is derived from the output of the bi-directional LSTM model, specifically predicting for each token a score for each class of labels; while Transition score is the probability of Transition from a certain class of tags to another, the Transition matrix is the Transition probability that can be trained to change the internal tags. With the occurrence score and the Transition score, the Path score Path of the current output sequence can be calculated, as shown in the formula:
Ti,j=emi+transi,j
em thereiniAnd transi,jAn emulation score of the ith token and a Transition score, T, of the label transferred from the ith token to the jth token in a sentence, respectivelyi,jThe sum of the number of the emulation score and the Transition score in a sentence. CRF training is performed by:
wherein, PathrealPath score for the correct Path in the training process, PathiFor the path score of the ith possible path, loss represents the loss function of the CRF layer;
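The path score and the CRF training loss above can be illustrated with a brute-force sketch that enumerates all paths; real CRF layers use the forward algorithm instead of enumeration, so this is only a didactic toy:

```python
import math

def path_score(emissions, transitions, path):
    """Score of one tag path: the per-token emission scores plus the
    transition scores between consecutive tags, following T_ij = em_i + trans_ij."""
    score = emissions[0][path[0]]
    for i in range(1, len(path)):
        score += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
    return score

def crf_loss(emissions, transitions, all_paths, real_path):
    """-log of the softmax of the correct path's score over all path scores."""
    scores = [path_score(emissions, transitions, p) for p in all_paths]
    real = path_score(emissions, transitions, real_path)
    z = sum(math.exp(s) for s in scores)
    return -math.log(math.exp(real) / z)
```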
b) the correction module comprises a phrase retrieval module and an entity type modification module, as shown in FIG. 4;
1) The phrase retrieval module obtains the potential entity set of the main network's input and screens out the potential entities that exist in the public knowledge graph; the potential entity set comprises words and phrases formed from several words;
Its input is a sentence and its output is the potential entity set PE of the sentence; the specific retrieval steps are:
i. Enumerate all phrases formed by contiguous combinations of the words of the sentence; for example, the sentence "The European Commission" yields the set Pe = {The, European, Commission, The European, European Commission, The European Commission};
ii. Input each potential entity of the set Pe obtained in step i into the public knowledge graph; if the potential entity and its corresponding entity type can be retrieved from the public knowledge graph, add the potential entity and its entity type to the potential entity set PE;
For example, the potential entity set PE = {The European Commission: Organization, …};
2) The entity type modification module receives the potential entity set PE output by the phrase retrieval module and the entity type labels output by the main network, then compares the entity type labels output by the main network with the entity types that PE records for the potential entities of the main network's input; if they are consistent, no modification is needed, and if they are inconsistent, the output result of the main network is modified;
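The comparison rule of the entity type modification module can be sketched as follows; the flat one-type-per-token labels (without BIO prefixes) are a simplifying assumption for illustration:

```python
def correct_entity_types(tokens, predicted_types, potential_entities):
    """If a phrase of the input appears in the potential entity set PE
    with a type different from the main network's prediction, overwrite
    the prediction for every token of that phrase."""
    corrected = list(predicted_types)
    for phrase, kg_type in potential_entities.items():
        span = phrase.split()
        for start in range(len(tokens) - len(span) + 1):
            if tokens[start:start + len(span)] == span:
                for k in range(start, start + len(span)):
                    if corrected[k] != kg_type:
                        corrected[k] = kg_type
    return corrected
```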
Step 8: Named entity recognition of text is performed with the tested named entity recognition model based on self-supervised learning (MBBCD).
The performance of the invention is evaluated on the CoNLL-2003 English public data set; the following table shows the size of the data set:
|                 | Number of articles | Number of sentences | Number of words |
| Training set    | 946                | 14987               | 203621          |
| Development set | 216                | 3466                | 51362           |
| Test set        | 231                | 3684                | 46435           |
The data set comprises four entity types: place names, person names, organization names and other entities. Entities are annotated with the BIO scheme: every named entity begins with a B label, I denotes the inside of a named entity, and O denotes outside any named entity. For example, if a word of the corpus is labeled B/I-XXX, then B/I indicates that the word is the beginning of, or inside, a named entity, i.e. part of it, and XXX indicates the type of the named entity. The following table shows the distribution of entity counts over the training, development and test sets of the data set:
|                 | Place name | Person name | Organization name | Other entities |
| Training set    | 7140       | 6600        | 6321              | 3438           |
| Development set | 1837       | 1842        | 1341              | 922            |
| Test set        | 1668       | 1617        | 1661              | 702            |
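The BIO scheme described above can be decoded into entity spans as in this minimal sketch (not part of the patent):

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO tags: B-XXX
    starts an entity of type XXX, I-XXX continues it, O is outside."""
    entities, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append((" ".join(current), ctype))
    return entities
```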
In step 7, the DBpedia English knowledge graph is used to correct the entity type recognition errors caused by word abbreviations in the output result; the following table shows the entity recognition results of the invention on the test set:
In the entity recognition result table, CNN performs character-level encoding, GloVe provides pre-trained word vectors, and the named entity recognition model based on self-supervised learning (MBBCD) is the named entity recognition method based on self-supervised learning proposed by the invention. Precision, Recall and Micro-F1 are adopted as the performance evaluation indexes of entity recognition in the experiments. The labeling scheme for named entities determines the entity boundary and the entity type simultaneously, and the recognition result of an entity is correct only when both the entity boundary and the entity type are labeled accurately. From the counts of True Positives (TP), False Positives (FP) and False Negatives (FN), the Precision, Recall and F1-score of the named entity recognition task can be calculated. TP is an entity whose boundary and type are both correctly identified; FP is a recognized entity whose boundary or type is misjudged; FN is an entity that should have been recognized but was not.
According to the definition of Precision: for a given data set, the precision is the ratio of correctly recognized entities to all recognized entities, which gives the calculation of precision in the named entity recognition task:
Precision = TP / (TP + FP)
According to the definition of Recall: the recall is the proportion of the entities that should be recognized which the classifier actually recognizes as true, which gives the calculation of recall in the named entity recognition task:
Recall = TP / (TP + FN)
According to the definition of the micro-averaged F1 value Micro-F1: the Micro-F1 value is the harmonic mean of precision and recall, a comprehensive index that balances the influence of the two. The Micro-F1 value is therefore calculated as:
Micro-F1 = 2 × Precision × Recall / (Precision + Recall)
Claims (9)
1. a named entity recognition method based on self-supervision learning is characterized by comprising the following steps:
step 1: preprocessing the data set;
1-1. Form the words annotated with entity types in the data set, together with the connecting words, into sentences s_i;
1-2. Translate the sentence s_i of step 1-1 into a sentence a_i in an arbitrary intermediate language, then translate a_i back into a positive example sentence s_i+ in the same language as s_i;
Step 2: Construct the positive and negative example sentence pair sets from the sentences processed in step 1. The positive example sentence pair set consists of pairs (s_i, s_i+) of an original sentence and its own back-translation; a negative example sentence pair consists of an original sentence and the back-translation of another sentence in the corpus, i.e. pairs (s_i, s_j+) with j ≠ i;
Step 3: Encode the sentences in the positive example and negative example sentence pairs with embedding encoders;
Step 4: Input the embedding word vectors into the deep neural network layer DNN;
Step 5: Compute the similarity of the output vectors of the positive and negative example sentence pairs obtained in step 4 and concatenate the results row-wise into a new similarity matrix M_sim; then optimize the parameters of the embedding encoder f_k of step 3 with a contrastive loss function l through back-propagation and gradient descent;
step 6: obtaining sentences formed by words of the marked entity types, constructing a data set, and further dividing the data set into a training set and a testing set;
and 7: establishing a named entity recognition model based on self-supervision learning, wherein the named entity recognition model comprises a main network and a correction module which are sequentially cascaded; then, training the main network by using a training set, testing the trained main network by using a testing set, and finally correcting the output result of the tested main network by using a correction module;
the main network comprises an optimized embedding encoder f in the step 5kA bidirectional LSTM layer and a CRF layer;
the correction module comprises a phrase retrieval module and an entity type modification module; the phrase retrieval module is used for acquiring a potential entity set of the main network input item and screening out the potential entity set existing in the public knowledge graphThen constructing the potential entity and the entity type into a potential entity set PE; the potential entities comprise words and phrases formed by a plurality of words; the entity type modification module is used for receiving the potential entity set PE output by the phrase retrieval module and the entity type label output by the main network, then comparing the entity type output by the main network with the entity type corresponding to each potential entity in the main network input item in the potential entity set PE, if the entity type output by the main network is consistent with the entity type corresponding to each potential entity in the main network input item in the potential entity set PE, the entity type modification module does not need to modify, and if the entity type output by the main network is inconsistent with the entity type output by the main network input item in the potential entity set PE, the entity type modification module modifies the output result of the main network;
step 8: realizing named entity recognition of text by using the tested named entity recognition model based on self-supervised learning.
2. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the embedding encoders f_q and f_k have identical initialization parameters θ_q and θ_k.
3. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the step 3 specifically comprises:
4. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the deep neural network layer comprises a first fully-connected layer, a ReLU layer and a second fully-connected layer;
(1) first fully-connected layer: converts the embedding vector output by the embedding encoder f_q or f_k into an output vector of the same dimensionality through one linear transformation;

o_dense1 = W·x_input + b

where o_dense1 denotes the output vector, x_input denotes the embedding vector output by the unoptimized embedding encoder, W denotes a weight matrix, and b denotes a bias vector;
(2) ReLU layer: the output vector of the first fully-connected layer is passed through a ReLU activation function, which keeps the convergence speed of the model stable;
o_dense2 = max(o_dense1, 0)

where o_dense2 denotes the output vector of the ReLU layer;
(3) second fully-connected layer: converts the output vector of the ReLU layer into an output vector whose dimensionality equals the number of predicted entity types.
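The claimed FC → ReLU → FC projection head can be sketched numerically as follows, using toy 2-dimensional vectors and hand-picked weights; all concrete values here are illustrative assumptions, not parameters from the patent.

```python
def matvec(W, x, b):
    """One fully-connected (linear) layer: W·x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    """Element-wise ReLU: max(x, 0)."""
    return [max(x, 0.0) for x in v]

def projection_head(x, W1, b1, W2, b2):
    o1 = matvec(W1, x, b1)   # first FC: same dimensionality as the input
    o2 = relu(o1)            # ReLU layer
    return matvec(W2, o2, b2)  # second FC: dimension = number of entity types

# Toy weights: identity first layer with a bias, then a 2 -> 1 projection.
W1 = [[1.0, 0.0], [0.0, 1.0]]; b1 = [0.0, -2.0]
W2 = [[1.0, 1.0]];             b2 = [0.5]
print(projection_head([3.0, 1.0], W1, b1, W2, b2))  # → [3.5]
```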
5. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the specific operations in step 5 are as follows:
5-1 the output vectors of the DNN are compared through a similarity function sim(·) to obtain the positive-example similarity r⁺ of similar sentence pairs and the negative-example similarity r⁻ of dissimilar sentence pairs; r⁺ and r⁻ are then aggregated by rows to obtain the similarity matrix M_sim;
5-2 the similarity of the positive- and negative-example sentence pairs in the vector representation space is measured using a contrastive loss function l;
where τ is a hyperparameter, exp(·) denotes the exponential function with the natural constant e as base, and sum(·) denotes summing matrix elements by rows;
5-3 the parameters of the embedding encoder f_k are optimized by the contrastive loss function l through back-propagation and a gradient descent algorithm.
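For illustration, a per-anchor contrastive (InfoNCE-style) loss consistent with the description above — exponentials scaled by the temperature τ, the positive similarity in the numerator, positives plus negatives in the denominator — might look like the sketch below. The patent's exact formula is not reproduced in the source, so this form is an assumption.

```python
import math

def contrastive_loss(r_pos, r_neg, tau=0.07):
    """InfoNCE-style loss for one anchor sentence: r_pos is its similarity
    to the positive (similar) sentence, r_neg the similarities to the
    negative (dissimilar) sentences; tau is the temperature hyperparameter."""
    num = math.exp(r_pos / tau)
    den = num + sum(math.exp(r / tau) for r in r_neg)
    return -math.log(num / den)

# With tau = 1, a positive similarity of 1.0 against one negative of 0.0:
print(contrastive_loss(1.0, [0.0], tau=1.0))  # ≈ 0.3133
```

Minimizing this loss pulls representations of similar sentence pairs together and pushes dissimilar pairs apart in the embedding space.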
6. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the embedding encoder f_k updates θ_k in a momentum moving-average manner, as follows:

θ_k ← m·θ_k + (1 − m)·θ_q

where m is the momentum.
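The momentum moving-average update θ_k ← m·θ_k + (1 − m)·θ_q is straightforward to express; the sketch below treats the parameters as flat lists of floats purely for illustration.

```python
def momentum_update(theta_k, theta_q, m=0.999):
    """theta_k <- m*theta_k + (1 - m)*theta_q, element-wise.
    With m close to 1, theta_k drifts slowly toward theta_q,
    which keeps the key encoder's representations consistent."""
    return [m * k + (1 - m) * q for k, q in zip(theta_k, theta_q)]

print(momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9))  # → [0.9, 1.8]
```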
7. The named entity recognition method based on self-supervised learning as recited in claim 1, wherein the optimized embedding encoder f_k is used for encoding each word in a sentence into a word embedding vector; its input is a complete sentence, and its output is the word embedding vector of each word in the sentence;

the bidirectional LSTM layer is used for learning dependency information among words; its input is the word embedding vectors, and its output is word embedding vectors containing inter-word dependency information;

the CRF layer is used for further correcting the recognition result; its input is the output vectors of the bidirectional LSTM layer, and its output is the entity type label of each word.
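To illustrate the CRF layer's role of turning per-word scores into a coherent label sequence, here is a toy Viterbi decoder over emission scores (such as a BiLSTM would produce) and tag-transition scores; the tag set and all score values are invented for the example and are not from the patent.

```python
def viterbi(emissions, transitions, tags):
    """Decode the best tag path: emissions is a list (one dict per word)
    of per-tag scores; transitions[(a, b)] scores the move from tag a to b."""
    best = {t: emissions[0][t] for t in tags}   # best score ending in tag t
    back = []                                   # back-pointers per timestep
    for em in emissions[1:]:
        nxt, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + transitions[(p, t)])
            nxt[t] = best[prev] + transitions[(prev, t)] + em[t]
            ptr[t] = prev
        back.append(ptr)
        best = nxt
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for ptr in reversed(back):                  # follow back-pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ("O", "ENT")
emissions = [{"O": 1.0, "ENT": 0.0}, {"O": 0.0, "ENT": 1.0}]
transitions = {(a, b): 0.0 for a in tags for b in tags}
print(viterbi(emissions, transitions, tags))  # → ['O', 'ENT']
```

In a trained model, the transition scores would penalize invalid label sequences, which is how the CRF corrects locally plausible but globally inconsistent predictions.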
8. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
9. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111539122.4A CN114239584A (en) | 2021-12-15 | 2021-12-15 | Named entity identification method based on self-supervision learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111539122.4A CN114239584A (en) | 2021-12-15 | 2021-12-15 | Named entity identification method based on self-supervision learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114239584A true CN114239584A (en) | 2022-03-25 |
Family
ID=80756701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111539122.4A Pending CN114239584A (en) | 2021-12-15 | 2021-12-15 | Named entity identification method based on self-supervision learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114239584A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115688777A (en) * | 2022-09-28 | 2023-02-03 | 北京邮电大学 | Named entity recognition system for nested and discontinuous entities of Chinese financial text |
CN115688777B (en) * | 2022-09-28 | 2023-05-05 | 北京邮电大学 | Named entity recognition system for nested and discontinuous entities of Chinese financial text |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN108733792B (en) | Entity relation extraction method | |
CN109902145B (en) | Attention mechanism-based entity relationship joint extraction method and system | |
CN109657239B (en) | Chinese named entity recognition method based on attention mechanism and language model learning | |
CN109726389B (en) | Chinese missing pronoun completion method based on common sense and reasoning | |
CN112733541A (en) | Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism | |
CN111738003B (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
CN109800437A (en) | A kind of name entity recognition method based on Fusion Features | |
CN111966812B (en) | Automatic question answering method based on dynamic word vector and storage medium | |
CN112052684A (en) | Named entity identification method, device, equipment and storage medium for power metering | |
CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network | |
CN111104509A (en) | Entity relation classification method based on probability distribution self-adaption | |
CN113948217A (en) | Medical nested named entity recognition method based on local feature integration | |
Deng et al. | Self-attention-based BiGRU and capsule network for named entity recognition | |
CN111753088A (en) | Method for processing natural language information | |
CN113821635A (en) | Text abstract generation method and system for financial field | |
Ye et al. | Chinese named entity recognition based on character-word vector fusion | |
CN113160917B (en) | Electronic medical record entity relation extraction method | |
CN108875024B (en) | Text classification method and system, readable storage medium and electronic equipment | |
CN112699685B (en) | Named entity recognition method based on label-guided word fusion | |
CN114239584A (en) | Named entity identification method based on self-supervision learning | |
CN115906846A (en) | Document-level named entity identification method based on double-graph hierarchical feature fusion | |
CN115600595A (en) | Entity relationship extraction method, system, equipment and readable storage medium | |
CN115600597A (en) | Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||