CN113204970A - BERT-BiLSTM-CRF named entity detection model and device - Google Patents

BERT-BiLSTM-CRF named entity detection model and device

Info

Publication number
CN113204970A
Authority
CN
China
Prior art keywords
crf
bert
layer
model
bilstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110631994.7A
Other languages
Chinese (zh)
Inventor
彭涛
王上
姚田龙
包铁
张雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202110631994.7A priority Critical patent/CN113204970A/en
Publication of CN113204970A publication Critical patent/CN113204970A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a BERT-BiLSTM-CRF named entity detection model, belonging to the technical field of named entity recognition, comprising an IDCNN-CRF named entity recognition model and a BERT-BiLSTM-CRF named entity recognition model, structured as follows: the Embedding layer is a word vector layer used for processing the input data into word vectors and feeding them into the model, with Word2Vec adopted as the distributed vector representation; and the IDCNN layer receives the character or word vectors produced by the embedding layer and recomputes the input vectors through the dilated convolution operation of the dilated convolutional neural network to obtain a new vector representation. With the BiLSTM-CRF model as the baseline, the BERT-BiLSTM-CRF named entity detection model and device construct an IDCNN-CRF model and a BERT-BiLSTM-CRF model on the People's Daily dataset annotated by Peking University and the MSRA named entity recognition dataset from Microsoft Research Asia, improving the accuracy and running efficiency of named entity recognition and shortening the model training time.

Description

BERT-BiLSTM-CRF named entity detection model and device
Technical Field
The invention relates to the technical field of named entity recognition, in particular to a BERT-BiLSTM-CRF named entity detection model and device.
Background
The current named entity recognition task mainly captures words or phrases in the input text and classifies them. Many methods for the named entity recognition task have been proposed, and they can generally be grouped into three categories: the first recognizes entities in text with rule-based methods; the second performs named entity recognition with feature engineering based on traditional statistical machine learning; and the third performs named entity recognition by automatically extracting features from the text with deep learning.
For Chinese named entity recognition, rule-based methods depend heavily on the structure of the rules, transfer poorly, and are costly to maintain; statistical learning methods rely on feature engineering, which consumes a great deal of labor; deep learning methods extract features automatically, removing the manual feature-design step, and deep models keep improving as computer hardware improves, so named entity recognition based on deep learning has important research value.
With the explosive growth of data, how to process massive data and extract useful information has become a pressing problem, and named entity recognition technology can automatically extract key entity information from massive text data. To address this problem, the invention implements a BERT-BiLSTM-CRF named entity detection model and device.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
A BERT-BiLSTM-CRF named entity detection model comprises an IDCNN-CRF named entity recognition model and a BERT-BiLSTM-CRF named entity recognition model:
the IDCNN-CRF named entity recognition model architecture is as follows:
the Embedding layer is a word vector layer used for processing the input data into word vectors and feeding them into the model, with Word2Vec adopted as the distributed vector representation;
the IDCNN layer receives the character or word vectors produced by the embedding layer and recomputes the input vectors through the dilated convolution operation of the dilated convolutional neural network to obtain a new vector representation;
the projection layer performs a linear transformation on the vector representations computed by the IDCNN layer, mapping them to a dimension equal to the number of tags, and then applies Softmax normalization to obtain a probability p; if the mapped representation has m dimensions, each dimension can be regarded as the probability of one tag class, and selecting the class with the maximum probability yields the classification result, completing the named entity recognition task;
the CRF layer screens out the optimal result through the transition matrix and feeds it back to the user;
the BERT-BiLSTM-CRF named entity recognition model is structured as follows:
the BERT layer takes as input sentences composed of single characters; after BERT processes the text sequence and obtains a vector representation for each character, this output serves as the input of the following BiLSTM layer;
and in the BiLSTM-CRF layer, the BERT-processed text sequence yields the corresponding BERT pre-trained vector representations, which enter the BiLSTM units; the BiLSTM output is computed and sent to the CRF, and the optimal sequence of labels is calculated.
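For illustration, a minimal PyTorch-style sketch of the BERT to BiLSTM to emission-score portion of this pipeline is given below; the module names, sizes, and the use of the HuggingFace transformers BertModel are assumptions, not the patent's own implementation, and the CRF layer is omitted here:

```python
import torch
import torch.nn as nn
from transformers import BertModel  # assumed: a pre-trained Chinese BERT such as "bert-base-chinese"

class BertBiLstmTagger(nn.Module):
    """BERT encoder -> BiLSTM -> per-label emission scores (CRF decoding omitted)."""

    def __init__(self, num_labels: int, lstm_hidden: int = 128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.bilstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,  # 768 for BERT-Base
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,                        # forward + backward LSTM
        )
        # project the concatenated forward/backward states to per-label emission scores
        self.emission = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # one context vector per input character from BERT
        chars = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(chars)               # (batch, seq_len, 2 * lstm_hidden)
        return self.emission(lstm_out)                 # emission scores, to be decoded by a CRF layer
```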
As a preferred scheme of the BERT-BiLSTM-CRF named entity detection model of the invention: the Embedding layer captures the dependency relationships between preceding and following characters by training on a large-scale corpus, and feeds pre-trained 100-dimensional Wikipedia word vectors together with 20-dimensional word-segmentation features as input into the next layer.
As a preferred scheme of the BERT-BiLSTM-CRF named entity detection model of the invention: the CRF layer combines the results obtained by deep learning with a statistical learning model; the CRF maintains a matrix of label-to-label transition probabilities, the m-dimensional label space is expanded to (m+2) × (m+2) with the two extra dimensions representing the start and end states, and invalid labels are corrected by learning the rules of label transition from the changes of these two parameters.
As a preferred scheme of the BERT-BiLSTM-CRF named entity detection model of the invention: in the BERT layer the beginning of a sentence is marked with [CLS] and the end of a sentence (and the separator between sentences) with [SEP]; the input of BERT is formed by combining a token (word) vector, a segment vector and a position vector.
As a preferred scheme of the BERT-BiLSTM-CRF named entity detection model of the invention: in the BiLSTM-CRF layer, the forward LSTM of the BiLSTM computes the semantic representation of the current word and the words to its left, the backward LSTM computes the semantic representation of the current word and the words to its right, and the two hidden-layer state representations are concatenated to obtain the output of the BiLSTM.
As a preferred scheme of the BERT-BiLSTM-CRF named entity detection model of the invention: the main formulas implemented by the algorithm are as follows:
[Four formula images in the original publication; not reproduced here.]
a BERT-BilSTM-CRF named entity detection device comprises:
the information extraction module is used for extracting entity information and semantic relations between entities;
the information extraction module is connected with the information retrieval module and is used for screening out information related to the keywords through the query of the keywords, identifying the entity type of the keywords by using a named entity, classifying the text information and reducing the retrieval range;
the information retrieval module is connected with the machine translation module and used for identifying entity information of a translation target and analyzing the lexical method by using a translation rule;
and the machine translation module is connected with a question-answering system, and the question-answering system searches answers of the questions by matching the relation between the keywords and the entities and feeds back the result output to the user.
Compared with the prior art: with the BERT-BiLSTM-CRF named entity detection model and device, the F1 value of the constructed IDCNN-CRF named entity recognition model is 10.4% and 11.41% higher than that of the baseline CRF model on the People's Daily dataset and the MSRA dataset respectively, and 0.38% and 2.07% higher than that of the BiLSTM-CRF model, while training time is shortened by nearly 30%, markedly improving running efficiency; the F1 value of the constructed BERT-BiLSTM-CRF model for Chinese named entity recognition is 5.29% and 6.7% higher than the BiLSTM-CRF model on the two datasets, and 4.91% and 4.63% higher than the IDCNN-CRF model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the present invention will be described in detail with reference to the accompanying drawings and detailed embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise. Wherein:
FIG. 1 is a diagram of IDCNN-CRF model architecture according to the present invention;
FIG. 2 is a view of the internal structure of the LSTM of the present invention;
FIG. 3 is a BiLSTM model architecture diagram of the present invention;
FIG. 4 is a schematic diagram of a conditional random field of a linear chain according to the present invention;
FIG. 5 is a representation of the BERT input of the present invention;
FIG. 6 is a diagram of IDCNN convolution operations according to the present invention;
FIG. 7 is a graph comparing accuracy for different models of the present invention;
FIG. 8 is a graph comparing recall in different models of the present invention;
FIG. 9 is a graph comparing F1 values for different models of the present invention;
FIG. 10 is a comparison graph of the experimental results on the People's Daily dataset according to the present invention;
FIG. 11 is a graph comparing the results of the MSRA experiments of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described herein, and it will be apparent to those of ordinary skill in the art that the present invention may be practiced without departing from the spirit and scope of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The basic algorithm adopted by the invention is as follows:
one, long and short term memory network (LSTM)
A long short-term memory network is a type of neural network with memory, and it therefore performs very well on sequence problems and natural language processing. The emergence of the LSTM solved, to a certain extent, the vanishing-gradient and exploding-gradient problems of recurrent neural networks. The LSTM network is itself a chain-like recurrent structure over sequences, so it can handle long-range dependencies.
The LSTM solves the short-term-memory problem because it introduces a core structure, here called the memory cell c_t. In addition, the LSTM has three control gates: the input gate i_t, the output gate o_t and the forget gate f_t. The input gate controls which information can enter the current network state, and the output gate controls which information is emitted as the current output. In the LSTM, the most important gate is the forget gate: it decides which earlier memory is removed and which current information is retained.
An internal state c_t is added to the LSTM structure: on the one hand it transmits information linearly, and on the other hand it outputs information non-linearly to the external state h_t of the hidden layer, as shown in equations (1) and (2):

c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)    (1)

h_t = o_t ⊙ tanh(c_t)    (2)
where f_t, i_t and o_t are the three gates, which mainly regulate the transmission of information; c_{t-1} is the memory cell at the previous moment; and through the internal state c_t the LSTM network can, at each time t, access the history information accumulated from the start of the sequence up to the current time.
In the LSTM neural network, a gating mechanism is introduced into the memory cell c to control the transmission of information, as shown in fig. 1. The three gates in formulas (3), (4) and (5) are the input gate i_t, the forget gate f_t and the output gate o_t.
The gates in the LSTM take values in the range (0, 1), which means that information is passed through in a certain proportion. The roles of the three gates in the LSTM network are: the forget gate f_t controls how much information of the internal state at the previous moment needs to be forgotten; the input gate i_t controls how much information of the candidate state at the current moment needs to be stored; and the output gate o_t controls how much information of the current internal state c_t is output to the external state h_t. The three gates are computed as follows:

i_t = σ(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)    (3)

f_t = σ(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)    (4)

o_t = σ(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)    (5)

where σ(·) is the logistic function with range (0, 1), the W are weight matrices, x_t is the input at the current time, and h_{t-1} is the external state at the previous time. When f_t = 0 and i_t = 1, the history information in the memory cell is cleared and only the candidate state vector is written in. When f_t = 1 and i_t = 0, the memory cell keeps the history information and does not write in new information.
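A minimal NumPy sketch of one LSTM time step implementing equations (1) to (5) above; the dimensions and variable names are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following equations (1)-(5); W and b hold the per-gate parameters."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] @ c_prev + b["i"])    # input gate (3)
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] @ c_prev + b["f"])    # forget gate (4)
    c_t = f_t * c_prev + i_t * np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # internal state (1)
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] @ c_t + b["o"])       # output gate (5)
    h_t = o_t * np.tanh(c_t)                                                       # external state (2)
    return h_t, c_t

# toy dimensions and randomly initialised parameters
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(d_h, d_in if k.startswith("x") else d_h))
     for k in ["xi", "hi", "ci", "xf", "hf", "cf", "xc", "hc", "xo", "ho", "co"]}
b = {k: np.zeros(d_h) for k in ["i", "f", "c", "o"]}
h_t, c_t = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)
```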
2. Bidirectional long short-term memory network (BiLSTM)
The BiLSTM includes a forward LSTM layer, which reads the sequence in its natural order, and a backward LSTM layer, which reads it in reverse order, as shown in fig. 1. For the input x_t at time t, the hidden-layer output of the forward LSTM layer is denoted h_t^{(1)} and the hidden-layer output of the backward LSTM layer is denoted h_t^{(2)}. As shown in fig. 2, the forward output h_t^{(1)} fuses all of the preceding information, as given in formula (6):

h_t^{(1)} = tanh(W_h^{(1)} h_{t-1}^{(1)} + W_x^{(1)} x_t + b^{(1)})    (6)

where W_h^{(1)} and W_x^{(1)} are the weight matrices of the first-layer network and b^{(1)} is a bias vector. Similarly, the backward output h_t^{(2)} fuses the future information, as given in formula (7):

h_t^{(2)} = tanh(W_h^{(2)} h_{t+1}^{(2)} + W_x^{(2)} x_t + b^{(2)})    (7)

The output h_t of the current hidden layer under the whole network framework combines the two, as given in formula (8):

h_t = h_t^{(1)} || h_t^{(2)}    (8)

where the symbol || denotes end-to-end concatenation of the vectors.
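A short PyTorch sketch of the concatenation in formula (8): a bidirectional LSTM whose per-step output is the forward and backward hidden states joined end to end (sizes are illustrative):

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=100, hidden_size=128, batch_first=True, bidirectional=True)
x = torch.randn(2, 20, 100)   # (batch, sequence length, word-vector dimension)
h, _ = bilstm(x)
print(h.shape)                # torch.Size([2, 20, 256]): forward 128 dims || backward 128 dims
```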
3. Conditional random field (CRF)
A conditional random field is a probabilistic undirected graph model based on conditional probabilities. In named entity recognition, for a given text input sequence X = (x_1, x_2, ..., x_n), where n is the number of words and x_i is the i-th word of the input sequence, Y = (y_1, y_2, ..., y_n) is the output sequence, whose label set is T = {B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, O}; each output label y_i corresponds to an input word x_i. Given that the input sequence X takes the value x, the conditional probability that the output sequence Y takes the value y is P(y | x), and this conditional probability P(Y | X) is the conditional random field, as shown in formulas (9) and (10):

P(y | x) = (1 / Z(x)) exp( Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i) )    (9)

Z(x) = Σ_y exp( Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i) )    (10)

where t_k(·) and s_l(·) are the transition and state feature functions, λ_k and μ_l are their weight parameters, and Z(x) is the normalization factor.
When X = (x_1, x_2, ..., x_n) and Y = (y_1, y_2, ..., y_n) have the same structure and correspond one to one, the CRF forms a linear-chain conditional random field, as shown in fig. 4.
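A minimal NumPy sketch of the linear-chain CRF probability in formulas (9) and (10), written in the emission/transition form used by neural CRF layers; the variable names and label count are illustrative:

```python
import numpy as np

def crf_log_prob(emissions, transitions, tags):
    """log P(y|x) for one sequence.
    emissions:   (seq_len, num_labels) per-position label scores (state features)
    transitions: (num_labels, num_labels) label-to-label scores (transition features)
    tags:        (seq_len,) gold label indices
    """
    seq_len, _ = emissions.shape
    # numerator: score of the given label path
    score = emissions[0, tags[0]]
    for i in range(1, seq_len):
        score += transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    # denominator: log of the normalization factor Z(x), computed with the forward algorithm
    alpha = emissions[0]
    for i in range(1, seq_len):
        alpha = emissions[i] + np.logaddexp.reduce(alpha[:, None] + transitions, axis=0)
    return score - np.logaddexp.reduce(alpha)

rng = np.random.default_rng(0)
e = rng.normal(size=(5, 7))     # 7 labels: B/I-PER, B/I-LOC, B/I-ORG, O
T = rng.normal(size=(7, 7))
print(crf_log_prob(e, T, np.array([6, 0, 1, 6, 2])))
```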
The model theory of the IDCNN-CRF named entity recognition model and the BERT-BiLSTM-CRF named entity recognition model is described as follows:
IDCNN-CRF model theory
The dilated convolutional neural network has no pooling, which avoids the information loss of down-sampling and up-sampling while keeping the dimensions unchanged, so the receptive field is not reduced. Compared with an ordinary convolutional neural network, the dilated convolutional neural network adds a dilation-width parameter. The dilation width is the dilation size of the convolution operation: the input matrix skips the entries in between according to the dilation step, so a wider receptive field can be obtained.
The IDCNN framework is shown in fig. 6, where the black dots represent the feature information taken by the convolution kernel and the outer squares represent the receptive field of the convolution operation, denoted f.
The first plot in fig. 6 has a dilation width of 1, corresponding to a conventional convolutional neural network, with receptive field f = 3 × 3 = 9; the second plot is a dilated convolution with dilation width 2: the convolution kernel size is unchanged, still 3 × 3, but with the larger dilation width the data in between is skipped, expanding the receptive field to f = 7 × 7 = 49.
In named entity recognition, each word x_i of the text sequence is concatenated over a sliding window with the mapping matrix W_c, and the output Z_t for the t-th word after the convolution operation is obtained, as shown in formula (11):

Z_t = W_c ⊕_{i=0}^{k} x_{t±i}    (11)

where x_i denotes the i-th word in the text sequence, k denotes the number of iterations of the dilated convolutional neural network, and the symbol ⊕ denotes the concatenation operation. When the dilation width ω is greater than 1, the dilated convolutional neural network no longer acts on a contiguous text span: the range of the text sequence is widened according to the dilation width, feature information between distant words is captured, and this is combined with the affine matrix W_c to obtain the output of the dilated convolution operation, as shown in formula (12):

Z_t = W_c ⊕_{i=0}^{k} x_{t±i·ω}    (12)
as can be seen from fig. 6, IDCNN designs two layers, where the expansion width of the first layer is 1, and all feature information between adjacent characters is taken; the expansion width of the second layer is 2, and the character between the sliding windows f 3 and f 9 can be taken, so that the dependency relationship of the context characteristic information can be captured. The dilated convolution output sequence h of the first layernAs shown in equation 13:
Figure BDA0003104060550000093
wherein the content of the first and second substances,
Figure BDA0003104060550000094
first layer of a convolutional layer having an expansion width of 1
Figure BDA0003104060550000095
The convolution layer output of the j-th layer is
Figure BDA0003104060550000096
r () is a ReLU activation function, the last layer outputs feature information of all words, the dilation width is set to 1, and the output formula is shown as 14:
Figure BDA0003104060550000097
Finally, the two convolutional layers are treated as a whole block, and the final output result for the text sequence is obtained after the IDCNN iterates this block k times.
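A hedged PyTorch sketch of a dilated-convolution block in the spirit of the IDCNN described above; the layer count, channel size and dilation widths are illustrative and not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    """A small stack of 1-D convolutions with dilation widths 1, 2, 1 over a token sequence."""

    def __init__(self, dim: int = 120, kernel: int = 3):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel, dilation=d, padding=d * (kernel - 1) // 2)
            for d in (1, 2, 1)   # layer 1: adjacent context; layer 2: widened receptive field
        ])
        self.act = nn.ReLU()

    def forward(self, x):        # x: (batch, seq_len, dim) character/word vectors
        h = x.transpose(1, 2)    # Conv1d expects (batch, dim, seq_len)
        for conv in self.convs:
            h = self.act(conv(h))
        return h.transpose(1, 2)

block = DilatedConvBlock()
out = block(torch.randn(2, 30, 120))
print(out.shape)                 # torch.Size([2, 30, 120]); the block can be iterated k times
```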
BERT-BiLSTM-CRF model theory
The internal structure of BERT is a multi-layer bidirectional Transformer that integrates the feature information to the left and right of each character to build a complete context, and it provides two unsupervised pre-training tasks: the masked language model (Masked LM) and next sentence prediction (NSP). The two tasks capture word-level and sentence-level features respectively, and their results are then combined.
Masked language model
A bidirectional LSTM predicts context information by concatenating forward and backward information. For example, in "Changchun is the provincial capital of Jilin", when "Jilin" is predicted, reading from left to right only "Changchun" can be seen, reading from right to left only "provincial capital" can be seen, and the two are never seen together. BERT solves this problem by masking and truly realizes a bidirectional language model. The principle of Masked LM is to mask "Jilin" and predict it with both "Changchun" and "provincial capital"; the context information is linked together to obtain "Jilin". BERT's masking method randomly selects 15% of the words in a training sample; of those selected words, 80% are replaced with the mask token, 10% are replaced with a random word, and 10% remain unchanged.
Next sentence prediction
In the masked language model, BERT establishes word-level dependencies, while NSP aims to learn sentence-level dependencies; tasks such as question answering require associations between sentences. BERT performs NSP as follows: the training samples are divided into two classes. For 50% of the corpus, a normal sentence pair A-B is constructed in which B is the actual next sentence after A, labeled IsNext; for the other 50%, an out-of-order pair is constructed by choosing a random sentence from the corpus as B, labeled NotNext. Predicting whether B is the next sentence of A by classifying the relation between the two sentences represents sentence-level relationships.
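A small sketch of constructing NSP training pairs as described, with 50% IsNext and 50% NotNext pairs; the toy corpus is illustrative:

```python
import random

def make_nsp_pairs(documents):
    """documents: list of sentence lists; returns (sentence_a, sentence_b, label) triples."""
    all_sentences = [s for doc in documents for s in doc]
    pairs = []
    for doc in documents:
        for a, b in zip(doc, doc[1:]):
            if random.random() < 0.5:
                pairs.append((a, b, "IsNext"))                               # the real next sentence
            else:
                pairs.append((a, random.choice(all_sentences), "NotNext"))   # a random sentence
    return pairs

docs = [["长春是吉林的省会。", "它位于中国东北。"],
        ["BERT提出了两个预训练任务。", "其中之一是下一句预测。"]]
print(make_nsp_pairs(docs))
```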
Experimental results and analysis:
IDCNN-CRF model experiment
The horizontal comparison experiments for the IDCNN-CRF model under different experimental parameters mainly cover three aspects: the effect of loading (or not loading) pre-trained word vectors, the effect of the word vector dimensionality, and the effect of the learning rate on the experimental results. Experiments are run on different datasets with different parameter settings to find the optimal parameter combination, laying the groundwork for the vertical comparison experiments between different models.
Comparison of loading pre-trained word vectors
To compare the influence of pre-trained word vectors on the experimental results, a large-scale Wikipedia corpus is trained with Word2Vec from the Gensim toolkit to obtain pre-trained word vectors. The other parameters are fixed: the objective function is optimized with the Adam algorithm, the word vector dimension is 100, the hidden layer dimension is 128, the gradient clip is set to 5, the learning rate is 0.001, and the dropout rate is 0.5. On this basis, the effect of loading or not loading the pre-trained word vectors is compared; the experimental results are shown in Table 1.
TABLE 1 Loading of Pre-training word vector test results
As can be seen from Table 1, random word vectors carry uncertainty: on the People's Daily dataset their accuracy is higher than that of the loaded pre-trained word vectors, but in recall and F1 the Word2Vec word vectors beat the randomly initialized vectors by 0.78% and 0.73% respectively; on the MSRA dataset the accuracy, recall and F1 of the Word2Vec word vectors are all higher than those of the random word vectors, by 0.95%, 1.62% and 2.18% respectively. Pre-trained word vectors are therefore loaded in the subsequent experiments.
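A hedged Gensim sketch of training the 100-dimensional pre-trained word vectors described above; the toy corpus, tokenization and output path are placeholders, and the patent does not state which Word2Vec variant or Gensim version was used:

```python
from gensim.models import Word2Vec  # Gensim 4.x API assumed

# hypothetical pre-tokenized corpus: one list of tokens per sentence (real runs would use Wikipedia)
sentences = [["长春", "是", "吉林", "的", "省会"],
             ["命名", "实体", "识别", "技术"]]

model = Word2Vec(
    sentences=sentences,
    vector_size=100,   # 100-dimensional word vectors, matching the setting above
    window=5,
    min_count=1,
    sg=1,              # skip-gram; the variant actually used is not stated in the patent
)
model.wv.save_word2vec_format("wiki_vectors_100d.txt")  # load these as the Embedding-layer input
```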
Word vector dimension comparison
In order to select the proper word vector dimension, the model is trained with 50-dimensional, 100-dimensional and 200-dimensional word vectors, and the experimental results obtained are shown in Table 2.
TABLE 2 different word vector dimension experimental results
As can be seen from Table 2, with the other parameters fixed, 100-dimensional word vectors perform better than the other dimensions. Below 100 dimensions, the model has too few trainable parameters and weak fitting capability; as the word vector dimension increases beyond that, the effect drops and over-fitting tends to occur. Selecting 100-dimensional word vectors is therefore appropriate.
Comparison of learning rate
The learning rate determines how the model weights are updated and directly influences the convergence of the model. The experimental results of training the model with different learning rates are compared, and the optimal learning rate is selected.
TABLE 3 learning rates of different sizes
As can be seen from table 3, the learning rate is preferably 0.001, and the subsequent model learning rate can be set to 0.001.
Comparison with the baseline model
After comparing the three parameters above (pre-trained word vectors, word vector dimensionality and learning rate), a group of optimal parameters is selected to train the different models for comparison. The experimental results compared with the baseline CRF model are as follows:
TABLE 4 comparison of experimental results with baseline model
As can be seen from Table 4, the combination of the deep learning method and the statistical method performs better than the traditional statistical method, with the F1 value improved by 10.4% on the People's Daily dataset and by 11.41% on the MSRA dataset. The IDCNN-CRF model thus performs well in the named entity recognition task.
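The precision, recall and F1 comparisons above are entity-level scores; a minimal sketch of how such scores can be computed with the seqeval library (an assumed tool; the patent does not name its evaluation code):

```python
from seqeval.metrics import precision_score, recall_score, f1_score

# gold and predicted BIO tag sequences for two toy sentences
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "I-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O"],     ["B-ORG", "I-ORG", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))  # an entity counts only if its full span and type match
```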
BERT-BiLSTM-CRF model experiment
1. Configuration of experimental parameters
Based on the previous experiments, the experimental parameters of the BERT-BiLSTM-CRF model built on BERT-Base can be determined as shown in Table 5:
TABLE 5 Experimental parameters
2. Comparative analysis with other models
For a comprehensive evaluation of the model, with the same parameters guaranteed, the model is compared vertically against different models, and the accuracy, recall and F1 value for three entity types (person names, place names and organization names) are compared and analysed. The experimental results are shown in Table 6:
table 6 results of different model experiments
Table 6 shows the performance of different models in three types of entities, and according to the above experimental results, histograms are plotted, which respectively show the accuracy, recall rate, and F1 value of different models in the three types of entities. The accuracy of the different models is shown in fig. 7.
As can be seen from the figure, the accuracy of the BERT-BiLSTM-CRF model on the three entity types (person names, place names and organization names) is higher than that of the other models, exceeding the IDCNN-CRF model by 5.94%, 8.99% and 4.64% respectively. FIG. 8 shows the recall of the different models.
As can be seen from the figure, the recall of the BERT-BiLSTM-CRF model on the three entity types is higher than that of the other models, exceeding the IDCNN-CRF model by 3.79%, 6.25% and 4.08% respectively.
FIG. 9 shows the F1 values of the different models.
As can be seen from the figure, the F1 value of the BERT-BiLSTM-CRF model on the three entity types is higher than that of the other models, exceeding the IDCNN-CRF model by 4.67%, 7.63% and 4.36% respectively.
In conclusion, the BERT-BiLSTM-CRF model designed here performs better than the other models, and introducing the BERT pre-trained word vectors brings a marked improvement, illustrating the effectiveness of the BERT-BiLSTM-CRF model in the named entity recognition task. From the entity perspective, the effect on LOC is weaker than on ORG and PER, which is caused by many influencing factors such as place names nested inside organization names, abbreviations and semantic divergence.
The above analysis compares the effects of the different models at the entity level; the performance of the different models on the People's Daily dataset and the MSRA dataset is now compared in terms of overall model effect. The experimental results are shown in Table 7.
TABLE 7 Overall comparison test results of different models
In order to clearly see the level of the model performance, the results are shown in bar chart form as shown in fig. 10 and fig. 11.
As can be seen from the figures, the BERT-BiLSTM-CRF model scores higher than the other models on the accuracy, recall and F1 evaluation indexes. On the People's Daily dataset it is 5.2%, 5.93% and 4.91% higher than the IDCNN-CRF model respectively, and on the MSRA dataset it is 4.01%, 6.98% and 4.63% higher respectively. The effectiveness of the BERT-BiLSTM-CRF model is thus verified at the whole-model level.
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments of the invention may be used in any combination, provided that no structural conflict exists, and the combinations are not exhaustively described in this specification merely for the sake of brevity and resource conservation. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (7)

1. A BERT-BiLSTM-CRF named entity detection model, comprising an IDCNN-CRF named entity recognition model and a BERT-BiLSTM-CRF named entity recognition model, characterized in that:
the IDCNN-CRF named entity recognition model architecture is as follows:
the Embedding layer is a word vector layer used for processing the input data into word vectors and feeding them into the model, with Word2Vec adopted as the distributed vector representation;
the IDCNN layer receives the character or word vectors produced by the embedding layer and recomputes the input vectors through the dilated convolution operation of the dilated convolutional neural network to obtain a new vector representation;
the projection layer performs a linear transformation on the vector representations computed by the IDCNN layer, mapping them to a dimension equal to the number of tags, and then applies Softmax normalization to obtain a probability p; if the mapped representation has m dimensions, each dimension can be regarded as the probability of one tag class, and selecting the class with the maximum probability yields the classification result, completing the named entity recognition task;
the CRF layer screens out the optimal result through the transition matrix and feeds it back to the user;
the BERT-BiLSTM-CRF named entity recognition model is structured as follows:
the BERT layer takes as input sentences composed of single characters; after BERT processes the text sequence and obtains a vector representation for each character, this output serves as the input of the following BiLSTM layer;
and in the BiLSTM-CRF layer, the BERT-processed text sequence yields the corresponding BERT pre-trained vector representations, which enter the BiLSTM units; the BiLSTM output is computed and sent to the CRF, and the optimal sequence of labels is calculated.
2. The BERT-BiLSTM-CRF named entity detection model as claimed in claim 1, wherein the Embedding layer captures the dependency relationships between preceding and following characters by training on a large-scale corpus, and feeds pre-trained 100-dimensional Wikipedia word vectors and 20-dimensional word-segmentation features as input into the next layer.
3. The model of claim 1, wherein the CRF layer combines the results of deep learning with a statistical learning model: the CRF maintains a matrix of label-to-label transition probabilities, the m-dimensional label space is expanded to (m+2) × (m+2) with the two extra dimensions representing the start and end states, and invalid labels are corrected by learning the rules of label transition from the changes of these two parameters.
4. The model of claim 1, wherein the beginning of a sentence in the BERT layer is marked with [CLS], the end of a sentence and the separation between sentences are represented by [SEP], and the input of BERT is composed of three parts: a token (word) vector, a segment vector and a position vector.
5. The model of claim 1, wherein the forward LSTM of the BiLSTM in the BiLSTM-CRF layer computes the semantic representation of the current word and its left context, the backward LSTM computes the semantic representation of the current word and its right context, and the two hidden-layer state representations are concatenated to obtain the output of the BiLSTM.
6. The BERT-BiLSTM-CRF named entity detection model of claim 1, wherein the main formulas implemented by the algorithm are:
[Four formula images in the original publication; not reproduced here.]
7. A BERT-BiLSTM-CRF named entity detection device, characterized by comprising:
an information extraction module for extracting entity information and the semantic relations between entities;
the information extraction module is connected with an information retrieval module, which screens out information related to the keywords from a keyword query, uses named entity recognition to identify the entity type of the keywords, classifies the text information and narrows the retrieval range;
the information retrieval module is connected with a machine translation module, which identifies the entity information of the translation target and performs lexical analysis using translation rules;
and the machine translation module is connected with a question-answering system, which searches for answers to questions by matching the relations between keywords and entities and feeds the output back to the user.
CN202110631994.7A 2021-06-07 2021-06-07 BERT-BilSTM-CRF named entity detection model and device Pending CN113204970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110631994.7A CN113204970A (en) 2021-06-07 2021-06-07 BERT-BilSTM-CRF named entity detection model and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110631994.7A CN113204970A (en) 2021-06-07 2021-06-07 BERT-BilSTM-CRF named entity detection model and device

Publications (1)

Publication Number Publication Date
CN113204970A true CN113204970A (en) 2021-08-03

Family

ID=77024155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110631994.7A Pending CN113204970A (en) 2021-06-07 2021-06-07 BERT-BilSTM-CRF named entity detection model and device

Country Status (1)

Country Link
CN (1) CN113204970A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468344A (en) * 2021-09-01 2021-10-01 北京德风新征程科技有限公司 Entity relationship extraction method and device, electronic equipment and computer readable medium
CN113569016A (en) * 2021-09-27 2021-10-29 北京语言大学 Bert model-based professional term extraction method and device
CN113723104A (en) * 2021-09-15 2021-11-30 云知声智能科技股份有限公司 Method and device for entity extraction under noisy data
CN113851190A (en) * 2021-11-01 2021-12-28 四川大学华西医院 Heterogeneous mRNA sequence optimization method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670179A (en) * 2018-12-20 2019-04-23 中山大学 Case history text based on iteration expansion convolutional neural networks names entity recognition method
CN111199152A (en) * 2019-12-20 2020-05-26 西安交通大学 Named entity identification method based on label attention mechanism
CN111209738A (en) * 2019-12-31 2020-05-29 浙江大学 Multi-task named entity recognition method combining text classification
CN111339318A (en) * 2020-02-29 2020-06-26 西安理工大学 University computer basic knowledge graph construction method based on deep learning
CN112101009A (en) * 2020-09-23 2020-12-18 中国农业大学 Knowledge graph-based method for judging similarity of people relationship frame of dream of Red mansions
CN112182243A (en) * 2020-09-27 2021-01-05 中国平安财产保险股份有限公司 Method, terminal and storage medium for constructing knowledge graph based on entity recognition model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670179A (en) * 2018-12-20 2019-04-23 中山大学 Case history text based on iteration expansion convolutional neural networks names entity recognition method
CN111199152A (en) * 2019-12-20 2020-05-26 西安交通大学 Named entity identification method based on label attention mechanism
CN111209738A (en) * 2019-12-31 2020-05-29 浙江大学 Multi-task named entity recognition method combining text classification
CN111339318A (en) * 2020-02-29 2020-06-26 西安理工大学 University computer basic knowledge graph construction method based on deep learning
CN112101009A (en) * 2020-09-23 2020-12-18 中国农业大学 Knowledge graph-based method for judging similarity of people relationship frame of dream of Red mansions
CN112182243A (en) * 2020-09-27 2021-01-05 中国平安财产保险股份有限公司 Method, terminal and storage medium for constructing knowledge graph based on entity recognition model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Xiang et al., "Named entity recognition in the field of ecological governance technology based on a BiLSTM-IDCNN-CRF model" (基于BiLSTM-IDCNN-CRF模型的生态治理技术领域命名实体识别), Computer Applications and Software (《计算机应用与软件》) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468344A (en) * 2021-09-01 2021-10-01 北京德风新征程科技有限公司 Entity relationship extraction method and device, electronic equipment and computer readable medium
CN113468344B (en) * 2021-09-01 2021-11-30 北京德风新征程科技有限公司 Entity relationship extraction method and device, electronic equipment and computer readable medium
CN113723104A (en) * 2021-09-15 2021-11-30 云知声智能科技股份有限公司 Method and device for entity extraction under noisy data
CN113569016A (en) * 2021-09-27 2021-10-29 北京语言大学 Bert model-based professional term extraction method and device
CN113569016B (en) * 2021-09-27 2022-01-25 北京语言大学 Bert model-based professional term extraction method and device
CN113851190A (en) * 2021-11-01 2021-12-28 四川大学华西医院 Heterogeneous mRNA sequence optimization method

Similar Documents

Publication Publication Date Title
CN108733792B (en) Entity relation extraction method
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN111444726B (en) Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure
CN108984526B (en) Document theme vector extraction method based on deep learning
CN108460013B (en) Sequence labeling model and method based on fine-grained word representation model
CN113204970A (en) BERT-BilSTM-CRF named entity detection model and device
CN109871541B (en) Named entity identification method suitable for multiple languages and fields
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN110287323B (en) Target-oriented emotion classification method
CN112541356B (en) Method and system for recognizing biomedical named entities
CN111753058B (en) Text viewpoint mining method and system
CN111914556B (en) Emotion guiding method and system based on emotion semantic transfer pattern
CN111274794B (en) Synonym expansion method based on transmission
CN113626589B (en) Multi-label text classification method based on mixed attention mechanism
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
CN114492441A (en) BilSTM-BiDAF named entity identification method based on machine reading understanding
CN114510946B (en) Deep neural network-based Chinese named entity recognition method and system
CN114153973A (en) Mongolian multi-mode emotion analysis method based on T-M BERT pre-training model
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN116796744A (en) Entity relation extraction method and system based on deep learning
CN113076718B (en) Commodity attribute extraction method and system
CN117094325B (en) Named entity identification method in rice pest field
CN116522165A (en) Public opinion text matching system and method based on twin structure
CN115906846A (en) Document-level named entity identification method based on double-graph hierarchical feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210803