CN114692568A - Sequence labeling method based on deep learning and application - Google Patents

Sequence labeling method based on deep learning and application

Info

Publication number
CN114692568A
CN114692568A CN202210310976.3A
Authority
CN
China
Prior art keywords
model
sequence
labeling
text
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210310976.3A
Other languages
Chinese (zh)
Inventor
贾国辉
谢伟
张晨
王玮昕
王敏
闫凯
张友根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210310976.3A priority Critical patent/CN114692568A/en
Publication of CN114692568A publication Critical patent/CN114692568A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a sequence labeling method based on deep learning and its application. The method comprises the following steps: acquiring a text to be marked, and preprocessing the text; performing text translation processing and error correction processing on the preprocessed text; constructing a rule dictionary and a regular expression, pre-labeling the error-corrected text based on the rule dictionary and the regular expression, and outputting the pre-labeled content; and inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning, and obtaining the optimal labeling content from the output results according to the weight values of the sequence labeling models. The invention can improve the efficiency and accuracy of sequence labeling.

Description

Sequence labeling method based on deep learning and application
Technical Field
The invention belongs to the technical field of natural language processing (NLP), and relates to a sequence labeling method based on a deep learning model and application thereof.
Background
In the application of knowledge graphs, sequence tagging is an essential link, and sequence tagging models are widely applied to related fields of text processing such as word segmentation, part-of-speech tagging and named entity recognition. Conventional data labeling requires a large amount of manpower and material resources, and its efficiency is low. Common labeling software cannot adapt to multilingual labeling, cannot well identify and extract the entities in the corpus, does not incorporate word banks in a targeted way, and cannot use already-labeled corpora for automatic labeling. Existing labeling software may contain several models, but it either does not use multiple models for joint labeling, or only lets the models predict results separately and takes the mode to determine the final labeling result, so the labeling speed and accuracy are greatly affected. Therefore, improving the efficiency and accuracy of labeling tools is quite important for the construction of knowledge graphs.
Disclosure of Invention
The invention provides a sequence labeling method based on deep learning and application thereof, which can improve the efficiency and accuracy of sequence labeling.
To achieve the above object, according to a first aspect of the present invention, a sequence annotation method based on deep learning includes the steps of:
acquiring a text to be marked, and preprocessing the text;
performing text translation processing and error correction processing on the preprocessed text;
constructing a rule dictionary and a regular expression, pre-labeling the text after error correction processing based on the rule dictionary and the regular expression, and outputting pre-labeled content;
and inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning, and calculating according to the weight values of the sequence labeling models for the output result to obtain the optimal labeling content.
Further, the translation process includes:
translating by adopting a seq2seq model introduced with an attention mechanism, wherein the seq2seq model comprises an encoder and a decoder, the attention mechanism enables the decoder to have different attentions on different parts when decoding, the decoder comprises a labeling sequence prediction module and a constraint module, the labeling sequence prediction module is used for outputting a plurality of prediction labeling sequences, the constraint module is used for defining a feature function set, the set comprises a plurality of feature functions, each feature function is used for judging whether the labeling sequence accords with the feature corresponding to the feature function, scores of all the feature functions in the set on the same prediction labeling sequence are respectively obtained, the scores of all the feature functions in the set on the same prediction labeling sequence are summed to be used as the final score of the prediction labeling sequence, and thus the final score of each prediction labeling sequence output by the labeling sequence prediction module is obtained, and selecting the prediction annotation sequence with the highest final score as a translation processing result.
Further, the error correction processing includes:
collecting professional noun libraries in different fields, presetting a correct translation text for each professional noun in the libraries, and, if a professional noun from the libraries exists in the translated text and its translation differs from the preset correct translation text, directly replacing the translation of that professional noun with the preset correct translation text;
then, the text sequence processed in the last step is sent to an error correction model based on deep learning, the error correction model comprises a word vector mapping module, an error detection module, a coding module, a multi-head attention mechanism module, a fusion module and a classification module, the word vector mapping module is used for converting the input text sequence into a text vector and outputting the text vector to the error detection module, and is also used for identifying an entity in the text sequence according to a pre-constructed knowledge map and converting the identified entity into an entity vector and outputting the entity vector to the multi-head attention mechanism module, the error detection module is used for obtaining a text semantic representation tensor according to the text vector, the coding module is used for coding the text semantic representation tensor and outputting the text semantic representation tensor to the multi-head attention mechanism module, and the multi-head attention mechanism is used for outputting text embedding representation and entity embedding representation to the fusion module for feature fusion, and the classification module is used for outputting an error correction result according to the fused features.
Further, the plurality of deep learning based sequence annotation models comprises a first model, a second model and a third model;
the first model comprises an ALBERT model, a BiGRU model and a CRF model, wherein the ALBERT model is used for obtaining word vectors and extracting important text features, the BiGRU model is used for carrying out named entity recognition through deep learning of context feature information, the CRF model is used for processing the output sequence of the BiGRU model, and a global optimal sequence is obtained from adjacent labels by combining the state transition matrix in the CRF model;
the second model comprises a BERT model, a BiLSTM model and a CRF model, wherein the BERT model is used for obtaining word vectors and extracting important text features, the BiLSTM model is used for carrying out named entity recognition through deep learning of context feature information, the CRF model is used for processing the output sequence of the BiLSTM model, and a global optimal sequence is obtained according to the labels between two adjacent states by combining the state transition matrix in the CRF model;
the third model comprises an ERNIE model, wherein in the pre-training of the ERNIE model, basic character-level masks are adopted in the first stage, phrase-level masks are adopted in the second stage so that the model predicts phrases, and named-entity masks are adopted in the third stage so that the model predicts named entities.
Furthermore, a pooling layer is added behind the basic RNN model of both the BiGRU model and the BiLSTM model.
Further, the obtaining of the optimal labeling content according to the output results of the plurality of sequence labeling models includes:
determining a weight coefficient of each sequence labeling model;
if the output labels of all the sequence labeling models are the same, taking the output label as the optimal labeling content;
if different sequence labeling models output different labels, multiplying the output labels of all models that output the same label by the corresponding weight coefficients and summing them to obtain a summed probability value for each kind of output label, and selecting the output label with the maximum summed probability value as the optimal labeling content.
Further, the weight coefficient of each sequence labeling model is determined through training.
According to a second aspect of the present invention, a deep learning based sequence annotation system comprises:
the preprocessing module is used for acquiring a text to be marked and preprocessing the text;
the translation error correction module is used for performing text translation error correction on the preprocessed text;
the rule matching module is used for constructing a rule dictionary and a regular expression, pre-labeling the translated and corrected text based on the rule dictionary and the regular expression and outputting pre-labeled content;
and the combined model labeling module is used for inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning and obtaining the optimal labeling content according to the weight values of the sequence labeling models to the output result.
According to a third aspect of the present invention, a multi-model joint labeling is provided: for each model, a prediction result is obtained for the text to be predicted, and the optimal label of the text is determined according to the weight value of each model.
According to a fourth aspect of the invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.
In general, compared with the prior art, the invention has the following beneficial effects: the sequence labeling combines text translation, text error correction, rule matching and a plurality of deep learning models, thereby achieving fast labeling of text.
Drawings
FIG. 1 is a flowchart illustrating a deep learning-based sequence labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a deep learning-based sequence tagging method according to an embodiment of the present invention;
FIG. 3 is a network structure diagram of a deep learning-based sequence annotation model according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the pooling level of the sequence annotation model according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a deep learning-based sequence annotation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1 and fig. 2, the sequence annotation method based on deep learning according to the embodiment of the present invention includes the following steps:
step 1: and acquiring a text to be marked, and preprocessing the text.
Preferably, the training data used is Wikipedia data; the data source is as follows: crawled from Wikipedia.
The obtained data are classified into structured data and unstructured data, and the optimization method mainly aims at unstructured and semi-structured data; structured data can be used directly.
And cleaning data and carrying out data preprocessing. The pretreatment may be carried out by any method known in the art.
Step 2: and performing text translation processing on the preprocessed text.
In order to adapt to different countries and different languages, a seq2seq model introducing an Attention mechanism (Attention mechanism) is adopted for translation, and an improvement is made on the mechanism to realize text translation. The basic seq2seq model is a neural network of Encoder (Encoder) -Decoder (Decoder) architecture, whose input is a Sequence (Sequence) and output is also a Sequence (Sequence). The attention mechanism causes the decoder to pay different attention to different parts when decoding. The decoder comprises an annotation sequence prediction module and a constraint module, wherein the annotation sequence prediction module is used for outputting a plurality of prediction annotation sequences, the constraint module is used for defining a feature function set, respectively obtaining scores of all feature functions in the set on the same prediction annotation sequence, summing the scores of all feature functions in the set on the same prediction annotation sequence to serve as a final score of the prediction annotation sequence, so that the final score of each prediction annotation sequence output by the annotation sequence prediction module is obtained, and selecting the prediction annotation sequence with the highest final score as a translation processing result. The feature function set includes a plurality of feature functions, each feature function is used to determine whether the tag sequence conforms to the feature corresponding to the feature function, for example, the output value of the feature function is 0 or 1, 0 indicates that the tag sequence to be scored does not conform to the feature, and 1 indicates that the tag sequence to be scored conforms to the feature.
The improvement of the invention lies in optimizing the output sequence of the seq2seq model. The existing seq2seq model uses greedy decoding, which at each step selects the prediction with the maximum probability, so it only considers the result of the current output and not the rationality of the whole output text sequence. The method therefore optimizes the decoding process: at each step the top k probability values are selected, the k outputs obtained at this step are used as the input of the next step to participate in the next prediction, several groups of sequence outputs are finally obtained, and the group with the maximum overall probability value is selected as the final output of the model. However, this final output still does not consider the rationality of the output sequence, i.e. the grammar of the sentence. For example, in Chinese the intended translation result is "I had a meal today", but the sequence with the maximum probability value predicted by the model places a preposition at the beginning of the sentence as if it were the subject, which is an obvious grammatical error. After the usage conditions are constrained, according to the result of part-of-speech tagging, the finally output sequence gradually learns that such non-normative output is unreasonable, so such outputs are reduced and inter-translation between languages is completed better. The model is trained on dozens of languages and can complete about 30 inter-translation tasks such as Chinese-English, Chinese-German, Chinese-Japanese, Japanese-English, Chinese-French, Chinese-Russian, Japanese-French, German-Russian, Chinese-Arabic, English-Spanish and Chinese-Spanish, which greatly reduces the usage limitations of the labeling device so that people in different countries can use it.
The translation process is specifically implemented as follows:
Seq2seq is divided into an encoder and a decoder. In the encoder, the text is first segmented into words; after word segmentation each word is converted into a word vector, denoted V_i. The word vectors are fed in time order together with the hidden state h_{i-1} of the previous moment, and a hidden state h_i is output at each moment. The transformation of the RNN hidden layer can be represented by a function f, h_i = f(V_i, h_{i-1}); that is, the previous hidden state h_{i-1} together with the current input of the RNN gives the current hidden state. Assuming there are t words, the encoder finally retains the hidden state of every moment. After the encoder finishes encoding the input sequence, the initial state of the decoder is S_0. The attention mechanism calculates the weight correlation between the current state S_t of the decoder and all encoder states h_1, h_2, h_3, ..., and lets the vector c produced by the encoder take part in a weighted operation with each output during the decoding process of the decoder. The weight a_{ij} can be calculated in many ways; given i, a_{ij} is a probability distribution over j = 1, 2, 3, ..., T, with a_{ij} = softmax(e(S_{i-1}, h_j)), where S_{i-1} is the hidden state of the decoder at time i-1 and h_j is the j-th hidden state of the encoder. The vector c at the i-th moment is calculated as:
c_i = Σ_{j=1}^{T} a_{ij} h_j
Here V_i represents a word vector, which can simply be understood as the vector that converts a word into model input, e.g. apple → [0.20, 0.30, 0.1, -0.21, -0.23, -0.43, -0.34, 0.234, 0.456, ...]; h_i represents the output representation of the word vector at the i-th time step, obtained preliminarily by calculation of the encoder, with one word input to the model at each time step; RNN is a structure in seq2seq; W is a parameter matrix, initially set by hand and updated by back propagation through the model; and e is a distance calculation function.
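As an illustration of the attention computation above, the following is a minimal numeric sketch in Python/NumPy: it computes a_{ij} = softmax(e(S_{i-1}, h_j)) and the context vector c_i = Σ_j a_{ij} h_j, using a dot product as the score function e (an assumption made only for this sketch; the description above only calls e a distance calculation function).

    import numpy as np

    def context_vector(s_prev, encoder_states):
        # e(S_{i-1}, h_j): dot-product score between the decoder state and each encoder state
        scores = np.array([s_prev @ h for h in encoder_states])
        a = np.exp(scores - scores.max())
        a = a / a.sum()                                   # softmax -> attention weights a_ij
        H = np.stack(encoder_states)
        return (a[:, None] * H).sum(axis=0)               # c_i = sum_j a_ij * h_j

    c_i = context_vector(np.array([0.1, 0.3]),
                         [np.array([0.2, 0.1]), np.array([0.4, 0.5]), np.array([0.0, 0.6])])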
In the traditional decoding mode, Beam Search is adopted: at each moment the top K words are selected as the output of that moment, each of them is used as the input of the next moment to participate in the next prediction, then the top K values with the maximum probability are selected from the K x L results (L is the size of the word list) as the output of the next moment, and so on. At the last moment, the top 1 is selected as the final output, which does not effectively reduce the output of erroneous sequences.
Therefore, in the embodiment of the present invention, during Beam Search a sequence X = (X_1, X_2, X_3, ...) is received and the target sequence Y = (Y_1, Y_2, ..., Y_n) is output. In this process a feature function f(X, i, y_i, y_{i-1}) is used, where X represents the input sequence, e.g. "Bob drank coffee at Starbucks", i denotes the current node, y_i is the predicted value of the current node and y_{i-1} is the predicted value of the previous node; the function is therefore related to the context of the node, i.e. to node i and node i-1. Part-of-speech tagging is performed with the existing ltp tool, and the part of speech of each word is, for example: "Bob (noun) drank (verb) coffee (noun) at (preposition) Starbucks (noun)". There are many possible alternative labeling sequences; a labeling sequence such as (noun, verb, noun, preposition, noun) is called l, and for example l can also be (noun, verb, preposition, noun, noun). Among so many alternative labeling sequences, the most reliable one is chosen as the labeling of the sentence. How is it judged whether a labeling sequence is reliable or unreliable? Each labeling sequence is scored, and a higher score represents a more reliable labeling sequence; at the very least, a labeling sequence in which a verb is followed by another verb should be given a negative score. "A verb followed by a verb" is one feature function. A set of feature functions can be defined, which is used to score a labeling sequence, and from this the most reliable labeling sequence is selected. That is, each feature function can be used to score a labeling sequence, and the final score of the labeling sequence is obtained by combining the scores of all feature functions in the set for the same labeling sequence.
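The decoding strategy above can be sketched as follows: a plain beam search keeps the top k candidates at every step, and the finished candidate sequences are then re-scored with the feature function set so that the highest-scoring sequence is output. The callback next_log_probs standing in for the seq2seq decoder, and the value k = 3, are assumptions made only for this sketch.

    import heapq

    def beam_search(next_log_probs, start_token, steps, k=3):
        """next_log_probs(seq) -> {token: log_probability} for the next position."""
        beams = [(0.0, [start_token])]
        for _ in range(steps):
            candidates = []
            for score, seq in beams:
                for token, lp in next_log_probs(seq).items():
                    candidates.append((score + lp, seq + [token]))
            beams = heapq.nlargest(k, candidates, key=lambda c: c[0])   # keep the top-k sequences
        return [seq for _, seq in beams]

    def best_by_feature_functions(candidate_sequences, feature_functions):
        # Sum the score of every feature function over each candidate and keep the best one.
        return max(candidate_sequences,
                   key=lambda seq: sum(f(seq) for f in feature_functions))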
Suppose X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n) are sequences of random variables represented by a linear chain. If, given the random variable sequence X, the conditional probability distribution P(Y|X) satisfies the Markov property
P(Y_i | X, Y_1, ..., Y_{i-1}, Y_{i+1}, ..., Y_n) = P(Y_i | X, Y_{i-1}, Y_{i+1}),  i = 1, 2, ..., n
(only one side is considered when i = 1 and i = n), then P(Y|X) constitutes a conditional random field. Its conditional probability has the form
P(y | x) = (1 / Z(x)) exp( Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i) )
where the normalization factor is
Z(x) = Σ_y exp( Σ_{i,k} λ_k t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l s_l(y_i, x, i) ).
In the labeling problem, X represents the input observation sequence and Y represents the corresponding output label sequence or state sequence; t_k and s_l are feature functions, and λ_k and μ_l are the corresponding weights. Z(x) is a normalization factor, and the summation in it is performed over all possible output sequences.
t_k is a feature function defined on an edge, called a transition feature, which depends on the current and previous positions; t_k(y_{i-1}, y_i, x, i) expresses "given the observation x, the case of transferring from the previous node y_{i-1} to this node y_i".
s_l is a feature function defined on a node, called a state feature, which depends on the current position and expresses the case of the current node y_i given the observation x.
t_k and s_l both depend on the position and are local feature functions. In general, the feature functions t_k and s_l take the value 1 or 0: the value is 1 when the feature condition is satisfied and 0 otherwise. A conditional random field is completely determined by the feature functions t_k, s_l and the corresponding weights λ_k, μ_l. Here s is a sentence (i.e. the sentence to be part-of-speech tagged); i denotes the i-th word in sentence s; l_i denotes the label that the labeling sequence to be scored assigns to the i-th word; and l_{i-1} denotes the part of speech that the labeling sequence to be scored assigns to the (i-1)-th word.
For example: suppose the feature function set contains feature functions f1 and f2, defined as follows. f1(s, i, l_i, l_{i-1}) = 1 when l_i is "adverb" and the i-th word ends with "ly"; in other cases f1 = 0. The weight λ1 of the f1 feature function should be positive, and the larger λ1 is, the more likely it is to adopt labeling sequences that label words ending with "ly" as "adverbs".
f2(s, i, l_i, l_{i-1}) = 1 when i = 1, l_i is "verb", and the sentence s ends with "?"; in other cases f2 = 0. Likewise, λ2 should be positive, and the larger λ2 is, the more likely it is to adopt labeling sequences that label the first word of a question as a "verb". With these constraint conditions, some obviously wrong sequences in the translation results of the model are deleted and are not output; without the constraint conditions, such wrong output sequences would very likely be output according to the probability values predicted by the model. Compared with the original model, the sequence output effect is therefore obviously improved.
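A minimal Python sketch of the two example feature functions follows, assuming illustrative positive weights for λ1 and λ2 (the concrete values are not taken from the description):

    def f1(words, i, labels):
        # 1 when the i-th label is "adverb" and the i-th word ends with "ly", else 0
        return 1 if labels[i] == "adverb" and words[i].endswith("ly") else 0

    def f2(words, i, labels):
        # 1 when the first label is "verb" and the sentence ends with "?", else 0
        return 1 if i == 0 and labels[0] == "verb" and words[-1].endswith("?") else 0

    def score_labeling(words, labels, weighted_features=((1.0, f1), (0.8, f2))):
        return sum(w * f(words, i, labels)
                   for i in range(len(words)) for w, f in weighted_features)

    words = "Bob drank coffee at Starbucks".split()
    print(score_labeling(words, ["noun", "verb", "noun", "preposition", "noun"]))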
And step 3: and performing text error correction processing on the translated text.
Because different languages have different characteristics, especially proper nouns, the effect of model translation can be poor. For this phenomenon, a text error correction function is added to the labeling device, and error correction can be carried out on the translation result using this function.
Further, the error correction process includes the steps of:
Firstly, professional noun libraries in different fields are collected and a correct translation text is preset for each professional noun in the libraries. If a professional noun from the libraries appears in the translated text and its translation differs from the preset correct translation text, the preset correct translation text directly replaces the translation of that professional noun.
Then, the text sequence processed in the previous step is sent to an error correction model based on deep learning. The error correction model comprises a word vector mapping module, an error detection module, a coding module, a multi-head attention mechanism module, a fusion module and a classification module. The word vector mapping module is used for converting the input text sequence into a text vector and outputting it to the error detection module, and is also used for identifying entities in the text sequence according to a pre-constructed knowledge graph, converting the identified entities into entity vectors and outputting them to the multi-head attention mechanism module. The error detection module is used for obtaining a text semantic representation tensor according to the text vector, the coding module is used for coding the text semantic representation tensor and outputting it to the multi-head attention mechanism module, the multi-head attention mechanism module is used for outputting a text embedded representation and an entity embedded representation to the fusion module for feature fusion, and the classification module is used for outputting an error correction result according to the fused features.
The error correction model improves the existing Soft-Masked BERT model and adopts a text error correction method that fuses knowledge graph information. The knowledge graph is a networked knowledge base formed by linking entities through relations. It describes concepts, entities and the relations between them in the objective world in a structured form and expresses Internet information in a form closer to the human cognitive world. Because the related information of the knowledge graph is fused in, the model understands the term collocations within the text semantics more comprehensively, so the probability of wrong term collocations is reduced.
The knowledge graph is expressed as G = (E, R, S), where E = {e1, e2, ..., e|E|} represents the set of entities in the knowledge base, R = {r1, r2, ..., r|R|} represents the set of relations in the knowledge graph, and S ⊆ E × R × E represents the set of triples in the knowledge graph. Traditional BERT can mine the semantic information of text data well for text error correction; this method additionally adds knowledge graph information and combines word embedding with knowledge embedding, so that the model learns the complete semantics of semantic knowledge units and the error correction performance of the model is improved.
Let the input term sequence be {w_1, w_2, ..., w_n}, where n is the length of the term sequence, and denote the entity vector sequence identified by the word vector mapping module as {e_1, e_2, ..., e_m}, where m represents the length of the entity sequence. Then {w_1, w_2, ..., w_n} and {e_1, e_2, ..., e_m} are taken as the input of the encoder, and the encoder produces the respective vectors {w_1^o, w_2^o, ..., w_n^o} and {e_1^o, e_2^o, ..., e_m^o}, which are the features of the relevant tasks. Then {w_1^o, w_2^o, ..., w_n^o} and {e_1^o, e_2^o, ..., e_m^o} are input into the fusion module to obtain the fused features. The fused features are then input into a decoding module based on a multi-head attention mechanism.
Because Soft-Masked BERT has a mask, this can simply be understood as performing a masking operation on entities in the knowledge graph. For example, in a music app, the singer "Wujunyu" sings the song "17 years old rainy season", and the two data resources are in a "singing" relationship; in a video app, the television station "Hunan satellite television" produced the entertainment program "deformeter", and the two data resources are in a "producing" relationship. Such relations exist in the storage of the knowledge graph, and certain relations exist between the entities in the graph, so data with relations can be emphasized when the model learns the masks. Text error correction deals with errors of the "term collocation" type; through the fusion of the knowledge graph, entities are related to each other, and the phenomenon of term collocation errors can be reduced.
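The following is a highly simplified PyTorch sketch of the error correction model described above, with illustrative (assumed) layer types and sizes: a detection network produces a semantic representation of the text, the encoded text attends to the knowledge-graph entity embeddings through multi-head attention, the two representations are fused, and a per-position classification outputs the corrected characters. It sketches the module layout only, not the patented implementation.

    import torch
    import torch.nn as nn

    class KgErrorCorrector(nn.Module):
        def __init__(self, vocab_size, num_entities, dim=256, heads=4):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, dim)       # word vector mapping module
            self.entity_emb = nn.Embedding(num_entities, dim)   # entities from the knowledge graph
            self.detector = nn.GRU(dim, dim, batch_first=True)  # error detection -> semantic tensor
            self.encoder = nn.TransformerEncoderLayer(dim, heads, batch_first=True)  # coding module
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)          # multi-head attention
            self.fuse = nn.Linear(2 * dim, dim)                 # fusion of text / entity features
            self.classify = nn.Linear(dim, vocab_size)          # corrected character per position

        def forward(self, token_ids, entity_ids):
            text = self.word_emb(token_ids)
            semantic, _ = self.detector(text)
            encoded = self.encoder(semantic)
            entities = self.entity_emb(entity_ids)
            attended, _ = self.attn(encoded, entities, entities)   # text attends to entity embeddings
            fused = torch.tanh(self.fuse(torch.cat([encoded, attended], dim=-1)))
            return self.classify(fused)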
And 4, step 4: and constructing a rule dictionary and a regular expression, pre-labeling the replaced text based on the rule dictionary and the regular expression, and outputting pre-labeled content.
For recommended labels, different colors are displayed on the page of the labeling tool. Technicians can also perform matching first and modify the text; the modified text is then sent to the labeling tool and redisplayed on the labeling page.
Further, pre-labeling the replaced text based on the rule dictionary and the regular expression comprises the following. After the data are sent to the labeling tool, different dictionaries are constructed according to different tasks. Firstly, hard matching is performed on the text based on the dictionaries, with the principle that longer character strings are matched first, which reduces the probability of manual withdrawal after matching; text characters hit by dictionary-based hard matching are equivalent to being labeled directly, which can save about 15% of the time in a text labeling task, an obvious effect. Secondly, matching based on regular expressions is performed; entity characters that follow the same rule only need to be displayed in different colors based on the rule. In addition, a threshold is set for the number of times the same entity appears; after the threshold is exceeded, the software automatically labels that entity, and after manual confirmation this finally saves about 10% of the labeling workload. A further threshold is also set: for entities that appear repeatedly in the client labeling process, a confidence threshold is set, and the entity is prompted when the confidence exceeds the threshold.
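A small Python sketch of this rule-based pre-labeling, combining dictionary hard matching (iterating longer entries first, in line with the longest-string-first principle) with regular-expression matching; the dictionary entries, pattern and labels are purely illustrative:

    import re

    def pre_label(text, dictionary, patterns):
        spans = []
        # Dictionary hard matching: longer entries are tried first.
        for term, label in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
            for m in re.finditer(re.escape(term), text):
                spans.append((m.start(), m.end(), label, "dictionary"))
        # Regular-expression matching: entities following the same rule get the same label.
        for pattern, label in patterns:
            for m in re.finditer(pattern, text):
                spans.append((m.start(), m.end(), label, "regex"))
        return spans

    spans = pre_label("联系电话13800138000，单位为国防科技大学。",
                      {"国防科技大学": "ORG"},
                      [(r"\d{11}", "PHONE")])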
And 5: and inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning, and calculating the weight value of the output result according to the sequence labeling models to obtain the optimal labeling content.
Preferably, the plurality of deep learning based sequence annotation models comprises a first model, a second model and a third model. The first model and the second model are shown in fig. 3.
The first model comprises an ALBERT model, a BiGRU model and a CRF model, i.e. it consists of three modules: an ALBERT module, a BiGRU module and a CRF module. The ALBERT model is used for acquiring word vectors and extracting important text features; then named entity recognition is carried out by the BiGRU through deep learning of context feature information; finally, the CRF layer processes the output sequence of the BiGRU and obtains a globally optimal sequence from adjacent labels by combining the state transition matrix in the CRF.
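As a rough illustration of how the first model can be assembled, the following PyTorch sketch stacks an ALBERT encoder, a bidirectional GRU and a CRF layer. It assumes the Hugging Face transformers library, the pytorch-crf package and the placeholder checkpoint name "albert-base-v2", none of which are specified by the description above.

    import torch.nn as nn
    from transformers import AlbertModel
    from torchcrf import CRF

    class AlbertBiGruCrf(nn.Module):
        def __init__(self, num_tags, albert_name="albert-base-v2", rnn_hidden=128):
            super().__init__()
            self.albert = AlbertModel.from_pretrained(albert_name)      # word vectors / text features
            self.bigru = nn.GRU(self.albert.config.hidden_size, rnn_hidden,
                                batch_first=True, bidirectional=True)   # context feature learning
            self.proj = nn.Linear(2 * rnn_hidden, num_tags)             # score of each word on each tag
            self.crf = CRF(num_tags, batch_first=True)                  # label transition constraints

        def forward(self, input_ids, attention_mask, tags=None):
            emb = self.albert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
            feats, _ = self.bigru(emb)
            emissions = self.proj(feats)                    # [batch_size, num_length, num_tags]
            mask = attention_mask.bool()
            if tags is not None:                            # training: negative log-likelihood
                return -self.crf(emissions, tags, mask=mask)
            return self.crf.decode(emissions, mask=mask)    # inference: globally optimal tag sequence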
The second model comprises a BERT model, a BiLSTM model and a CRF model. It is similar to the previous model and also consists of three modules, where each module plays a similar role, and finally an optimal sequence is obtained from the output of the model.
The third model includes an ERNIE model. In the pre-training of the ERNIE model, basic-level masking is adopted first in the first stage, i.e. characters in the Chinese text are randomly masked; phrase-level masking is adopted in the second stage, so that the model predicts phrases; and in the third stage named entities such as names of people, places and organizations are masked, so that the model has learned these entities after training is finished.
In the first model and the second model, the BiLSTM and BiGRU are improved: both BiLSTM and BiGRU add a pooling layer behind the basic RNN model, as shown in FIG. 4. This pooling layer is effective because, after a feature is found, its exact position is much less important than its position relative to other features; the pooling layer continually reduces the spatial size of the data, so the number of parameters and the amount of calculation are also reduced, which in turn controls overfitting to some extent.
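One plausible reading of this improvement, sketched below, is a 1-D max-pooling applied to the BiGRU/BiLSTM output along the feature dimension, which shrinks the representation (and hence the following parameters) while keeping the sequence length needed for per-token labels; the kernel size is an assumption for illustration.

    import torch.nn as nn

    class BiGruWithPooling(nn.Module):
        def __init__(self, input_size=768, hidden=128, pool_kernel=2):
            super().__init__()
            self.bigru = nn.GRU(input_size, hidden, batch_first=True, bidirectional=True)
            self.pool = nn.MaxPool1d(kernel_size=pool_kernel)   # pools over the last (feature) dimension

        def forward(self, x):                  # x: [batch, seq_len, input_size]
            feats, _ = self.bigru(x)           # [batch, seq_len, 2 * hidden]
            return self.pool(feats)            # [batch, seq_len, (2 * hidden) // pool_kernel]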
The method for obtaining the optimal labeling content according to the output results of the sequence labeling models comprises the following steps: determining a weight coefficient for each sequence labeling model; if the output labels of all the sequence labeling models are the same, taking that output label as the optimal labeling content; if different sequence labeling models output different labels, multiplying the output labels of all models that output the same label by the corresponding weight coefficients and summing them to obtain a summed probability value for each kind of output label, and selecting the output label with the maximum summed probability value as the optimal labeling content.
The optimal result obtained by combining the multiple models becomes the automatic labeling result. For different data, the models can be compared based on the manually labeled results and their own labeled results; for different models, a threshold value is determined for different texts, and texts can be routed to different models according to the probability of the previous prediction results. Compared with approaches that combine the previous models by taking the mode, computing the average and so on, this method combines three models.
The weight coefficient of each sequence labeling model is determined through training, as follows: for each sequence labeling model, the error value between the predicted labels of each round and the manual labeling result is calculated, the error values of the rounds are averaged, the reciprocal of the average is taken, and the reciprocal divided by 100 gives the quotient used as the weight coefficient of that sequence labeling model. The corresponding weight coefficients of the three sequence labeling models are obtained with this method. The weight coefficients change with the training rounds.
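The weight computation above can be sketched directly; the per-round error values below are hypothetical numbers used only to show the formula (reciprocal of the average round error, divided by 100):

    def model_weight(round_errors):
        avg_error = sum(round_errors) / len(round_errors)
        return (1.0 / avg_error) / 100.0        # reciprocal of the average error, divided by 100

    # Hypothetical per-round error values for the three sequence labeling models.
    weights = [model_weight(e) for e in ([0.42, 0.35, 0.31],
                                         [0.30, 0.27, 0.25],
                                         [0.55, 0.50, 0.48])]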
There is an error value between the result of manual labeling and the result of model labeling. Each model has a predicted value for the input text x, and each model has a weight value W, for example W = (0.2, 0.34, 0.04). The prediction of each model for a label is calculated; if all three models predict the label O, the final result is W1 × (probability of predicting O) + W2 × (probability of predicting O) + W3 × (probability of predicting O). If the prediction results of the three models are different, terms for the same label are added and terms for different labels are taken as a difference, for example W1 × (probability of predicting I) - W2 × (probability of predicting O) + W3 × (probability of predicting O). Because the model output may contain prediction probability values for several labels, if the three models predict three different labels, the predicted values of the three models for the three labels are obtained respectively, the weight value of each model for the different labels is applied, the final predicted values of the three labels are obtained respectively and compared (e.g. whether the calculation result is greater than 0), and the prediction result is selected accordingly.
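For the fusion rule itself, a minimal sketch is given below under the simpler reading stated earlier (identical predictions are kept directly; otherwise each label's probabilities are multiplied by the model weights, summed, and the label with the largest summed score is selected); the probabilities and weights are illustrative:

    def fuse_token_predictions(model_probs, weights):
        """model_probs: one {tag: probability} dict per model, for a single token."""
        top_tags = [max(p, key=p.get) for p in model_probs]
        if len(set(top_tags)) == 1:                       # all models output the same label
            return top_tags[0]
        tags = set().union(*(p.keys() for p in model_probs))
        scores = {t: sum(w * p.get(t, 0.0) for w, p in zip(weights, model_probs))
                  for t in tags}                          # weighted summed probability per label
        return max(scores, key=scores.get)

    label = fuse_token_predictions(
        [{"O": 0.6, "B-NAME": 0.4}, {"B-NAME": 0.7, "O": 0.3}, {"B-NAME": 0.5, "O": 0.5}],
        weights=[0.2, 0.34, 0.04])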
The deep learning models ALBERT/BERT + BiGRU/BiLSTM + CRF and ERNIE of the invention have an obvious effect on saving labeling time, covering a model training phase and a knowledge entity recognition phase.
For the model training phase, based on the characteristics of the ALBERT/BERT model, ALBERT/BERT takes single characters as input, a start identifier CLS and an end identifier SEP are added at the beginning and the end of a sentence, and the output is the embedding of each character. The model framework mainly utilizes the word embedding function of the pre-trained ALBERT model and fine-tunes ALBERT/BERT; ERNIE is similar to these two models and is also adjusted based on the BERT model.
The input of BiGRU/BiLSTM is the embedding output of ALBERT/BERT, and a projection layer (project_layer) is added in the middle so that the output is [batch_size, num_length, num_tags]. Here batch_size is the batch size in the model, num_length is the length of the input sentence, and num_tags is the number of sequence labels; that is, the score of each word on each tag is output.
The CRF layer is used for constraining the output of the BiGRU. If the tag with the maximum score for each word output by the BiGRU were used directly as the output, sequences such as [B-NAME, O, I-NAME, O, I-Location] could appear, which obviously does not match the actual situation. The CRF layer can add some constraints to ensure that the final prediction results are valid. Training the CRF layer yields transition probability scores, i.e. the probability values of transitioning from one label to another.
The labeling software processes the obtained json file, cleans the data, and converts the data into BIO form, where B represents the beginning of a knowledge entity, I represents a non-initial character of a knowledge entity, O represents a character outside any entity, [START] represents the sentence-head label of the text, and [END] represents the sentence-tail label of the text.
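A minimal sketch of converting labeled character spans (for instance, read from the labeling tool's json output) into this BIO form; the span layout and the sample sentence are assumptions made for illustration:

    def to_bio(text, entities):
        """entities: list of (start, end, entity_type) character spans."""
        tags = ["O"] * len(text)
        for start, end, etype in entities:
            tags[start] = "B-" + etype                 # beginning of the knowledge entity
            for i in range(start + 1, end):
                tags[i] = "I-" + etype                 # non-initial characters of the entity
        return ["[START]"] + tags + ["[END]"]          # sentence-head / sentence-tail labels

    print(to_bio("张三在北京", [(0, 2, "NAME"), (3, 5, "Location")]))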
As the number of training rounds increases, the effect of the model improves correspondingly until the model approaches a stable state. The following indices are used to describe it:
the precision ratio is as follows: TP/(TP + FP), recall: r — TP/(TP + FN), F1 score: f1-score 2 × P × R/(P + R), TP: positive samples, predicted as positive samples, FP: negative samples, predicted as positive samples, FN: positive samples, predicted negative samples, TN: negative examples, predicted as negative examples.
The multi-model combination is characterized in that the mode of the models' predictions is not simply taken as the value; instead, a weight value for each model is obtained from the manual results and the model training results, the results predicted by the models are judged through these weight values, and the models are combined. Each model has a predicted value for the input text, and the predicted values are compared to calculate the optimal result.
The sequence labeling system based on deep learning of the embodiment of the present invention, as shown in fig. 5, includes:
the preprocessing module is used for acquiring a text to be marked and preprocessing the text;
the translation error correction module is used for performing text translation error correction on the preprocessed text;
the rule matching module is used for constructing a rule dictionary and a regular expression, pre-labeling the translated and corrected text based on the rule dictionary and the regular expression and outputting pre-labeled contents;
and the combined model labeling module is used for inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning and obtaining the optimal labeling content according to the weight values of the sequence labeling models to the output result.
The implementation principle and technical effect of the system are similar to those of the method, and are not described herein again.
The embodiment also provides an electronic device, which includes at least one processor and at least one memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the sequence labeling method based on deep learning, and the specific steps refer to the method embodiments and are not described herein again; in this embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement any of the above technical solutions of the embodiments of the sequence labeling method based on deep learning. The implementation principle and technical effect are similar to those of the above method, and are not described herein again.
It should be noted that in any of the above embodiments, the methods are not necessarily executed in sequential order, but as long as it cannot be assumed from the execution logic that they are necessarily executed in a certain order, it means that they can be executed in any other possible order.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A sequence labeling method based on deep learning is characterized by comprising the following steps:
acquiring a text to be marked, and preprocessing the text;
performing text translation processing and error correction processing on the preprocessed text;
constructing a rule dictionary and a regular expression, pre-labeling the text after error correction processing based on the rule dictionary and the regular expression, and outputting pre-labeled content;
and inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning, and calculating according to the weight values of the sequence labeling models for the output result to obtain the optimal labeling content.
2. The deep learning-based sequence annotation method of claim 1, wherein the translation process comprises:
translating by adopting a seq2seq model introduced with an attention mechanism, wherein the seq2seq model comprises an encoder and a decoder, the attention mechanism enables the decoder to have different attentions on different parts when decoding, the decoder comprises a labeling sequence prediction module and a constraint module, the labeling sequence prediction module is used for outputting a plurality of prediction labeling sequences, the constraint module is used for defining a feature function set, the set comprises a plurality of feature functions, each feature function is used for judging whether the labeling sequence accords with the feature corresponding to the feature function, scores of all the feature functions in the set on the same prediction labeling sequence are respectively obtained, the scores of all the feature functions in the set on the same prediction labeling sequence are summed to be used as the final score of the prediction labeling sequence, and thus the final score of each prediction labeling sequence output by the labeling sequence prediction module is obtained, and selecting the prediction annotation sequence with the highest final score as a translation processing result.
3. The deep learning-based sequence labeling method of claim 1, wherein the error correction process comprises:
collecting professional noun libraries in different fields, presetting correct translation texts of each professional noun in the professional noun libraries, and if the translated texts contain the professional nouns in the professional noun libraries and the translations of the professional nouns are different from the preset correct translation texts, directly replacing the translations of the professional nouns with the preset correct translation texts;
sending the text sequence processed in the last step into an error correction model based on deep learning, wherein the error correction model comprises a word vector mapping module, an error detection module, a coding module, a multi-head attention mechanism module, a fusion module and a classification module, the word vector mapping module is used for converting an input text sequence into a text vector and outputting the text vector to the error detection module, and is also used for identifying an entity in the text sequence according to a pre-constructed knowledge map and converting the identified entity into an entity vector and outputting the entity vector to the multi-head attention mechanism module, the error detection module is used for obtaining a text semantic representation tensor according to the text vector, the coding module is used for coding the text semantic representation and outputting the text semantic representation tensor to the multi-head attention mechanism module, and the multi-head attention mechanism is used for outputting a text embedded representation and an entity embedded representation to the fusion module for feature fusion, and the classification module is used for outputting an error correction result according to the fused features.
4. The deep learning-based sequence annotation method of claim 1, wherein the plurality of deep learning-based sequence annotation models comprises a first model, a second model, and a third model;
the first model comprises an ALBERT model, a BiGRU model and a CRF model, wherein the ALBERT model is used for obtaining word vectors and extracting important text features, the BiGRU model is used for carrying out named entity recognition through deep learning of context feature information, the CRF model is used for processing the output sequence of the BiGRU model, and a global optimal sequence is obtained from adjacent labels by combining the state transition matrix in the CRF model;
the second model comprises a BERT model, a BiLSTM model and a CRF model, wherein the BERT model is used for obtaining word vectors and extracting important text features, the BiLSTM model is used for carrying out named entity recognition through deep learning of context feature information, the CRF model is used for processing the output sequence of the BiLSTM model, and a global optimal sequence is obtained according to the labels between two adjacent states by combining the state transition matrix in the CRF model;
the third model comprises an ERNIE model, wherein in the pre-training of the ERNIE model, basic character masks are adopted firstly in the first stage, masks at the phrase level are adopted in the second stage, the model is used for predicting phrases, and masks of named entities are adopted in the third stage, so that the model is used for predicting the named entities.
5. The method as claimed in claim 4, wherein a pooling layer is added after the base RNN model in both the BiGRU model and the BiLSTM model.
6. The method for sequence annotation based on deep learning of claim 1, wherein the obtaining of the optimal annotation content according to the output results of the plurality of sequence annotation models comprises:
determining a weight coefficient of each sequence labeling model;
if the output labels of all the sequence labeling models are the same, taking the output label as the optimal labeling content;
if the output labels of the labeling models with different sequences are different, multiplying the output labels of all the labeling models with the same output label by the corresponding weight coefficient, summing to obtain the summation probability value of each type of output label, and selecting the output label with the maximum summation probability value as the optimal labeling content.
7. The deep learning-based sequence annotation method of claim 6, wherein the weight coefficient of each sequence annotation model is determined by training.
8. A deep learning based sequence annotation system, comprising:
the preprocessing module is used for acquiring a text to be marked and preprocessing the text;
the translation error correction module is used for performing text translation error correction on the preprocessed text;
the rule matching module is used for constructing a rule dictionary and a regular expression, pre-labeling the translated and corrected text based on the rule dictionary and the regular expression and outputting pre-labeled content;
and the combined model labeling module is used for inputting the pre-labeled text into a plurality of sequence labeling models based on deep learning and obtaining the optimal labeling content according to the weight values of the sequence labeling models for the output result.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any of claims 1 to 7 when executing the computer program.
10. A storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210310976.3A 2022-03-28 2022-03-28 Sequence labeling method based on deep learning and application Pending CN114692568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210310976.3A CN114692568A (en) 2022-03-28 2022-03-28 Sequence labeling method based on deep learning and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210310976.3A CN114692568A (en) 2022-03-28 2022-03-28 Sequence labeling method based on deep learning and application

Publications (1)

Publication Number Publication Date
CN114692568A true CN114692568A (en) 2022-07-01

Family

ID=82141383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210310976.3A Pending CN114692568A (en) 2022-03-28 2022-03-28 Sequence labeling method based on deep learning and application

Country Status (1)

Country Link
CN (1) CN114692568A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913402A (en) * 2022-07-18 2022-08-16 深圳比特微电子科技有限公司 Fusion method and device of deep learning model
CN114913402B (en) * 2022-07-18 2022-10-18 深圳比特微电子科技有限公司 Fusion method and device of deep learning model
WO2024016516A1 (en) * 2022-07-18 2024-01-25 浙大城市学院 Method and system for recognizing knowledge graph entity labeling error on literature data set
CN115757784A (en) * 2022-11-21 2023-03-07 中科世通亨奇(北京)科技有限公司 Corpus labeling method and apparatus based on labeling model and label template screening
CN115757784B (en) * 2022-11-21 2023-07-07 中科世通亨奇(北京)科技有限公司 Corpus labeling method and device based on labeling model and label template screening

Similar Documents

Publication Publication Date Title
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
CN110334354B (en) Chinese relation extraction method
CN112632997A (en) Chinese entity identification method based on BERT and Word2Vec vector fusion
CN111708882B (en) Transformer-based Chinese text information missing completion method
CN114692568A (en) Sequence labeling method based on deep learning and application
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN112417092B (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
CN112380863A (en) Sequence labeling method based on multi-head self-attention mechanism
CN113268561B (en) Problem generation method based on multi-task joint training
CN112784604A (en) Entity linking method based on entity boundary network
CN112818698B (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN115759042A (en) Sentence-level problem generation method based on syntax perception prompt learning
CN113821605A (en) Event extraction method
CN114492441A (en) BilSTM-BiDAF named entity identification method based on machine reading understanding
CN115293139A (en) Training method of voice transcription text error correction model and computer equipment
CN117010387A (en) Roberta-BiLSTM-CRF voice dialogue text naming entity recognition system integrating attention mechanism
CN113239694B (en) Argument role identification method based on argument phrase
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN116702765A (en) Event extraction method and device and electronic equipment
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN110377753A (en) Relation extraction method and device based on relationship trigger word Yu GRU model
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination