CN110210035B - Sequence labeling method and device and training method of sequence labeling model - Google Patents

Sequence labeling method and device and training method of sequence labeling model

Info

Publication number
CN110210035B
Authority
CN
China
Prior art keywords
sequence
label
labeling
model
binding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910481021.2A
Other languages
Chinese (zh)
Other versions
CN110210035A (en)
Inventor
李正华
黄德朋
张民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201910481021.2A priority Critical patent/CN110210035B/en
Publication of CN110210035A publication Critical patent/CN110210035A/en
Application granted granted Critical
Publication of CN110210035B publication Critical patent/CN110210035B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application discloses a sequence labeling method and device, a training method and device for a sequence labeling model, and a computer-readable storage medium. The scoring layers of the sequence labeling model in this scheme comprise second scoring layers that correspond one-to-one to the labeling specifications and a first scoring layer that corresponds to all labeling specifications. In addition, the output of the model is a binding tag sequence, which is equivalent to directly obtaining the tag sequences under the various labeling specifications and makes it convenient to convert text between different labeling specifications.

Description

Sequence labeling method and device and training method of sequence labeling model
Technical Field
The present application relates to the field of natural language processing, and in particular, to a sequence labeling method and apparatus, a training method and device for a sequence labeling model, and a computer-readable storage medium.
Background
In natural language processing tasks, annotated data is often needed as training samples for a natural language processing model, and the scale of the annotated data significantly affects the performance of the model. Because manually annotated data is very expensive to construct, some researchers have proposed schemes that enlarge the data scale by exploiting heterogeneous data resources. However, since heterogeneous data follow different labeling specifications, they cannot be mixed directly. Therefore, how to effectively utilize heterogeneous data to improve model performance has become a research problem.
One current scheme for improving model performance with heterogeneous data works in a way similar to stacked (accumulative) learning: one data resource is used to generate additional features on another data resource. Taking CTB and PKU as an example, model parameters are first trained independently on the CTB corpus, features produced by that model are then added to the PKU corpus, and training continues on the PKU corpus. However, because the two corpora serve different research directions and follow different part-of-speech tagging specifications, this introduces noise into the model, and the purpose of improving performance cannot be achieved.
Therefore, the existing schemes that train a model with data of different labeling specifications suffer from the introduction of noise and cannot achieve the purpose of improving the labeling performance of the model.
Disclosure of Invention
The purpose of the application is to provide a sequence labeling method and apparatus, a training method and device for a sequence labeling model, and a computer-readable storage medium, so as to solve the problem that existing schemes which train a model with data of different labeling specifications cannot improve the labeling performance of the model. The specific scheme is as follows:
in a first aspect, the present application provides a sequence annotation apparatus, including a sequence annotation model, where the sequence annotation model includes:
an input layer: used for obtaining a text to be labeled;
a presentation layer: used for determining the vector representation of each word of the text to be labeled and sending the vector representation to a first scoring layer and a plurality of second scoring layers respectively;
the first scoring layer: used for determining the original score of each binding label in a binding label set according to the vector representation;
the second scoring layers: used for determining, according to the vector representation, the score of each independent label in the label set of the corresponding labeling specification, wherein the second scoring layers correspond one-to-one to the labeling specifications, and a binding label is a label combination comprising one independent label from each labeling specification;
a prediction layer: used for determining the final score of each binding label according to the original score of the binding label and the scores of the independent labels corresponding to the binding label, and determining the target binding label of each word according to the final scores of the binding labels;
an output layer: used for outputting a target label sequence of the text to be labeled, the target label sequence comprising the target binding labels of all words of the text to be labeled.
Preferably, the presentation layer includes:
a first encoding unit: used for determining a first vector of each word of the text to be labeled;
a second encoding unit: used for encoding each character of the word with a first bidirectional recurrent neural network to obtain a second vector of the word;
a representation unit: for determining a vector representation of the word from the first vector and the second vector and sending the vector representation to a first scoring layer and a plurality of second scoring layers, respectively.
Preferably, the representation unit is specifically configured to:
determining a vector representation of the word from the first vector and the second vector; encoding the vector representation of the word by using a second bidirectional cyclic neural network to obtain global information; and sending the global information to a first scoring layer and a plurality of second scoring layers respectively.
Preferably, the prediction layer is specifically configured to:
determining a final score of the binding tag according to the original score of the binding tag and the score of the independent tag corresponding to the binding tag; determining the probability of each binding tag according to the final score of each binding tag by utilizing a softmax function; determining a target binding tag for the word according to the probability of each binding tag.
Preferably, the sequence labeling model further comprises:
a loss layer: used for determining a loss value during training according to a target loss function, a predicted labeling result, and an actual labeling result, and adjusting model parameters to realize the training, wherein the target loss function is:
Loss = -log( Σ_{j=1}^{k} y_j )
where k denotes the number of correct labels and y_j denotes the probability of the j-th correct label output by the prediction layer.
In a second aspect, the present application provides a training method for a sequence annotation model, which is applied to the sequence annotation model of the sequence annotation apparatus described above, and includes:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts;
inputting the training samples into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model;
and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
In a third aspect, the present application provides a training apparatus for a sequence annotation model, which is applied to the sequence annotation model of the sequence annotation apparatus described above, and includes:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training samples into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
In a fourth aspect, the present application provides a computer-readable storage medium for use in a sequence annotation model of a sequence annotation apparatus as described above, the computer-readable storage medium having stored thereon a computer program for implementing, when executed by a processor, the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training samples into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
In a fifth aspect, the present application provides a sequence annotation method, including:
acquiring a text to be marked;
determining vector representation of each word of the text to be labeled;
according to the vector representation, determining the score of each independent label in a label set with various labeling specifications, and determining the original score of each binding label in a binding label set, wherein the binding label is a group of label combinations comprising single independent labels with various labeling specifications;
determining a final score of the binding tag according to the original score of the binding tag and the score of the independent tag corresponding to the binding tag;
and determining the target binding label of the word according to the final score of the binding label to obtain a target label sequence of the text to be labeled.
The application provides a sequence labeling method, a sequence labeling device, a training method and equipment of a sequence labeling model and a computer readable storage medium, wherein the scheme can be used for acquiring a text to be labeled and determining the vector representation of each word of the text to be labeled; according to the vector representation, determining the score of each independent label in a label set with various labeling specifications, and determining the original score of each binding label in a binding label set; then determining the final score of the binding label according to the original score of the binding label and the score of the independent label corresponding to the binding label; and finally, determining a target binding tag of the word according to the final score of the binding tag so as to obtain a target tag sequence of the text to be labeled.
It can be seen that, the scoring layers of the sequence labeling model in the scheme include scoring layers corresponding to the labeling standards one by one, and also include scoring layers corresponding to all the labeling standards, due to the unique design of the scoring layers in the model, heterogeneous data of various labeling standards can be used as a training set of the model, the scale of the training corpus is expanded, and the model can learn commonalities among corpora of different labeling standards, so that the labeling performance of the model under a single labeling standard is improved. In addition, the output result of the model is a binding tag sequence, which is equivalent to directly obtaining tag sequences under various labeling specifications, and the conversion of texts between different labeling specifications is facilitated.
Drawings
In order to clearly illustrate the embodiments or technical solutions of the present application, the drawings used in the embodiments or technical solutions of the present application will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a functional block diagram of a sequence labeling apparatus according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a binding tag in an embodiment of a sequence tagging apparatus provided in the present application;
FIG. 3 is a schematic diagram illustrating vector representations of words in an embodiment of a sequence labeling apparatus provided in the present application;
FIG. 4 is a flowchart illustrating an implementation of an embodiment of a training method for a sequence annotation model provided in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a training apparatus for a sequence annotation model provided in the present application;
fig. 6 is a flowchart illustrating a sequence tagging method according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, in order to improve the performance of a model, the scale of labeled data is often enlarged by using heterogeneous data, however, noise is introduced into the model by using the existing scheme of using heterogeneous data, and the purpose of improving the performance of the model cannot be achieved. In order to solve the problems, the application provides a sequence labeling method, a sequence labeling device, a sequence labeling model training method, sequence labeling model training equipment and a computer readable storage medium.
A first embodiment of a sequence labeling apparatus provided in the present application is described below; the first embodiment includes a sequence labeling model. It should be noted that this embodiment uses a deep neural network as the sequence labeling model, so as to avoid the disadvantages of conventional models based on feature engineering, for example, a complicated feature extraction process and the difficulty of ensuring the rationality of feature templates. As a specific implementation, this embodiment selects a BiLSTM (Bidirectional Long Short-Term Memory) network as the basic model.
Referring to fig. 1, the sequence labeling model specifically includes:
Input layer 101: used for obtaining the text to be labeled;
Presentation layer 102: used for determining the vector representation of each word of the text to be labeled and sending the vector representations to the first scoring layer 103 and the plurality of second scoring layers 104 respectively;
The sequence tagging model in this embodiment may specifically be used to implement processing such as named entity recognition, word segmentation, and part-of-speech tagging of the text to be labeled. Since the focus of this embodiment is the process of assigning labels to each word in the text to be labeled, operations such as named entity recognition and word segmentation are not described in detail here. Before the text to be labeled enters this layer, each word in the text to be labeled needs to be converted into a vector representation. As a specific implementation, in this embodiment the word embedding vector is obtained by pre-training, that is, word vectors trained by other models are loaded directly at initialization to represent the current word; for an unknown word that is not found in the pre-training vocabulary, the word embedding vector may be randomly generated.
First scoring layer 103: used for determining the original score of each binding label in the binding label set according to the vector representation;
Second scoring layers 104: used for determining, according to the vector representation, the score of each independent label in the label set of the corresponding labeling specification, wherein the second scoring layers correspond one-to-one to the labeling specifications, and a binding label is a label combination comprising one independent label from each labeling specification;
the marking specification refers to rules and bases for marking each word of a text sentence, specifically, the words are marked in a label form, and currently known marking specifications include CTB, PKU, MSR, and the like. Taking two marking specifications of CTB and PKU as an example, the method is particularly suitable for the economic uplink of China for the same text. "the sequence labeling result obtained according to the CTB labeling specification is shown in table 1, and the sequence labeling result obtained according to the PKU labeling specification is shown in table 2, so that different sequence labeling results, that is, different tag sequences, can be obtained for the same text according to different labeling specifications. The purpose of this embodiment is to improve the labeling performance of the model by using heterogeneous data, so this embodiment is implemented based on multiple labeling specifications, and more specifically, this embodiment is implemented based on two or more labeling specifications, which labeling specification is specifically selected may be determined according to actual requirements, which is not specifically limited in this embodiment.
TABLE 1
Words in the text (English glosses):  especially / I / country / economy / rise / 。
Labels under the CTB specification:   AD / PN / NN / NN / VV / PU
TABLE 2
Words in the text (English glosses):  especially / is / China / economy / rise / 。
Labels under the PKU specification:   d / v / n / n / v / w
It should be noted that, in this embodiment, labels of multiple labeling specifications are bundled to obtain label combinations that each contain one label from every labeling specification. For convenience of description, this embodiment refers to such a label combination as a bundled (binding) label and to the labels within each labeling specification as independent labels; the process of constructing the bundled labels is shown in Fig. 2. This embodiment models on the enlarged set of bundled labels, mapping the individual independent labels into the set of bundled labels by considering all possible bundled-label combinations.
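Purely as an illustration (the function name and the toy tag subsets below are assumptions, not the patent's data), the bundled label set can be built as the Cartesian product of the independent label sets of the labeling specifications:

from itertools import product

def build_bundled_tag_set(tag_sets):
    """Cartesian product of the independent tag sets of all labeling
    specifications: every combination becomes one bundled (binding) tag."""
    return [tuple(tags) for tags in product(*tag_sets)]

ctb_tags = ["AD", "PN", "NN", "VV", "PU"]   # toy subset of CTB part-of-speech tags
pku_tags = ["d", "v", "n", "w"]             # toy subset of PKU part-of-speech tags
bundled = build_bundled_tag_set([ctb_tags, pku_tags])
print(len(bundled))    # 5 * 4 = 20 bundled tags
print(bundled[:3])     # [('AD', 'd'), ('AD', 'v'), ('AD', 'n')]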
The terms first scoring layer 103 and second scoring layer 104 are only used to distinguish the two kinds of scoring layers and do not indicate number or order. As shown in Fig. 1, the sequence annotation model in this embodiment has N+1 scoring layers in total, including one first scoring layer and a plurality of second scoring layers, where the first scoring layer is used to determine the original score of each binding label in the binding label set according to the vector representation of a word, the second scoring layers correspond one-to-one to the labeling specifications, and each second scoring layer is used to determine the score of each independent label in the label set of its labeling specification according to the vector representation of the word. As a specific embodiment, the first scoring layer and the second scoring layers may be different MLP (Multilayer Perceptron) layers.
Prediction layer 105: determining a final score of the bundled label according to the original score of the bundled label and the scores of the individual labels corresponding to the bundled label; determining a target binding tag of the word according to the final score of each binding tag;
as a specific implementation manner, the embodiment sums the original scores of the binding tags and the scores of the individual tags corresponding to the binding tags, uses the sum result as the final score of the binding tags, and finally determines the target binding tags of the words according to the size relationship of the final scores of the binding tags.
Output layer 106: used for outputting the target label sequence of the text to be labeled, the target label sequence comprising the target binding labels of all words of the text to be labeled.
It should be noted that, due to the unique design of the scoring layer in the sequence labeling model of this embodiment, corpora of various labeling specifications can be selected as the training set of the model, the data scale is extended, and the model can learn commonalities between corpora of different labeling specifications, thereby improving the labeling performance of the model under a single labeling specification. That is to say, the model can implement the labeling mode corresponding to any one of the multiple labeling specifications, and the labeling performance under any one of the multiple labeling specifications is improved. Specifically, in the test process, the sequence labeling model can output a binding tag sequence of a text to be labeled, the binding tag sequence includes tag sequences of the multiple labeling specifications, and the tag sequence under any one of the multiple labeling specifications can be obtained by simply dividing the binding tag sequence.
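A minimal sketch of that division, assuming the bundled labels are stored as tuples with one independent label per specification (function and variable names are illustrative):

def project_bundled_sequence(bundled_sequence, spec_index):
    """Recover the tag sequence of one labeling specification from a
    bundled tag sequence by keeping only the spec_index-th component."""
    return [tag[spec_index] for tag in bundled_sequence]

bundled_sequence = [("AD", "d"), ("PN", "v"), ("NN", "n"), ("VV", "v"), ("PU", "w")]
print(project_bundled_sequence(bundled_sequence, 0))  # tags under the first specification
print(project_bundled_sequence(bundled_sequence, 1))  # tags under the second specification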
This embodiment provides a sequence labeling apparatus including the above sequence labeling model. The scoring layers of the model include scoring layers that correspond one-to-one to the labeling specifications as well as a scoring layer that corresponds to all labeling specifications. Because of this unique design of the scoring layers, heterogeneous data of multiple labeling specifications can be used as the training set of the model, which expands the scale of the training corpus; moreover, the model can learn the commonalities between corpora of different labeling specifications, thereby improving the labeling performance of the model under a single labeling specification. In addition, the output of the model is a binding tag sequence, which is equivalent to directly obtaining the tag sequences under the various labeling specifications and makes it convenient to convert text between different labeling specifications.
The second embodiment of the sequence labeling apparatus provided by the present application is described in detail below, and the second embodiment is implemented based on the first embodiment and is expanded to a certain extent based on the first embodiment.
Specifically, the sequence labeling apparatus provided in the second embodiment includes a sequence labeling model, where the sequence labeling model includes: an input layer, a presentation layer, an encoding layer, a first MLP layer, a plurality of second MLP layers, a prediction layer, an output layer, and a loss layer, each of which is described below:
an input layer: used for obtaining the text to be labeled;
a presentation layer: used for determining the vector representation of each word of the text to be labeled;
In conventional schemes, when a word is converted into a vector representation, the embedding vector of the word is usually used directly as its vector representation. In order to make the vector representation of the word express the text information more fully, as a preferred embodiment and as shown in Fig. 3, this embodiment uses a first vector and a second vector together to obtain the vector representation of the word. Specifically, in the presentation layer, this embodiment first determines the first vector and the second vector respectively. The first vector, namely the word embedding vector, can be obtained by pre-training, and unknown words can be handled by random initialization. For the second vector, the character vector of each character of the word is obtained by random initialization; then, as shown in Fig. 3, all character vectors are input into a one-layer BiLSTM, the last output of each of the two directions is taken, and the outputs of the two directions are concatenated to obtain the second vector. Since the output at the last character has already learned the information of the other characters, using it as the second vector represents the text information more fully. Finally, this embodiment concatenates the first vector and the second vector and uses the concatenated vector as the vector representation of the word.
Specifically, given a text to be labeled S = {w_1, w_2, ..., w_n}, w_i denotes the i-th word in the text and n denotes the number of words in the text to be labeled; for each word, w_i = {c_i_1, c_i_2, ..., c_i_m}, where c_i_j denotes the j-th character of w_i and m denotes the number of characters in the word. All characters of w_i are input into the BiLSTM, the last outputs h_lm and h_rm of the two directions are concatenated behind the word vector corresponding to w_i, and the vector representation X_i of w_i is obtained, which can be expressed as:
X_i = [ emb(w_i) ; h_lm ; h_rm ]    (1)
where emb(w_i) denotes the word embedding of w_i and [ ; ] denotes vector concatenation.
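The following PyTorch sketch illustrates one way to realize formula (1); the module names, dimensions, and the choice of PyTorch are assumptions made for illustration, not the patent's reference implementation. The first vector is a word-embedding lookup and the second vector is the concatenation of the last forward and backward outputs of a character-level BiLSTM:

import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """X_i = [word embedding ; h_lm ; h_rm], cf. formula (1)."""
    def __init__(self, n_words, n_chars, word_dim=100, char_dim=50, char_hidden=50):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)   # loaded from pre-trained vectors in practice
        self.char_emb = nn.Embedding(n_chars, char_dim)   # randomly initialized
        self.char_lstm = nn.LSTM(char_dim, char_hidden, bidirectional=True, batch_first=True)

    def forward(self, word_id, char_ids):
        char_vecs = self.char_emb(char_ids)               # (1, m, char_dim), m characters in the word
        _, (h_n, _) = self.char_lstm(char_vecs)           # h_n: (2, 1, char_hidden)
        h_lm, h_rm = h_n[0], h_n[1]                       # last forward / last backward output
        w_vec = self.word_emb(word_id)                    # (1, word_dim)
        return torch.cat([w_vec, h_lm, h_rm], dim=-1)     # (1, word_dim + 2 * char_hidden)

# Toy usage: a word with index 3 made of two characters with indices 7 and 11.
repr_layer = WordRepresentation(n_words=1000, n_chars=5000)
x_i = repr_layer(torch.tensor([3]), torch.tensor([[7, 11]]))
print(x_i.shape)                                          # torch.Size([1, 200])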
in summary, the representation layers in this embodiment specifically include:
a first encoding unit: used for determining the first vector of each word of the text to be labeled;
a second encoding unit: used for encoding each character of the word with the first bidirectional recurrent neural network to obtain the second vector of the word;
a representation unit: for determining a vector representation of the word from the first vector and the second vector.
Encoding layer: used for encoding the vector representations of the words with a second bidirectional recurrent neural network to obtain global information, and sending the global information to the first MLP layer and the plurality of second MLP layers respectively;
Specifically, the encoding layer uses a BiLSTM to encode the sentence information. This embodiment takes the output X_i of the presentation layer as the input of the LSTM, and the global information h_i of the word w_i is obtained by encoding the entire sentence sequence with the LSTM. The formulas involved include:
i_i = σ(W_in · [h_{i-1}, x_i] + b_in)    (2)
f_i = σ(W_fg · [h_{i-1}, x_i] + b_fg)    (3)
o_i = σ(W_out · [h_{i-1}, x_i] + b_out)    (4)
c_i = f_i · c_{i-1} + i_i · tanh(W_c · [h_{i-1}, x_i] + b_c)    (5)
h_i = o_i · tanh(c_i)    (6)
where i_i, f_i, o_i and c_i denote the input gate, the forget gate, the output gate and the cell state corresponding to the i-th word respectively, and x_i and h_i denote the input and the hidden-layer output corresponding to the i-th word; σ denotes the sigmoid activation function, and W and b are the weight and bias of the corresponding gate, respectively.
The hidden state of a unidirectional LSTM only contains information obtained from the past and never considers the future. In order to encode the sentence information in both directions, this embodiment concatenates the hidden-layer outputs of the forward and backward LSTMs to obtain the BiLSTM hidden state h_i of the word w_i:
h_i = [ →h_i ; ←h_i ]
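A short PyTorch sketch of this encoding layer (sizes illustrative): a bidirectional LSTM over the word representations X_1, ..., X_n produces, for each word, the concatenation of forward and backward hidden states h_i described above:

import torch
import torch.nn as nn

input_dim, hidden_dim, n_words = 200, 150, 6          # illustrative sizes
sentence = torch.randn(1, n_words, input_dim)         # X_1 ... X_n from the representation layer

encoder = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)
h, _ = encoder(sentence)                              # (1, n_words, 2 * hidden_dim)
# h[0, i] is h_i: the forward and backward hidden states of word w_i, concatenated.
print(h.shape)                                        # torch.Size([1, 6, 300])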
The first MLP layer: used for determining the original score of each binding label in the binding label set according to the vector representation;
The second MLP layers: used for determining the score of each independent label in the label set of the corresponding labeling specification according to the vector representation;
Specifically, in this embodiment the scoring layers use MLPs to calculate the score of each label, and the sequence labeling model has N+1 MLP layers in total, namely: N second MLP layers used to determine the scores of the independent labels of the N labeling specifications respectively, and one first MLP layer used to determine the score of each binding label. Specifically, the BiLSTM output h_i is taken as the input of the MLP to obtain the score P_i of each label for each word in the sentence:
P_i = W_mlp · h_i + b_mlp
where W_mlp and b_mlp denote the weight and bias of the MLP layer, respectively.
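The N+1 scoring layers can be sketched as one linear (MLP) layer over the binding label set plus one linear layer per labeling specification, all reading the same hidden state h_i; the tag-set sizes and names below are placeholders, not figures from the patent:

import torch
import torch.nn as nn

hidden_dim = 300                       # size of the BiLSTM hidden state h_i
spec_sizes = [33, 26]                  # |T_1|, |T_2|: illustrative independent tag-set sizes
n_bundled = spec_sizes[0] * spec_sizes[1]

joint_mlp = nn.Linear(hidden_dim, n_bundled)                            # first MLP layer (binding labels)
sep_mlps = nn.ModuleList([nn.Linear(hidden_dim, k) for k in spec_sizes])  # second MLP layers

h_i = torch.randn(1, hidden_dim)
joint_scores = joint_mlp(h_i)                                           # (1, 858)
sep_scores = [mlp(h_i) for mlp in sep_mlps]                             # [(1, 33), (1, 26)]
print(joint_scores.shape, [s.shape for s in sep_scores])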
Prediction layer: determining a final score of the binding tag according to the original score of the binding tag and the scores of the independent tags corresponding to the binding tag; determining a target binding tag of the word according to the final score of each binding tag;
Specifically, according to the coupling mapping relationship between a binding tag and its independent tags, the original score of the binding tag and the scores of the N independent tags corresponding to the binding tag are added to obtain the final score of the binding tag. Taking N = 2 as an example, the score of labeling the i-th word of a sentence S with the binding tag [t_a, t_b] is:
Score(S, i, [t_a, t_b]) = Score_joint(S, i, [t_a, t_b]) + Score_sep_a(S, i, [t_a, t_b]) + Score_sep_b(S, i, [t_a, t_b])
wherein, score joint (s,i,[t a ,t b ]) Indicating that the ith word in the sentence S is labeled as a joint tag [ t ] a ,t b ]Raw Score of, score sep_a (s,i,[t a ,t b ]) The independent tag t in the tag set indicating that the ith word in the sentence S is marked as the first marking specification a Score of (1), score sep_b (s,i,[t a ,t b ]) Independent tags t in the set of tags representing the second annotation specification b Is scored.
After the final score of the binding tag is obtained, as a specific implementation manner, in this embodiment, a Softmax function is used to normalize the scores of all the binding tags obtained through calculation, so as to obtain the probability of each binding tag, and predict the target binding tag of each word according to the probability:
p_i = exp(Score_i) / Σ_{j=1}^{n} exp(Score_j)
where p_i is the normalized probability of the i-th binding tag in the binding tag set, Score_i is the final score of the i-th binding tag, and n is the number of binding tags in the binding tag set.
In summary, the prediction layer in this embodiment is specifically configured to: determining a final score of the binding tag according to the original score of the binding tag and the scores of the independent tags corresponding to the binding tag; determining the probability of each binding tag according to the final score of each binding tag by utilizing a softmax function; and determining the target binding label of the word according to the probability of each binding label.
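A minimal sketch of this normalization step, using torch.softmax as the Softmax function (the scores are toy values):

import torch

final_scores = torch.tensor([2.0, 0.5, -1.0, 0.0])   # toy final scores of four binding tags
probs = torch.softmax(final_scores, dim=0)           # p_i = exp(Score_i) / sum_j exp(Score_j)
target = int(probs.argmax())                         # index of the target binding tag for this word
print(probs.tolist(), target)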
An output layer: used for outputting the target label sequence of the text to be labeled, the target label sequence comprising the target binding labels of all words of the text to be labeled.
Loss layer: used for determining a loss value according to the target loss function, the predicted labeling result and the actual labeling result during training, so that the model parameters can be adjusted to realize the training.
In the art, a model generally adopts the cross-entropy function as the objective function for parameter estimation, and the model is solved and evaluated by minimizing this objective function. The cross-entropy function is:
loss = - Σ_i y_i · log(ŷ_i)
where y_i is the probability distribution of the correct label and ŷ_i is the probability distribution predicted by the model;
loss is the loss between the gold result of the sample and the prediction result of the model, and it is propagated back for parameter estimation so that the model is trained, the purpose of model training being to minimize this loss.
On this basis, this embodiment takes into account that, because of the multiple labeling specifications, each word has more than one correct label. Assume the number of labels in labeling specification 1 is |T_1|, the number of labels in labeling specification 2 is |T_2|, ..., and the number of labels in labeling specification N is |T_N|. Then the number of correct answers for each word under labeling specification 1 is |T_2| * ... * |T_N|, the number of correct answers for each word under labeling specification 2 is |T_1| * |T_3| * ... * |T_N|, and so on, and the number of correct answers for each word under labeling specification N is |T_1| * ... * |T_{N-1}|. Therefore, as a preferred implementation, this embodiment proposes an improved objective function, specifically:
Loss = -log( Σ_{j=1}^{k} y_j )
where k denotes the number of correct labels and y_j is the probability of the j-th correct label after Softmax normalization of its score.
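Under the reading of the objective reconstructed above (the loss for one word is the negative log of the total probability assigned to its k correct binding labels), a sketch of the loss might look like the following; the exact functional form is an assumption inferred from the surrounding description, not quoted from the patent:

import torch

def bundled_label_loss(final_scores, correct_ids):
    """Loss = -log( sum over the k correct binding labels of their Softmax probability )."""
    probs = torch.softmax(final_scores, dim=-1)
    return -torch.log(probs[correct_ids].sum())

# Toy example: 6 binding tags; tags 1 and 4 are both correct answers for this word.
scores = torch.randn(6, requires_grad=True)
loss = bundled_label_loss(scores, torch.tensor([1, 4]))
loss.backward()                                      # gradients are propagated back for parameter estimation
print(float(loss))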
To prove the performance improvement effect of the sequence labeling model of this embodiment, the following description is made by comparing the sequence labeling model of this embodiment with the existing model:
it is assumed that an existing practical application scenario aims to improve the labeling performance of the model under the CTB labeling specification, and it is assumed that the number of the labeling specifications in this embodiment is 2, and the two labeling specifications are CTB and PKU, respectively. Then, in terms of experimental data set-up, the data set setup for the existing model is shown in table 3, and the input and output of the model is shown in table 4. It can be seen that the existing model can only use a single corpus of labeling specifications as a training set, and the input of the model only considers the vector of a word and can only output a single label sequence of labeling specifications. Referring to table 5, table 6 and table 7, the sequence annotation model of the embodiment can use corpora of various annotation specifications as a training set, so that the data scale is enlarged; the word vector and the word vector are comprehensively considered in the input of the model, the learning capability of the model to the text can be improved through better vector representation, and the performance of the model is improved; the model can output the binding tag sequence, namely the tag sequence under various labeling specifications is directly obtained, the text is convenient to convert under different labeling specifications, and the method is simple and efficient.
TABLE 3 (data set setup of the existing model; provided as an image in the original publication)
TABLE 4
Input of the existing model    Output label of the existing model
especially                     AD
I                              PN
country                        NN
economy                        NN
rise                           VV
。                             PU
TABLE 5 (provided as an image in the original publication)
TABLE 6 (provided as an image in the original publication)
TABLE 7 (provided as an image in the original publication)
In summary, the sequence annotation apparatus provided in this embodiment improves the scoring layers of the sequence annotation model and thereby achieves the purpose of improving sequence annotation performance by exploiting heterogeneous data of multiple labeling specifications. In addition, the model directly outputs the label sequences of the various labeling specifications, which both improves the accuracy of labeling under a single labeling specification and facilitates the conversion of text between different labeling specifications.
The following introduces a training method of a sequence annotation model provided in an embodiment of the present application, and the training method of the sequence annotation model described below is applied to the sequence annotation model of the sequence annotation apparatus described above.
As shown in fig. 4, the training method of the sequence labeling model includes:
step S401: acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts;
step S402: inputting the training samples into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model;
step S403: and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
Specifically, the adjustment process of the model parameters may be an automatic process. The preset termination condition for completing the model training may be that the iteration number reaches a preset maximum iteration number, or that the model training is determined to be completed when the performance of the model does not reach an expected improvement after a certain number of iterations, which is specifically determined according to actual requirements, and this embodiment is not specifically limited.
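A schematic training loop for steps S401 to S403, assuming a model object that maps a training text to per-word final scores over the binding label set and the per-word loss sketched earlier; the optimizer, learning rate, and termination condition below are illustrative assumptions:

import torch

def train(model, samples, word_loss, max_epochs=20, lr=1e-3):
    """samples: list of (text, gold) pairs, where gold[i] holds the indices of the
    correct binding labels of the i-th word (step S401)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):                  # preset termination: maximum iteration count
        total = 0.0
        for text, gold in samples:
            optimizer.zero_grad()
            scores = model(text)                     # (n_words, n_binding_tags) predicted scores, step S402
            loss = sum(word_loss(scores[i], gold[i]) for i in range(len(gold)))
            loss.backward()                          # adjust parameters from predicted vs. actual labels, step S403
            optimizer.step()
            total += float(loss)
        print(f"epoch {epoch}: total loss {total:.4f}")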
The following introduces a training apparatus of a sequence labeling model provided in an embodiment of the present application, and the training apparatus of the sequence labeling model described below is applied to the sequence labeling model of the sequence labeling apparatus.
As shown in fig. 5, the training apparatus for the sequence annotation model includes:
the memory 501: for storing a computer program;
the processor 502: for executing the computer program to implement the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training sample into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
The following describes a computer-readable storage medium provided by an embodiment of the present application, and the computer-readable storage medium described below is applied to the sequence annotation model of the sequence annotation device described above.
In particular, the computer readable storage medium has stored thereon a computer program which, when executed by a processor, is adapted to carry out the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training sample into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
As shown in fig. 6, the sequence annotation method provided in the embodiment of the present application is introduced as follows, and the sequence annotation method includes:
step S601: acquiring a text to be marked;
step S602: determining vector representation of each word of the text to be labeled;
step S603: according to the vector representation, determining the score of each independent label in a label set with various labeling specifications, and determining the original score of each binding label in a binding label set, wherein the binding label is a group of label combinations comprising single independent labels with various labeling specifications;
step S604: determining a final score of the binding tag according to the original score of the binding tag and the score of the independent tag corresponding to the binding tag;
step S605: and determining the target binding label of the word according to the final score of the binding label to obtain a target label sequence of the text to be labeled.
In the present specification, the embodiments are described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same or similar parts between the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above detailed descriptions of the solutions provided in the present application, and the specific examples applied herein are set forth to explain the principles and implementations of the present application, and the above descriptions of the examples are only used to help understand the method and its core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (9)

1. A sequence annotation apparatus, comprising a sequence annotation model, wherein the sequence annotation model comprises:
an input layer: used for obtaining a text to be labeled;
a presentation layer: used for determining the vector representation of each word of the text to be labeled and sending the vector representation to a first scoring layer and a plurality of second scoring layers respectively;
the first scoring layer: used for determining the original score of each binding label in a binding label set according to the vector representation;
the second scoring layers: used for determining, according to the vector representation, the score of each independent label in the label set of the corresponding labeling specification, wherein the second scoring layers correspond one-to-one to the labeling specifications, and a binding label is a label combination comprising one independent label from each labeling specification;
a prediction layer: used for determining the final score of each binding tag according to the original score of the binding tag and the scores of the independent tags corresponding to the binding tag, and determining the target binding tag of each word according to the final scores of the binding tags;
an output layer: used for outputting a target label sequence of the text to be labeled, the target label sequence comprising the target binding labels of all words of the text to be labeled.
2. The sequence annotation apparatus of claim 1, wherein the presentation layer comprises:
a first encoding unit: used for determining a first vector of each word of the text to be labeled;
a second encoding unit: used for encoding each character of the word with a first bidirectional recurrent neural network to obtain a second vector of the word;
a representation unit: for determining a vector representation of the word from the first vector and the second vector and sending the vector representation to a first scoring level and a plurality of second scoring levels, respectively.
3. The sequence labeling apparatus of claim 2, wherein the representation unit is specifically configured to:
determining a vector representation of the word from the first vector and the second vector; encoding the vector representation of the word by using a second bidirectional cyclic neural network to obtain global information; and sending the global information to a first scoring layer and a plurality of second scoring layers respectively.
4. The sequence labeling apparatus of claim 1, wherein the prediction layer is specifically configured to:
determining a final score of the binding tag according to the original score of the binding tag and the scores of the independent tags corresponding to the binding tag; determining the probability of each binding tag according to the final score of each binding tag by utilizing a softmax function; determining a target binding tag for the word according to the probability of each binding tag.
5. The sequence annotation apparatus of claim 4, further comprising:
a loss layer: used for determining a loss value during training according to a target loss function, a predicted labeling result and an actual labeling result, and realizing training by adjusting model parameters, wherein the target loss function is:
Loss = -log( Σ_{j=1}^{k} y_j )
where k denotes the number of correct labels and y_j is the probability of the j-th correct label output by the prediction layer.
6. A method for training a sequence annotation model, which is applied to the sequence annotation apparatus of any one of claims 1 to 5, and comprises:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts;
inputting the training samples into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model;
and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
7. A training device of a sequence annotation model, which is applied to the sequence annotation device of any one of claims 1 to 5, and comprises:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training sample into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
8. A computer-readable storage medium, applied to the sequence annotation model of the sequence annotation apparatus of any one of claims 1 to 5, the computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of:
acquiring training samples with various labeling specifications, wherein the training samples comprise training texts and actual label sequences of the training texts; inputting the training sample into a sequence labeling model to obtain a predicted tag sequence output by the sequence labeling model; and adjusting parameters of the sequence labeling model according to the predicted label sequence and the actual label sequence until a preset termination condition is reached so as to realize the training of the sequence labeling model.
9. A method for sequence annotation, comprising:
acquiring a text to be marked;
determining vector representation of each word of the text to be labeled;
according to the vector representation, determining the score of each independent label in a label set with various labeling specifications, and determining the original score of each binding label in a binding label set, wherein the binding label is a group of label combinations comprising single independent labels with various labeling specifications;
determining a final score of the binding tag according to the original score of the binding tag and the score of the independent tag corresponding to the binding tag;
and determining the target binding label of the word according to the final score of the binding label to obtain a target label sequence of the text to be labeled.
CN201910481021.2A 2019-06-04 2019-06-04 Sequence labeling method and device and training method of sequence labeling model Active CN110210035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910481021.2A CN110210035B (en) 2019-06-04 2019-06-04 Sequence labeling method and device and training method of sequence labeling model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910481021.2A CN110210035B (en) 2019-06-04 2019-06-04 Sequence labeling method and device and training method of sequence labeling model

Publications (2)

Publication Number Publication Date
CN110210035A CN110210035A (en) 2019-09-06
CN110210035B true CN110210035B (en) 2023-01-24

Family

ID=67790556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910481021.2A Active CN110210035B (en) 2019-06-04 2019-06-04 Sequence labeling method and device and training method of sequence labeling model

Country Status (1)

Country Link
CN (1) CN110210035B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879524A (en) * 2021-09-27 2023-03-31 华为技术有限公司 Model training method and related equipment thereof
CN115391608B (en) * 2022-08-23 2023-05-23 哈尔滨工业大学 Automatic labeling conversion method for graph-to-graph structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729312A (en) * 2017-09-05 2018-02-23 苏州大学 More granularity segmenting methods and system based on sequence labelling modeling
CN109800298A (en) * 2019-01-29 2019-05-24 苏州大学 A kind of training method of Chinese word segmentation model neural network based

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729312A (en) * 2017-09-05 2018-02-23 苏州大学 More granularity segmenting methods and system based on sequence labelling modeling
CN109800298A (en) * 2019-01-29 2019-05-24 苏州大学 A kind of training method of Chinese word segmentation model neural network based

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing; Zhenghua Li et al.; Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics; 2014-06-25; full text *
A GRU+CRF Method for Entity-Attribute Extraction (实体-属性抽取的GRU+CRF方法); Wang Renwu et al.; Modern Information (现代情报); 2018-10-31; full text *

Also Published As

Publication number Publication date
CN110210035A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN110457675B (en) Predictive model training method and device, storage medium and computer equipment
CN109657239B (en) Chinese named entity recognition method based on attention mechanism and language model learning
CN111460807B (en) Sequence labeling method, device, computer equipment and storage medium
CN109800298B (en) Training method of Chinese word segmentation model based on neural network
CN110377916B (en) Word prediction method, word prediction device, computer equipment and storage medium
CN110807332A (en) Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN110275939B (en) Method and device for determining conversation generation model, storage medium and electronic equipment
CN110704576B (en) Text-based entity relationship extraction method and device
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
CN108932226A (en) A kind of pair of method without punctuate text addition punctuation mark
CN111666427A (en) Entity relationship joint extraction method, device, equipment and medium
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN112699686B (en) Semantic understanding method, device, equipment and medium based on task type dialogue system
CN111062217A (en) Language information processing method and device, storage medium and electronic equipment
CN112101010B (en) Telecom industry OA office automation manuscript auditing method based on BERT
CN110807333A (en) Semantic processing method and device of semantic understanding model and storage medium
CN113743099B (en) System, method, medium and terminal for extracting terms based on self-attention mechanism
CN109933792A (en) Viewpoint type problem based on multi-layer biaxially oriented LSTM and verifying model reads understanding method
US11775769B2 (en) Sentence type recognition method and apparatus, electronic device, and storage medium
CN113821616B (en) Domain-adaptive slot filling method, device, equipment and storage medium
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN116484879A (en) Prompt message generation method and device, electronic equipment and storage medium
CN114067786A (en) Voice recognition method and device, electronic equipment and storage medium
CN110210035B (en) Sequence labeling method and device and training method of sequence labeling model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant