CN111985229A - Sequence labeling method and device and computer equipment - Google Patents

Sequence labeling method and device and computer equipment

Info

Publication number
CN111985229A
Authority
CN
China
Prior art keywords
word
sequence
labeling
text sequence
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910424100.XA
Other languages
Chinese (zh)
Other versions
CN111985229B (en)
Inventor
谭莲芝
龙梓
郭豪
涂建超
夏武
曹祥文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910424100.XA priority Critical patent/CN111985229B/en
Publication of CN111985229A publication Critical patent/CN111985229A/en
Application granted granted Critical
Publication of CN111985229B publication Critical patent/CN111985229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a sequence labeling method, a sequence labeling device and computer equipment, belongs to the technical field of computers, and is used for improving the accuracy of sequence labeling. The method comprises the following steps: obtaining a character-level feature vector of each word of a text sequence to be labeled and a word feature vector based on context semantics, and splicing the character-level feature vector of each word with the word feature vector to obtain spliced word vector representation of each word; determining the attention weight of each word in the text sequence based on a pre-trained attention prediction model, wherein the attention prediction model is obtained by training according to a plurality of text sequence training samples, and the words included in each text sequence training sample are labeled with corresponding attention weights; and according to the corresponding attention weight, performing sequence labeling processing on the spliced word vector representation of each word to obtain a label labeling sequence of the text sequence.

Description

Sequence labeling method and device and computer equipment
Technical Field
The invention relates to the technical field of computers, in particular to a sequence labeling method and device and computer equipment.
Background
With the rapid development of emerging media such as the internet in recent years, people have entered the era of information explosion. At the same time, there is a growing desire for computers to understand human language so as to better help people with everyday tasks, which has made Natural Language Processing (NLP) a research hotspot in recent years. Sequence labeling is widely used in natural language processing, for example for named entity recognition, and how to improve the accuracy of sequence labeling is a problem worth considering.
Disclosure of Invention
The embodiment of the application provides a sequence labeling method, a sequence labeling device and computer equipment, which are used for improving the accuracy of sequence labeling.
In one aspect, a method for sequence annotation is provided, where the method includes:
performing word segmentation processing on a text sequence to be labeled to obtain words included in the text sequence;
obtaining a character-level feature vector of each word, and obtaining a word feature vector of each word based on context semantics;
splicing the character level feature vector of each word with the word feature vector to obtain spliced word vector representation of each word;
determining the attention weight of each word in the text sequence based on a pre-trained attention prediction model, wherein the attention prediction model is obtained by training according to a plurality of text sequence training samples, and the words included in each text sequence training sample are labeled with corresponding attention weights;
And according to the corresponding attention weight, performing sequence labeling processing on the spliced word vector representation of each word to obtain a label labeling sequence of the text sequence.
In one aspect, a sequence annotation apparatus is provided, the apparatus comprising:
the word segmentation processing module is used for carrying out word segmentation processing on the text sequence to be labeled so as to obtain words included in the text sequence;
the first representation module is used for obtaining character-level feature vectors of all words;
the second representation module is used for obtaining a word feature vector of each word based on context semantics;
the splicing representation module is used for splicing the character-level feature vectors of the words with the word feature vectors to obtain spliced word vector representations of the words;
the attention prediction module is used for determining the attention weight of each word in the text sequence based on a pre-trained attention prediction model, wherein the attention prediction model is obtained by training according to a plurality of text sequence training samples, and the words included in each text sequence training sample are marked with corresponding attention weights;
and the sequence labeling module is used for performing sequence labeling processing on the spliced word vector representation of each word according to the corresponding attention weight to obtain a label labeling sequence of the text sequence.
In one possible design, the apparatus further includes a determining module, configured to determine, before obtaining the character-level feature vector and the word feature vector of each word, a target word that meets a preset condition from the words included in the text sequence; in that case,
the first representation module is used for obtaining character-level feature vectors of all target words;
the second representation module is used for obtaining a word feature vector of each target word based on context semantics;
in one possible design, the sequence tagging module is further configured to:
carrying out regularization processing on each non-target word in the text sequence by using a regular expression;
determining a labeling label of each non-target word according to the regularized matching result;
and labeling the corresponding non-target words by the determined labeling labels to obtain label labeling sequences of the non-target words.
In one possible design, the determination module is to:
determining a file format of the text sequence, and determining words, of which fields corresponding to the words in the text sequence are different from the file format, as the target words; or,
and determining the words of which the fields corresponding to the words in the text sequence do not belong to the preset fields as the target words.
In one possible design, the apparatus further includes a model training module to:
obtaining a text sequence training sample set, wherein a word in each text sequence training sample is labeled with an attention weight, and the attention weight labeled by each word is used for indicating the attention degree of the word in the text sequence training sample;
for each text sequence training sample, obtaining a character-level feature vector and a word feature vector of each word in the text sequence training sample, and a spliced word vector representation of each word;
and training the initial attention prediction model according to the spliced word vector representation of each word and the corresponding attention weight to obtain the trained attention prediction model.
In one possible design, the model training module is further configured to:
obtaining an actual labeling result of the text sequence;
if the actual labeling result is not consistent with the label labeling sequence obtained based on the attention prediction model, re-labeling the attention weight of each word in the text sequence according to the actual labeling result to obtain a new text sequence training sample;
and retraining the attention prediction model by using the new text sequence training sample to obtain an updated attention prediction model.
In one possible design, the second representation module is configured to:
obtaining an initial word vector of each word;
carrying out forward iteration on the initial word vector of each word by utilizing a first circulation layer of a circulation neural network to obtain a forward output sequence of each word;
carrying out reverse iteration on the initial word vector of each word by utilizing a second circulation layer of the recurrent neural network to obtain a reverse output sequence of each word;
and splicing the forward output sequence and the reverse output sequence of each word to obtain a sequence which is used as a word feature vector of the word based on the context semantics.
In one aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps included in the method as described in the above aspects when executing the computer program.
In one aspect, a computer-readable storage medium is provided, having stored thereon computer-executable instructions for causing a computer to perform the steps included in the method of the above-described aspects.
According to the technical scheme in the embodiment of the application, an attention mechanism is adopted in the process of carrying out sequence labeling on the text sequence: an attention network is added between the LSTM layer and the CRF layer, that is, an LSTM-ATTN-CRF model is used for sequence labeling. Specifically, a pre-trained attention prediction model predicts the attention weight of each word in the text sequence to be labeled, and during labeling the features of each word are combined with its attention weight. The influence of the features of each network layer on the final output is therefore considered more fully, sequence labeling is performed in combination with the hidden-layer features, and because more parameters are taken into account, the accuracy of sequence labeling can be improved to a certain extent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and for those skilled in the art other drawings can be obtained from the provided drawings without creative effort.
FIG. 1a is a schematic diagram of a prior art LSTM + CRF model;
FIG. 1b is another schematic diagram of a prior art LSTM + CRF model;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a sequence tagging apparatus in an embodiment of the present application;
FIG. 4 is a flowchart of a sequence tagging method in an embodiment of the present application;
FIG. 5 is a diagram illustrating sequence labeling using an attention model in an embodiment of the present application;
fig. 6 is a block diagram showing a sequence prediction apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computer device in an embodiment of the present application;
Fig. 8 is another schematic structural diagram of a computer device in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the scope of protection of the present application. In the present application, the embodiments and the features of the embodiments may be combined with each other arbitrarily provided there is no conflict. Also, although a logical order is shown in the flow diagrams, in some cases the steps shown or described may be performed in an order different from the one here.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof, which are intended to cover non-exclusive protection. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The "plurality" in the present application may mean at least two, for example, two, three or more, and the embodiments of the present application are not limited.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document generally indicates that the preceding and following related objects are in an "or" relationship unless otherwise specified.
Some terms referred to herein are explained below to facilitate understanding by those skilled in the art.
1. A Recurrent Neural Network (RNN) has a memory function: it memorizes the state value generated when the network ran at the previous time instant and uses that value when generating the output value at the current time instant. A recurrent neural network is composed of an input layer, a recurrent layer and an output layer, and may further comprise a fully-connected layer as in a fully-connected neural network. The input of a recurrent neural network is a vector sequence; at each time instant t the network receives an input x_t and produces an output y_t, and this output is determined jointly by the current input and the inputs at previous time instants.
The recurrent neural network is particularly suitable for sequence labeling tasks because of the memory function.
2. The bidirectional recurrent neural network. As mentioned above, the recurrent neural network is particularly suitable for sequence tagging tasks because of its memory function. However, one problem encountered by recurrent neural networks in such tasks is that the network described above is unidirectional, while some problems require not only information about past time instants of the sequence but also information about future time instants. For example, to understand a word in a sentence we need not only the preceding words but also the following words, that is, the context. The solution to this problem is the bidirectional recurrent neural network.
The bidirectional recurrent neural network scans the data in the forward and reverse directions with two different recurrent layers. Suppose the input sequence of the bidirectional recurrent neural network is x_1, x_2, x_3, x_4.
First, forward iteration is carried out with the first recurrent layer to obtain the forward output sequence of the hidden layer (also called the hidden states):
h→_1, h→_2, h→_3, h→_4
where h→_1 is determined by x_1, h→_2 is determined by x_1 and x_2, h→_3 is determined by x_1, x_2 and x_3, and h→_4 is determined by x_1, x_2, x_3 and x_4. In other words, the state value at each time instant is determined by all input values up to the current position, which makes use of the past information of the sequence.
Then, reverse iteration is carried out with the second recurrent layer, whose input sequence is x_4, x_3, x_2, x_1, to obtain the reverse output sequence of the hidden layer:
h←_4, h←_3, h←_2, h←_1
where h←_4 is determined by x_4, h←_3 is determined by x_4 and x_3, h←_2 is determined by x_4, x_3 and x_2, and h←_1 is determined by x_4, x_3, x_2 and x_1. In other words, the state value at each time instant is determined by the inputs that follow it, which makes use of the future information of the sequence.
Then, the forward output and the reverse output of the hidden layer at each time instant t are concatenated to obtain
h_t = [h→_t ; h←_t],
which is then passed to the subsequent layers of the neural network for processing to obtain the output value.
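A minimal NumPy sketch of the forward scan, reverse scan and per-step concatenation described above is given below; the tanh cell, the sizes and the random parameters are illustrative assumptions (the patent itself uses LSTM cells in the recurrent layers).

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_x, W_h, b):
    # A plain tanh recurrent cell; the LSTM cell of the patent would replace this.
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def bidirectional_scan(xs, params_fwd, params_bwd, hidden_size):
    """xs: sequence of input vectors x1..x4; returns [forward ; reverse] per step."""
    h_f = np.zeros(hidden_size)
    h_b = np.zeros(hidden_size)
    fwd, bwd = [], []
    for x_t in xs:                       # forward pass over x1, x2, x3, x4
        h_f = rnn_cell(x_t, h_f, *params_fwd)
        fwd.append(h_f)
    for x_t in reversed(xs):             # reverse pass over x4, x3, x2, x1
        h_b = rnn_cell(x_t, h_b, *params_bwd)
        bwd.append(h_b)
    bwd.reverse()                        # align reverse states with their time steps
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
make_params = lambda: (rng.normal(size=(d_in, d_h)) * 0.1,
                       rng.normal(size=(d_h, d_h)) * 0.1,
                       np.zeros(d_h))
xs = [rng.normal(size=d_in) for _ in range(4)]
outputs = bidirectional_scan(xs, make_params(), make_params(), d_h)
print(len(outputs), outputs[0].shape)    # 4 time steps, each of size 2 * d_h
```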
3. The Long Short-Term Memory model (LSTM) modifies the recurrent-layer unit so that the hidden-layer state value is not calculated directly by a single formula. The LSTM is a long short-term memory network, a time-recursive neural network suitable for processing and predicting important events with relatively long intervals and delays in a time series. An LSTM can be understood as a recurrent neural network, and a bidirectional LSTM is abbreviated BLSTM, BiLSTM or Bi-LSTM.
4. A Conditional Random Field (CRF) is a conditional probability distribution model of a set of output random variables given a set of input random variables; it assumes that the output random variables constitute a Markov random field. CRFs can be used for different prediction problems, for example in labeling applications.
5. Word vector features, also called embedding features or word vectors, describe the semantic relationships between the words included in text data. The idea is to convert words expressed in natural language into vector or matrix forms that a computer can understand. Word vector features may be extracted by a deep learning model, for example a Convolutional Neural Network (CNN) model, an LSTM model, an RNN model or a Gated CNN (G-CNN) model, and of course other possible deep learning models may also be used.
6. The Attention mechanism is an attention model used in the field of artificial neural networks. Its essence mirrors human visual attention: when people perceive a scene, they generally do not look at it from beginning to end but focus on a specific part according to their needs, and when they find that something they want to observe often appears in a certain part of a scene, they learn to pay attention to that part when similar scenes reappear. The attention mechanism is therefore a means of screening high-value information out of a large amount of information in which different pieces of information have different importance to the result; this importance can be reflected by assigning attention weights of different sizes. In other words, the attention mechanism can be understood as a rule for assigning weights when combining multiple sources.
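As a toy illustration of this weighting idea (not the attention network of this application), the sketch below combines several information sources with softmax-normalized weights; all numbers are made up.

```python
import numpy as np

def attention_combine(values, scores):
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax: larger score -> larger weight
    combined = (weights[:, None] * values).sum(axis=0)
    return weights, combined

sources = np.array([[1.0, 0.0],      # information from source 1
                    [0.0, 1.0],      # information from source 2
                    [0.5, 0.5]])     # information from source 3
weights, combined = attention_combine(sources, np.array([0.1, 2.0, 0.3]))
print(weights)     # the second source receives the largest weight
print(combined)    # weighted combination of the three sources
```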
The idea of the present application is presented below.
In the related prior art, with the rise of deep learning technology, sequence labeling methods based on deep learning are now generally adopted, and methods such as sequence-to-sequence, LSTM-CRF and CNN-LSTM-CRF have emerged. The general network architecture of LSTM-CRF, as shown in fig. 1a, mainly includes an embeddings layer, a BiLSTM layer and a CRF layer. This can be understood in conjunction with fig. 1b, where x_1, x_2, ..., x_n each represent a word and, correspondingly, x_{1,1}, x_{1,2}, ..., x_{1,t(1)} denote the characters included in the word x_1. In the sequence labeling process, word vectors are trained with words as units; the word vector of each word x_n is processed by a bidirectional LSTM to obtain char-LSTM hidden-layer features, which express the context-based semantic features of the word. The word vector of each word also passes through the embeddings layer to obtain the character features of the word. The hidden-layer features and the character features are concatenated to obtain e_n, which serves as the input of the subsequent network, specifically a bidirectional LSTM, a fully connected layer and a CRF layer; the hidden layer can randomly drop the parameters of half of the nodes with a dropout mechanism to increase the robustness of the model. With the existing LSTM + CRF model, because the model has few hidden-layer parameters and the influence of the hidden-layer features on the final labeling is not considered during sequence labeling, the accuracy of the final labeling result is relatively low.
In view of this, an embodiment of the present application provides a sequence labeling method in which an attention mechanism is added: an attention network is inserted between the LSTM and the CRF, that is, an LSTM-ATTN-CRF model is used for sequence labeling. Specifically, a pre-trained attention prediction model predicts the attention weight of each word in the text sequence to be labeled, and the final labeling is performed by combining the features of each word with its attention weight.
After introducing the design concept of the embodiment of the present application, some brief descriptions are provided below for application scenarios to which the technical solution provided by the embodiment of the present application is applicable. It should be noted that the application scenarios described below are only used to illustrate the embodiments of the present application and are not limiting. In specific implementation, the technical scheme provided by the embodiment of the application can be applied flexibly according to actual needs.
Please refer to the schematic diagram of an application scenario shown in fig. 2; the scenario includes a terminal device 201, a terminal device 202, a terminal device 203 and a server 204. The sequence labeling scheme in the embodiment of the application can be adopted in each terminal device to perform front-end sequence labeling, and the front-end labeling result can be used for manually verifying the prediction of the algorithm. The server 204 may perform model training in the background, for example train the attention model based on training samples and supply the trained attention model to each terminal device; of course, the server 204 may also train the model itself and may likewise perform the front-end labeling of each terminal device by using the sequence labeling method in this embodiment of the present application. The server 204 may obtain original texts from a training database and form training samples after preprocessing the original texts; these samples may also serve as verification samples. The model is then trained on the obtained training samples, where model training includes initial training and update training. Taking the terminal device 203 as an example, the terminal device 203 can automatically label the text sequence to be labeled through the model trained by the server 204 and the sequence labeling method introduced in the embodiment of the present application; that is, through the ideas of deep learning and machine learning, automatic sequence labeling is realized, which reduces the large amount of manual sequence labeling work and improves the efficiency of sequence labeling.
The terminal 201, the terminal 202 and the terminal 203 may be a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a notebook computer, an intelligent wearable device (such as a smart watch or a smart bracelet), a personal computer, or the like; whatever the type of terminal, sequence labeling may be performed on it. The aforementioned server 204 may be a personal computer, a large or medium-sized computer, a computer cluster, or the like.
Of course, the method provided by the embodiment of the present invention is not limited to be used in the application scenario shown in fig. 2, and may also be used in other possible application scenarios, and the embodiment of the present invention is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 2 will be described in the following method embodiments, and will not be described in detail herein.
To further illustrate the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the accompanying drawings. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, more or fewer operation steps may be included in the method on the basis of conventional or non-inventive labor. In steps where no necessary causal relationship logically exists, the execution order of the steps is not limited to that provided by the embodiments of the present application. In an actual processing procedure or when executed by a device, the method can be executed sequentially or in parallel according to the order shown in the embodiments or the figures.
The technical solution in the embodiment of the present application is roughly described below with reference to fig. 3.
The sequence labeling device may label a text sequence to be labeled, and the sequence labeling device may run a sequence labeling model, which may include a plurality of functional modules from the data cleaning module 301 to the CRF module 307. After obtaining the text sequence to be labeled, the sequence labeling apparatus may perform data cleaning on the text sequence through the data cleaning module 301, for example converting pdf to txt and filtering out punctuation marks and certain modal particles, so as to obtain a cleaner text sequence. Then, the word segmentation processing module 302 performs word segmentation processing on the text sequence to obtain each word. Further, for each word, the character-level feature vector is determined by the character vector identification module 303, and the word feature vector is determined by the word vector representation module 304. Then the character-level feature vector and the word feature vector of each word are concatenated by the concatenation vector representation module 305 to obtain a concatenated word vector representation. In addition, the attention weight of each word is calculated by the attention weight prediction module 306. Finally, the CRF module 307 performs sequence labeling processing on the spliced word vector representation of each word according to the attention weight of each word to obtain the tag labeling sequence of the text sequence, that is, the sequence labeling result, which can then be output.
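The flow of modules 301-307 can be sketched as the following illustrative pipeline; every function body here is a toy stand-in (an assumption for readability), since the real modules are the neural-network components described in this application.

```python
# Illustrative, runnable skeleton mirroring modules 301-307 (all stand-ins are toys).
import re

def clean_text(raw):                       # module 301: data cleaning
    return re.sub(r"[^\w\s]", "", raw)

def segment(text):                         # module 302: word segmentation (whitespace stand-in)
    return text.split()

def char_vector(word):                     # module 303: character-level vector (toy stand-in)
    return [float(ord(c) % 7) for c in word[:2]] + [0.0] * (2 - min(len(word), 2))

def word_vector(words, i):                 # module 304: context-based word vector (toy stand-in)
    return [float(len(words[i])), float(i)]

def attention_weight(vec):                 # module 306: attention weight (toy stand-in)
    return 1.0 / (1.0 + sum(abs(x) for x in vec))

def crf_decode(weighted_vecs):             # module 307: CRF decoding (toy stand-in)
    return ["O" for _ in weighted_vecs]

def label_sequence(raw_text):
    words = segment(clean_text(raw_text))
    joint = [char_vector(w) + word_vector(words, i) for i, w in enumerate(words)]  # module 305
    weighted = [[attention_weight(v) * x for x in v] for v in joint]
    return list(zip(words, crf_decode(weighted)))

print(label_sequence("Zhang San holds a concert in Beijing."))
```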
The following describes the technical solution in the embodiment of the present application with reference to a flowchart of a sequence tagging method shown in fig. 4. The sequence annotation method can be executed by any terminal device (terminal device 201, or terminal device 202, or terminal device 203) or server 204 as in fig. 2.
The method flow in fig. 4 is explained below.
Step 401: and obtaining a text sequence to be marked.
In a specific implementation process, the text sequence to be labeled may be any text, for example, a security report text or a network protection log in the network security field, or may also be other types of text, which is not limited in the embodiment of the present application.
After the text to be labeled is obtained, data cleaning can be performed on it: for example, the text is converted into a text-format file, some special symbols such as punctuation marks, numbers or letter-number strings can be deleted, and some auxiliary function words such as "yes", "a" and "he" can be deleted, and so on.
Step 402: and performing word segmentation processing on the text sequence to be labeled to obtain all words included in the text sequence.
Further, the cleaned text may be subjected to word segmentation processing in any existing word segmentation manner, so as to obtain all the words included in the text sequence corresponding to the text. For example, for the text sequence "Zhang San holds a concert in Beijing", the 5 words "Zhang San", "in", "Beijing", "holds" and "concert" can be obtained after word segmentation.
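As a hedged sketch of this segmentation step, assuming the original-language sentence behind the example is 张三在北京开演唱会 (consistent with the characters discussed later) and that a general-purpose Chinese segmenter such as jieba is acceptable — the application does not name a specific tool:

```python
import jieba

sentence = "张三在北京开演唱会"   # "Zhang San holds a concert in Beijing"
words = jieba.lcut(sentence)
print(words)   # expected to be close to: ['张三', '在', '北京', '开', '演唱会']
```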
Step 403: and dividing words included in the text sequence into two types of words, namely target words and non-target words according to preset conditions.
That is, all words included in the text sequence may be divided into two types by a preset condition; for convenience of description, one type is referred to as target words and the other type as non-target words. Furthermore, according to this division, target words and non-target words are subsequently labeled in different manners, so that the requirements of different types of words on sequence labeling can be met and different words in the same text sequence are labeled in a differentiated manner, which improves the flexibility of sequence labeling.
In the implementation, for example, a word satisfying a preset condition is referred to as a target word, and a word not satisfying the preset condition is referred to as a non-target word.
In a possible implementation manner, the file format corresponding to the text sequence may be determined, and when the file format is a predetermined file format, words in the text sequence whose corresponding fields differ from the file format may be determined as target words. For example, some network security reports include related information such as pdb (debug) information, services and startup items. Taking pdb information as an example, pdb information is usually stored in a security report with the format suffix "pdb", so such information can be considered successfully identified as long as the "pdb" field is filtered out. Because training samples for information such as pdb information are few, it is generally difficult to label it correctly with a model-based algorithm, and for information in these more specific files sequence labeling can be performed in other manners (such as regularization), so that the accuracy of the final sequence labeling can be ensured as far as possible. Based on this consideration, words in the text sequence whose fields differ from the file format can be determined as target words and labeled with the model algorithm, while words whose fields are the same as the file format are determined as non-target words and labeled by regularization or similar processing, so as to ensure the accuracy of labeling as far as possible.
In another possible implementation manner, words in the text sequence whose corresponding fields do not belong to preset fields may be determined as target words, and correspondingly, words whose corresponding fields belong to the preset fields may be determined as non-target words. The consideration here is the same as above: some words in special fields have few training samples, so labeling them with a model algorithm suffers from low accuracy. The preset fields can therefore be used to indicate words whose number of training samples is small, for example fewer than 10. Words belonging to the preset fields are treated as non-target words and labeled by regular-expression extraction, while words not belonging to the preset fields have more training samples with relatively uniform coverage and can be divided into target words to be labeled by the model algorithm.
In the specific implementation process, other preset conditions may also be set to divide target words and non-target words, which are not listed one by one in the embodiments of the present application.
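A small sketch of this split, assuming for illustration that the preset condition is "the word matches one of a few regex-friendly special fields"; the pdb pattern and registry-key pattern below are hypothetical examples, not part of the application.

```python
import re

SPECIAL_FIELD_PATTERNS = {
    "pdb": re.compile(r"\.pdb$", re.IGNORECASE),          # hypothetical pdb-path field
    "startup_item": re.compile(r"^HKEY_[A-Z_]+\\"),        # hypothetical registry-key field
}

def split_words(words):
    target, non_target = [], []
    for w in words:
        if any(p.search(w) for p in SPECIAL_FIELD_PATTERNS.values()):
            non_target.append(w)        # labeled later by regular expressions
        else:
            target.append(w)            # labeled by the LSTM-ATTN-CRF model
    return target, non_target

print(split_words(["张三", "北京", r"C:\build\agent.pdb"]))
```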
Step 404: for the target words, a character-level feature vector is obtained for each target word, and a context-based semantic word feature vector is obtained for each target word.
As mentioned above, the training samples corresponding to the target words are generally sufficient, and the sequence labeling can be performed by using a model algorithm for such words. Then first, a character-level feature vector for each target word may be obtained, as well as a context-based semantic word feature vector for each target word.
The character-level feature vector characterizes the objective meaning of the word itself and is determined by the individual characters included in the word. In a particular implementation, the character-level feature vector of a word may be determined from a character vector table. The character vector table can be regarded as the set of all character vectors, similar to a character vector library; that is, it contains the character vector of every character it covers. For example, by looking up the character vector table, the character vectors of the two characters "north" (北) and "jing" (京) can be obtained as:
north = [0.9812937, 0.8238749, ..., 0.6275763];
jing = [0.8749298, 0.5661328, ..., 0.7671823].
Here the character vectors of "north" and "jing" are, for example, 3000-dimensional, that is, each contains 3000 elements; for brevity only 3 elements are shown and the rest are replaced with "...".
After the character vectors of "north" and "jing" are obtained, the character-level feature vector of the word "Beijing" can be obtained by combining the two character vectors, as shown in the corresponding formula of the drawings.
further, there is a need to obtain a word feature vector for each word based on context semantics, which refers to the meaning actually expressed by placing a word in a segment of a word or even the entire document in conjunction with context and semantics. Because context semantics need to be combined, the word feature vector of a word can be determined by using the bidirectional neural network introduced in the foregoing, for example, can be determined by bidirectional LSTM (Bi-LSTM), specifically, an initial word vector of each word can be obtained first; further, performing forward iteration on the initial word vector of the word by using a first loop layer of a bidirectional loop neural network to obtain a forward output sequence of the word, and performing reverse iteration on the initial word vector of the word by using a second loop layer of the bidirectional loop neural network to obtain a reverse output sequence of the word; and finally, splicing the forward output sequence and the reverse output sequence of each word to obtain a sequence which is used as a word feature vector of the word based on the context semantics. For the specific implementation process of the word feature vector of the word, reference may be made to the foregoing description of the bidirectional recurrent neural network, and a description thereof will not be further provided here.
Step 405: and splicing the character-level feature vector of each target word with the word feature vector to obtain spliced word vector representation of each target word.
Here, the term "vector concatenation" is understood to mean that one vector is directly concatenated after another vector, for example, a ═ 1, 2, 3],b=[5]The vector a and the vector b are spliced to form [1, 2, 3, 5 ]]Or [5, 1, 2, 3 ]]. Referring to the LSTM + ATTN + CRF network shown in FIG. 5, the character-level feature vectors of words and the spliced word feature vectors are e1、e2、……、enExpressed, i.e. can be used as enTo represent a concatenate vector representation of the target word in embodiments of the present application.
Step 406: based on the attention prediction model, an attention weight for each target word is determined.
As can be understood in conjunction with fig. 5, in the embodiment of the present application an Attention network is added between the LSTM layer and the CRF layer on the basis of the existing LSTM + CRF model, that is, a new LSTM + ATTN + CRF network model is used for sequence annotation. With the attention mechanism, an attention weight can be set for each target word, and the attention weight of a word represents the degree to which the word influences the finally predicted annotation sequence (i.e., y_1, y_2, ..., y_n in fig. 5).
In a specific implementation process, the attention prediction model may be trained in advance; it is trained on a plurality of text sequence training samples, and the words in each training sample are annotated with corresponding attention weights (the training process is described later). Specifically, the spliced word vector representation of each target word is input to the attention prediction model, which computes the attention weight of that word. For example, for the text sequence "Zhang San holds a concert in Beijing", the 5 words "Zhang San", "in", "Beijing", "holds" and "concert" are obtained through word segmentation, and the attention weights predicted for these 5 words by the attention prediction model are: w_1 = 0.1, w_2 = 0.05, w_3 = 0.35, w_4 = 0.05, w_5 = 0.45. The attention weight of "Beijing" (w_3 = 0.35) and the attention weight of "concert" (w_5 = 0.45) are larger, which shows that the two most influential words in this text sequence are "Beijing" and "concert". This also accords with ordinary intuition: in the sentence "Zhang San holds a concert in Beijing", the focus of attention is "Beijing" and "concert".
Step 407: and performing sequence labeling processing on the spliced word vector representation of each target word according to the corresponding attention weight to obtain label labeling sequences of all the target words.
Further, the labeling process can be performed based on the attention weights. For example, for the text sequence "Zhang San holds a concert in Beijing" in the above example, all 5 words it contains are target words; their spliced word vector representations are e_n = {e_1, e_2, e_3, e_4, e_5} and their corresponding attention weights are w_n = {w_1, w_2, w_3, w_4, w_5}. A simple way to combine them, as can be understood with reference to fig. 5, is direct multiplication: a_n = {w_1*e_1, w_2*e_2, w_3*e_3, w_4*e_4, w_5*e_5}. The resulting a_n is then used as the input of the CRF layer to complete the final labeling process and obtain the tag labeling sequence of all target words.
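A minimal numeric sketch of this weighting step (the weights follow the example above; the vectors are toy numbers):

```python
import numpy as np

e = [np.array([0.2, 0.5]), np.array([0.1, 0.1]), np.array([0.9, 0.3]),
     np.array([0.2, 0.2]), np.array([0.7, 0.8])]          # e1..e5 (toy spliced vectors)
w = [0.1, 0.05, 0.35, 0.05, 0.45]                          # w1..w5 from the attention model
a = [w_i * e_i for w_i, e_i in zip(w, e)]                  # a_i = w_i * e_i
print(a)   # the sequence a1..a5 is then passed to the CRF layer for decoding
```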
For the named entity labeling example, a BIO label set or a BMEO label set can be used. In the BIO label set, B-PER and I-PER denote the first character and a non-first character of a person name, B-LOC and I-LOC denote the first character and a non-first character of a place name, B-ORG and I-ORG denote the first character and a non-first character of an organization name, and O denotes that the character does not belong to any named entity. In the BMEO label set, BA denotes that the character is the first character of an address, MA a middle character of an address, and EA the last character of an address; BO denotes the first character of an organization name, MO a middle character, and EO the last character; BP denotes the first character of a person name, MP a middle character, and EP the last character; O denotes that the character does not belong to a named entity.
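For instance, under the BIO scheme the characters of the example sentence could be tagged as follows (the specific tags are an illustrative assumption consistent with the scheme just described):

```python
sentence = list("张三在北京开演唱会")   # "Zhang San holds a concert in Beijing"
bio_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]
for ch, tag in zip(sentence, bio_tags):
    print(ch, tag)
```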
Step 408: for non-target words, a regularization process is performed on each non-target word.
As previously described, the non-target words may be regularized using a regular expression to obtain a regularized matching result for each non-target word.
Step 409: and determining the labeling label of each non-target word according to the regularized matching result.
Further, the labeling label of each non-target word is obtained according to the regularized matching result of each non-target word, and for example, the labeling label of each non-target word can be obtained through the preset corresponding relationship between the regularization result and the labeling label.
Step 410: and labeling each corresponding non-target word by using the determined labeling label to obtain label labeling sequences of all the non-target words.
Through steps 408-410, the tag labeling sequence of all non-target words can be completed.
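A hedged sketch of steps 408-410, assuming a hand-written mapping from regular expressions to labels; the patterns and label names below are hypothetical, chosen only to illustrate the pdb example mentioned earlier.

```python
import re

REGEX_TO_LABEL = [
    (re.compile(r"\.pdb$", re.IGNORECASE), "PDB_PATH"),       # hypothetical pattern/label
    (re.compile(r"^HKEY_[A-Z_]+\\"), "STARTUP_ITEM"),          # hypothetical pattern/label
]

def label_non_target(word):
    for pattern, label in REGEX_TO_LABEL:
        if pattern.search(word):
            return label
    return "O"

print(label_non_target(r"C:\build\agent.pdb"))   # -> PDB_PATH
```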
Step 411: according to the tag labeling sequence of the target word obtained in step 407 and the tag labeling sequence of the non-target word obtained in step 410, a final sequence labeling result for the text sequence to be labeled is obtained, and the final sequence labeling result can be output.
That is, the label labeling sequences of all target words obtained by labeling the algorithm model and the label labeling sequences of all non-target words obtained by regularization processing can be combined to be used as a final sequence labeling result, and the sequence labeling result can be displayed at the front end.
Based on the aforementioned attentive prediction model, the training process thereof is described below.
Firstly, a text sequence training sample set is obtained; each word in a text sequence training sample is annotated with an attention weight, and the attention weight annotated for a word indicates the degree of attention of that word in the text sequence training sample.
Further, for each text sequence training sample, a character-level feature vector and a word feature vector of each word in the training sample are obtained, together with the spliced word vector representation of each word. The manner of obtaining the spliced word vector representation has been described above and is not repeated here.
Finally, the initial attention prediction model is trained according to the spliced word vector representation of each word and the corresponding attention weight, so as to obtain the trained attention prediction model. The specific training procedure can follow existing methods, and the embodiments of the application are not limited in this respect.
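A minimal training sketch under illustrative assumptions: the attention prediction model is taken to be a small PyTorch MLP that regresses the annotated attention weight of each word from its spliced vector, trained with mean-squared error; the application does not fix the architecture or the loss.

```python
import torch
import torch.nn as nn

attn_model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(attn_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One toy training sample: 5 words, 32-dim spliced vectors, annotated attention weights.
joint_vectors = torch.randn(5, 32)
annotated_weights = torch.tensor([[0.1], [0.05], [0.35], [0.05], [0.45]])

for _ in range(100):
    optimizer.zero_grad()
    pred = attn_model(joint_vectors)          # predicted attention weight per word
    loss = loss_fn(pred, annotated_weights)   # fit the annotated weights
    loss.backward()
    optimizer.step()
```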
In the embodiment of the present application, the actual labeling result of the text sequence to be labeled may be obtained through manual labeling or in other manners. It is then judged whether the actual labeling result is consistent with the tag labeling sequence obtained based on the attention prediction model. If not, the predicted labeling contains errors, which indicates that the prediction of the attention prediction model is not accurate enough. In that case, the attention weights of the words in the text sequence may be re-annotated according to the actual labeling result, for example manually, to obtain a new text sequence training sample, and the attention prediction model previously used for attention weight prediction is retrained with the new training sample; that is, the previously used attention prediction model is updated through the new text sequence training sample, achieving update and iterative training of the model. In this way the accuracy of model prediction can be improved as much as possible, and with it the accuracy of the final sequence labeling.
In the sequence labeling method in the embodiment of the application, an attention mechanism is added: an attention network is inserted between the LSTM and the CRF, that is, an LSTM-ATTN-CRF model is used for sequence labeling. Specifically, a pre-trained attention prediction model predicts the attention weight of each word in the text sequence to be labeled, and during labeling the sequence labeling is performed by combining the features of each word with its attention weight. The influence of the features of each network layer on the final output is therefore considered more fully, the labeling combines the hidden-layer features, and because more parameters are taken into account, the accuracy of sequence labeling can be improved to a certain extent.
Based on the same inventive concept, the embodiment of the application provides a sequence labeling device. The sequence labeling means may be a hardware structure, a software module, or a hardware structure plus a software module. The sequence marking device can be realized by a chip system, and the chip system can be formed by a chip and can also comprise the chip and other discrete devices. Referring to fig. 6, the sequence annotation apparatus in the embodiment of the present application includes a word segmentation processing module 601, a first representation module 602, a second representation module 603, a concatenation representation module 604, an attention prediction module 605, and a sequence annotation module 606, where:
a word segmentation processing module 601, configured to perform word segmentation processing on a text sequence to be labeled to obtain words included in the text sequence;
a first representation module 602, configured to obtain a character-level feature vector of each word;
a second representing module 603, configured to obtain a word feature vector of each word based on context semantics;
a concatenation representation module 604, configured to concatenate the character-level feature vector of each word with the word feature vector to obtain a concatenated word vector representation of each word;
an attention prediction module 605, configured to determine an attention weight of each word in the text sequence based on a pre-trained attention prediction model, where the attention prediction model is obtained by training according to a plurality of text sequence training samples, and each word included in each text sequence training sample is labeled with a corresponding attention weight;
And a sequence labeling module 606, configured to perform sequence labeling processing on the spliced word vector representation of each word according to the corresponding attention weight, so as to obtain a tag labeling sequence of the text sequence.
In a possible implementation manner, the sequence labeling apparatus in the embodiment of the present application further includes a determining module, configured to determine, before obtaining the character-level feature vector and the word feature vector of each word, a target word that meets a preset condition from the words included in the text sequence; in that case,
a first representation module 602, configured to obtain a character-level feature vector of each target word;
a second representing module 603, configured to obtain a word feature vector of each target word based on context semantics;
in one possible implementation, the sequence annotation module 606 is further configured to:
carrying out regularization processing on each non-target word in the text sequence by using a regular expression;
determining a labeling label of each non-target word according to the regularized matching result;
and labeling the corresponding non-target words by the determined labeling labels to obtain label labeling sequences of the non-target words.
In a possible implementation, the aforementioned determining module is configured to:
Determining a file format of the text sequence, and determining words, of which the fields corresponding to the words in the text sequence are different from the file format, as target words; or,
and determining words of which the fields corresponding to the words in the text sequence do not belong to the preset fields as target words.
In a possible implementation manner, the sequence labeling apparatus in the embodiment of the present application further includes a model training module, configured to:
obtaining a text sequence training sample set, wherein a word in each text sequence training sample is labeled with an attention weight, and the attention weight labeled by each word is used for indicating the attention degree of the word in the text sequence training sample;
for each text sequence training sample, obtaining a character-level feature vector and a word feature vector of each word in the text sequence training sample, and a spliced word vector representation of each word;
and training the initial attention prediction model according to the spliced word vector representation of each word and the corresponding attention weight to obtain the trained attention prediction model.
In a possible implementation, the aforementioned model training module is further configured to:
obtaining an actual labeling result of the text sequence;
If the actual labeling result is not consistent with the label labeling sequence obtained based on the attention prediction model, re-labeling the attention weight of each word in the text sequence according to the actual labeling result to obtain a new text sequence training sample;
and retraining the attention prediction model by using the new text sequence training sample to obtain an updated attention prediction model.
In a possible implementation, the second representing module 603 is configured to:
obtaining an initial word vector of each word;
carrying out forward iteration on the initial word vector of each word by utilizing a first circulation layer of a circulation neural network to obtain a forward output sequence of each word;
carrying out reverse iteration on the initial word vector of each word by utilizing a second circulation layer of the recurrent neural network to obtain a reverse output sequence of each word;
and splicing the forward output sequence and the reverse output sequence of each word to obtain a sequence which is used as a word feature vector of the word based on the context semantics.
All relevant contents of each step related to the foregoing embodiments of the sequence labeling method can be cited to the functional description of the functional module corresponding to the sequence labeling apparatus in the embodiments of the present application, and are not described herein again.
The division of the modules in the embodiments of the present application is schematic and is only a division by logical function; in actual implementation there may be other division manners. In addition, the functional modules in the embodiments of the present application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module.
Based on the same inventive concept, the embodiment of the present application further provides a computer device, which is, for example, any one of the terminal devices or the server 204 in fig. 2. As shown in fig. 7, a computer device in this embodiment of the present application includes at least one processor 701, a memory 702 and a communication interface 703, where the memory 702 and the communication interface 703 are connected to the at least one processor 701, and a specific connection medium between the processor 701 and the memory 702 is not limited in this embodiment of the present application, and in fig. 7, the processor 701 and the memory 702 are connected by a bus 700 as an example, the bus 700 is shown by a thick line in fig. 7, and connection manners between other components are only schematically illustrated and are not limited. The bus 700 may be divided into an address bus, a data bus, a control bus, etc., and is shown in fig. 7 with only one thick line for ease of illustration, but does not represent only one bus or one type of bus.
In the embodiment of the present application, the memory 702 stores instructions executable by the at least one processor 701, and by executing the instructions stored in the memory 702 the at least one processor 701 can execute the steps included in the foregoing sequence labeling method.
The processor 701 is the control center of the computer device; it can connect various parts of the entire computer device by using various interfaces and lines, and perform the various functions of the computing device and process its data by running or executing the instructions stored in the memory 702 and calling the data stored in the memory 702, thereby monitoring the computing device as a whole. Optionally, the processor 701 may include one or more processing units, and the processor 701 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 701. In some embodiments, the processor 701 and the memory 702 may be implemented on the same chip, or in some embodiments they may be implemented separately on their own chips.
The processor 701 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The memory 702, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 702 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disc, and the like. The memory 702 may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 702 in the embodiments of the present application may also be a circuit or any other apparatus capable of implementing a storage function, for storing program instructions and/or data.
The communication interface 703 is a transmission interface that can be used for communication; data may be received or sent through the communication interface 703. For example, the text sequence to be labeled sent by another device may be received through the communication interface 703, and the obtained tag labeling sequence may also be sent to another device through the communication interface 703.
Referring to the further structural schematic diagram of the computer device shown in fig. 8, the computer device further includes a basic input/output system (I/O system) 801 that facilitates the transfer of information between the various components within the computer device, and a mass storage device 805 for storing an operating system 802, application programs 803, and other program modules 804.
The basic input/output system 801 includes a display 806 for displaying information and an input device 807, such as a mouse or keyboard, for a user to input information. The display 806 and the input device 807 are both connected to the processor 701 through the basic input/output system 801 connected to the system bus 700. The basic input/output system 801 may also include an input/output controller for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller may also provide output to a display screen, a printer, or another type of output device.
The mass storage device 805 is connected to the processor 701 through a mass storage controller (not shown) connected to the system bus 700. The mass storage device 805 and its associated computer-readable media provide non-volatile storage for the computer device. That is, the mass storage device 805 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
According to various embodiments of the invention, the computer device may also run by being connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device may be connected to the network 808 through the communication interface 703 coupled to the system bus 700, or may be connected to another type of network or remote computer system (not shown) by using the communication interface 703.
Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium that stores computer instructions; when the computer instructions are run on a computer, the computer is caused to perform the steps of the sequence labeling method described above.
Based on the same inventive concept, the embodiment of the present application further provides a chip system, where the chip system includes a processor and may further include a memory, and is configured to implement the steps of the foregoing sequence labeling method. The chip system may consist of a chip, or may include a chip and other discrete devices.
In some possible embodiments, the aspects of the sequence labeling method provided in the embodiments of the present application may also be implemented in the form of a program product that includes program code; when the program product runs on a computer, the program code causes the computer to perform the steps of the sequence labeling method according to the various exemplary embodiments of the present invention described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for labeling sequences, the method comprising:
performing word segmentation processing on a text sequence to be labeled to obtain words included in the text sequence;
obtaining a character-level feature vector of each word, and obtaining a word feature vector of each word based on context semantics;
splicing the character level feature vector of each word with the word feature vector to obtain spliced word vector representation of each word;
determining the attention weight of each word in the text sequence based on a pre-trained attention prediction model, wherein the attention prediction model is obtained by training according to a plurality of text sequence training samples, and the words included in each text sequence training sample are labeled with corresponding attention weights;
and according to the corresponding attention weight, performing sequence labeling processing on the spliced word vector representation of each word to obtain a label labeling sequence of the text sequence.
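For illustration only, the following is a minimal sketch of the labeling step recited in claim 1, assuming PyTorch and a simple per-word linear tag classifier (the claim fixes neither). The names char_feats, word_feats and attn_weights, the tag set size and all dimensions are hypothetical placeholders; attn_weights stands in for the output of the pre-trained attention prediction model.

```python
# Minimal sketch of: splice char-level and word feature vectors, weight by attention, then tag.
import torch
import torch.nn as nn

def label_sequence(char_feats, word_feats, attn_weights, tagger: nn.Module):
    # 1. Splice (concatenate) the character-level and word feature vectors per word
    spliced = torch.cat([char_feats, word_feats], dim=-1)    # (seq_len, d_char + d_word)
    # 2. Scale each word's spliced representation by its attention weight
    weighted = spliced * attn_weights.unsqueeze(-1)           # (seq_len, d_char + d_word)
    # 3. Sequence labeling processing: here a per-word classifier over the tag set
    tag_scores = tagger(weighted)                             # (seq_len, num_tags)
    return tag_scores.argmax(dim=-1)                          # one predicted tag id per word

# Hypothetical sizes: 6 words, 30-dim char features, 128-dim word features, 5 tags
tagger = nn.Linear(30 + 128, 5)
tags = label_sequence(torch.randn(6, 30), torch.randn(6, 128), torch.rand(6), tagger)
```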
2. The method of claim 1, wherein before obtaining the character-level feature vector and the word feature vector of each word, the method further comprises:
determining target words meeting preset conditions from words included in the text sequence;
then obtaining the character-level feature vector and the word feature vector of each word comprises:
obtaining the character-level feature vector and the word feature vector of each target word.
3. The method of claim 2, wherein the method further comprises:
performing regular expression matching on each non-target word in the text sequence by using a regular expression;
determining a labeling label of each non-target word according to the regular expression matching result;
and labeling the corresponding non-target words with the determined labeling labels to obtain a label labeling sequence of the non-target words.
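For illustration only, a minimal sketch of the regular-expression handling of non-target words in claim 3; the patterns, label names and default "O" label below are hypothetical examples, not taken from the claim.

```python
# Minimal sketch: match a non-target word against regex patterns and assign a label.
import re

NON_TARGET_PATTERNS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), "DATE"),            # hypothetical date pattern
    (re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"), "IP"),          # hypothetical IPv4 pattern
]

def label_non_target(word: str) -> str:
    # The first matching pattern determines the word's labeling label
    for pattern, label in NON_TARGET_PATTERNS:
        if pattern.match(word):
            return label
    return "O"  # no pattern matched: default / outside label

print(label_non_target("2019-05-21"))   # DATE
print(label_non_target("10.0.0.1"))     # IP
```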
4. The method of claim 2, wherein determining a target word satisfying a predetermined condition from among the words included in the text sequence comprises:
determining a file format of the text sequence, and, when the file format is a preset file format, determining words in the text sequence whose corresponding format is different from the file format as the target words; or,
determining words in the text sequence whose corresponding fields do not belong to preset fields as the target words.
5. The method of claim 1, wherein the attention prediction model is trained in the following manner:
obtaining a text sequence training sample set, wherein each word in each text sequence training sample is labeled with an attention weight, and the attention weight labeled for each word is used for indicating the degree of attention paid to the word in the text sequence training sample;
for each text sequence training sample, obtaining a spliced word vector representation of the character-level feature vector and the word feature vector of each word in the text sequence training sample;
and training the initial attention prediction model according to the spliced word vector representation of each word and the corresponding attention weight to obtain the trained attention prediction model.
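For illustration only, a minimal sketch of the attention prediction training recited in claim 5, assuming PyTorch, a small regression head and a mean-squared-error loss (none of which is fixed by the claim). The 158-dimensional spliced vectors match the hypothetical 30 + 128 dimensions used in the earlier sketch, and the sample sizes are placeholders.

```python
# Minimal sketch: regress each word's spliced vector onto its annotated attention weight.
import torch
import torch.nn as nn

attn_model = nn.Sequential(nn.Linear(158, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(attn_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(spliced_vectors: torch.Tensor, annotated_weights: torch.Tensor) -> float:
    # spliced_vectors: (num_words, 158); annotated_weights: (num_words,)
    predicted = attn_model(spliced_vectors).squeeze(-1)
    loss = loss_fn(predicted, annotated_weights)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One hypothetical training sample: 6 words with annotated attention weights
loss = train_step(torch.randn(6, 158), torch.rand(6))
```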
6. The method of claim 5, wherein the method further comprises:
obtaining an actual labeling result of the text sequence;
if the actual labeling result is not consistent with the label labeling sequence obtained based on the attention prediction model, re-labeling the attention weight of each word in the text sequence according to the actual labeling result to obtain a new text sequence training sample; wherein the new text sequence training samples are used to retrain the attention prediction model.
7. The method of any one of claims 1-6, wherein obtaining a word feature vector for each word based on context semantics comprises:
obtaining an initial word vector of each word;
performing forward iteration on the initial word vector of each word by using a first recurrent layer of a recurrent neural network to obtain a forward output sequence of each word;
performing reverse iteration on the initial word vector of each word by using a second recurrent layer of the recurrent neural network to obtain a reverse output sequence of each word;
and splicing the forward output sequence and the reverse output sequence of each word, and using the resulting sequence as the context-semantic word feature vector of the word.
8. A sequence annotation apparatus, characterized in that the apparatus comprises:
the word segmentation processing module is used for carrying out word segmentation processing on the text sequence to be labeled so as to obtain words included in the text sequence;
the first representation module is used for obtaining character-level feature vectors of all words;
the second representation module is used for obtaining a word feature vector of each word based on context semantics;
the splicing representation module is used for splicing the character-level feature vectors of the words with the word feature vectors to obtain spliced word vector representations of the words;
the attention prediction module is used for determining the attention weight of each word in the text sequence based on a pre-trained attention prediction model, wherein the attention prediction model is obtained by training according to a plurality of text sequence training samples, and the words included in each text sequence training sample are marked with corresponding attention weights;
And the sequence labeling module is used for performing sequence labeling processing on the spliced word vector representation of each word according to the corresponding attention weight to obtain a label labeling sequence of the text sequence.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps comprised by the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the steps comprising the method of any one of claims 1-7.
CN201910424100.XA 2019-05-21 2019-05-21 Sequence labeling method and device and computer equipment Active CN111985229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910424100.XA CN111985229B (en) 2019-05-21 2019-05-21 Sequence labeling method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111985229A true CN111985229A (en) 2020-11-24
CN111985229B CN111985229B (en) 2023-07-07

Family

ID=73436533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910424100.XA Active CN111985229B (en) 2019-05-21 2019-05-21 Sequence labeling method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111985229B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170372200A1 (en) * 2016-06-23 2017-12-28 Microsoft Technology Licensing, Llc End-to-end memory networks for contextual language understanding
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN108829801A (en) * 2018-06-06 2018-11-16 大连理工大学 A kind of event trigger word abstracting method based on documentation level attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shengping Zhou, et al.: "Automatic Identification of Indicators of Compromise using Neural-Based Sequence Labelling", https://doi.org/10.48550/arXiv.1810.10156 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528674B (en) * 2020-12-14 2023-06-30 网易(杭州)网络有限公司 Text processing method, training device, training equipment and training equipment for model and storage medium
CN112528674A (en) * 2020-12-14 2021-03-19 网易(杭州)网络有限公司 Text processing method, model training method, device, equipment and storage medium
CN112732896A (en) * 2020-12-31 2021-04-30 天津开心生活科技有限公司 Target information display method, device, electronic equipment and medium
CN113011186A (en) * 2021-01-25 2021-06-22 腾讯科技(深圳)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
CN113011186B (en) * 2021-01-25 2024-04-26 腾讯科技(深圳)有限公司 Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN113011135A (en) * 2021-03-03 2021-06-22 科大讯飞股份有限公司 Arabic vowel recovery method, device, equipment and storage medium
CN113177117B (en) * 2021-03-18 2022-03-08 深圳市北科瑞讯信息技术有限公司 News material acquisition method and device, storage medium and electronic device
CN113139037A (en) * 2021-03-18 2021-07-20 北京三快在线科技有限公司 Text processing method, device, equipment and storage medium
CN113177117A (en) * 2021-03-18 2021-07-27 深圳市北科瑞讯信息技术有限公司 News material acquisition method and device, storage medium and electronic device
CN113139037B (en) * 2021-03-18 2023-04-14 北京三快在线科技有限公司 Text processing method, device, equipment and storage medium
CN113378925B (en) * 2021-06-10 2022-09-20 杭州芯声智能科技有限公司 Method and device for generating double attention training sequence and readable storage medium
CN113378925A (en) * 2021-06-10 2021-09-10 杭州芯声智能科技有限公司 Method and device for generating double attention training sequence and readable storage medium
CN113722491A (en) * 2021-09-08 2021-11-30 北京有竹居网络技术有限公司 Method and device for determining text plot type, readable medium and electronic equipment
CN113961666A (en) * 2021-09-18 2022-01-21 腾讯科技(深圳)有限公司 Keyword recognition method, apparatus, device, medium, and computer program product
CN113919350A (en) * 2021-09-22 2022-01-11 上海明略人工智能(集团)有限公司 Entity identification method, system, electronic equipment and storage medium
CN114492378A (en) * 2022-01-26 2022-05-13 北京字跳网络技术有限公司 Reverse text normalization method, model training method and related equipment
CN116991983A (en) * 2023-09-27 2023-11-03 之江实验室 Event extraction method and system for company information text
CN116991983B (en) * 2023-09-27 2024-02-02 之江实验室 Event extraction method and system for company information text

Also Published As

Publication number Publication date
CN111985229B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111985229B (en) Sequence labeling method and device and computer equipment
CN111738016B (en) Multi-intention recognition method and related equipment
CN112256886B (en) Probability calculation method and device in atlas, computer equipment and storage medium
CN110750965B (en) English text sequence labeling method, english text sequence labeling system and computer equipment
CN110795938B (en) Text sequence word segmentation method, device and storage medium
CN109558479A (en) Rule matching method, device, equipment and storage medium
KR20210090576A (en) A method, an apparatus, an electronic device, a storage medium and a program for controlling quality
WO2021208727A1 (en) Text error detection method and apparatus based on artificial intelligence, and computer device
WO2022174496A1 (en) Data annotation method and apparatus based on generative model, and device and storage medium
CN111666766B (en) Data processing method, device and equipment
CN111859967B (en) Entity identification method and device and electronic equipment
CN112949320B (en) Sequence labeling method, device, equipment and medium based on conditional random field
CN113901836B (en) Word sense disambiguation method and device based on context semantics and related equipment
CN116245097A (en) Method for training entity recognition model, entity recognition method and corresponding device
CN112084752A (en) Statement marking method, device, equipment and storage medium based on natural language
CN111967253A (en) Entity disambiguation method and device, computer equipment and storage medium
CN115510188A (en) Text keyword association method, device, equipment and storage medium
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN112434746B (en) Pre-labeling method based on hierarchical migration learning and related equipment thereof
CN113723077A (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN117725458A (en) Method and device for obtaining threat information sample data generation model
CN115730237B (en) Junk mail detection method, device, computer equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN114417891A (en) Reply sentence determination method and device based on rough semantics and electronic equipment
CN113138773A (en) Cloud computing distributed service clustering method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant