CN111950278A - Sequence labeling method and device and computer readable storage medium


Info

Publication number
CN111950278A
Authority
CN
China
Prior art keywords
hidden state
word
training sentence
label
words
Prior art date
Legal status
Pending
Application number
CN201910399055.7A
Other languages
Chinese (zh)
Inventor
孟茜
童毅轩
张永伟
姜珊珊
董滨
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to CN201910399055.7A
Publication of CN111950278A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention provides a sequence labeling method, a sequence labeling device, and a computer readable storage medium. The sequence labeling method introduces part-of-speech and/or syntactic features into the sequence labeling process; by exploiting this richer part-of-speech and syntactic information, a better sequence labeling effect can be obtained and the accuracy of sequence labeling is improved.

Description

Sequence labeling method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of Natural Language Processing (NLP), in particular to a sequence labeling method and device and a computer readable storage medium.
Background
In the field of artificial intelligence, information extraction is an indispensable technology. Current information extraction techniques mainly comprise three kinds of algorithms. The first is knowledge-graph-based extraction. Such algorithms require knowledge-graph data and rule support; building a knowledge graph consumes a large amount of human effort, and the data volume finally obtained is often not ideal. The second is extraction based on traditional statistical machine learning, which can use manually labeled training data and apply different learning models to different scenarios; its disadvantages are high labor cost and poor generalization, so it encounters a bottleneck in wide application. The last kind, which has prevailed in recent years, uses neural network models. Compared with traditional machine learning algorithms, neural-network-based models trained on large-scale data sets show excellent performance on natural language processing tasks.
Sequence labeling is one of the basic tasks of natural language processing. It refers to marking or tagging the elements of a given sequence. Typical sequence labeling tasks include Named Entity Recognition (NER), Chinese word segmentation, and classification problems (e.g., relationship recognition, sentiment analysis, and intent analysis).
For example, Named Entity Recognition (NER) is a common task in natural language processing. Named entities generally refer to entities in text that have specific significance or strong referential value, typically including names of people, places, and organizations, times, proper nouns, and the like. Because named entities serve as basic units of semantic representation in many applications, and their range of use is very wide, named entity recognition technology plays an important role. The sequence labeling problem typically requires labeled data for model training, for which neural network models based on deep learning can be used.
Therefore, a high-precision sequence labeling method is of great significance for developing high-performance systems for translation, dialogue, public opinion monitoring, topic tracking, semantic understanding, and the like.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a sequence labeling method and device, so as to improve the accuracy of sequence labeling.
According to an aspect of the embodiments of the present invention, there is provided a sequence labeling method, including:
generating first labels of words in a training sentence, the first labels comprising part-of-speech labels and/or syntax labels;
constructing, for the training sentence, a first feature vector based on the first labels, and generating a first hidden state of the first feature vector through a neural network model;
generating, for the training sentence, a second feature vector containing dictionary features of a preset dictionary, and generating a second hidden state of the second feature vector through the neural network model, wherein the preset dictionary comprises a plurality of reference labeling results;
combining the first hidden state and the second hidden state to obtain a third hidden state;
and performing sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
Further in accordance with at least one embodiment of the present invention, the step of constructing a first feature vector based on the first label for the training sentence comprises:
replacing each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector, wherein the probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under that first label, and the first-class words being those second-class words that belong to the reference labeling result.
Furthermore, according to at least one embodiment of the present invention, the step of generating a second feature vector containing dictionary features of a preset dictionary for the training sentence includes:
obtaining word embedding vectors of all words in the training sentence;
generating, for each word, a one-hot code according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtaining a one-hot vector corresponding to the training sentence;
and combining the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain a second feature vector containing the dictionary features of the preset dictionary.
Further in accordance with at least one embodiment of the present invention, the step of merging the first hidden state and the second hidden state includes:
and performing a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
Furthermore, according to at least one embodiment of the present invention, the step of performing sequence labeling according to the third hidden state of the training sentence includes:
and generating segment sequences of the training sentence based on the third hidden state, inputting the segment sequences to the softmax output layer of the neural network model, training the neural network model, and obtaining, for each segment sequence of the training sentence output by the softmax layer, the label of the category to which it belongs and its probability.
Further, in accordance with at least one embodiment of the present invention, after training the neural network model, the method further comprises:
and performing sequence labeling on a sentence to be processed by using the trained neural network model.
According to another aspect of the embodiments of the present invention, there is provided a sequence labeling device, including:
a label generating unit, configured to generate a first label of a word in a training sentence, where the first label includes a part-of-speech label and/or a syntax label;
a first hidden state generating unit, configured to construct, for the training sentence, a first feature vector based on the first label, and generate a first hidden state of the first feature vector through a neural network model;
a second hidden state generating unit, configured to generate, for the training sentence, a second feature vector including dictionary features of a preset dictionary, and generate a second hidden state of the second feature vector through the neural network model, where the preset dictionary includes multiple reference labeling results;
a state merging unit, configured to merge the first hidden state and the second hidden state to obtain a third hidden state;
and a first labeling processing unit, configured to perform sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
In addition, according to at least one embodiment of the present invention, the first hidden state generating unit is further configured to replace each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector, wherein the probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under that first label, and the first-class words being those second-class words that belong to the reference labeling result.
Furthermore, according to at least one embodiment of the present invention, the second hidden state generating unit is further configured to obtain a word embedding vector of each word in the training sentence; generate, for each word, a one-hot code according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtain a one-hot vector corresponding to the training sentence; and combine the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain a second feature vector containing the dictionary features of the preset dictionary.
Furthermore, according to at least one embodiment of the present invention, the state merging unit is further configured to perform a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
Furthermore, according to at least one embodiment of the present invention, the first labeling processing unit is further configured to generate segment sequences of the training sentence based on the third hidden state, input the segment sequences to the softmax output layer of the neural network model, train the neural network model, and obtain, for each segment sequence of the training sentence output by the softmax layer, the label of the category to which it belongs and its probability.
Furthermore, in accordance with at least one embodiment of the present invention, the sequence labeling apparatus further includes:
a second labeling processing unit, configured to perform sequence labeling on a sentence to be processed by using the trained neural network model.
The embodiment of the present invention further provides a sequence labeling apparatus, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the sequence tagging method as described above.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the sequence labeling method described above.
Compared with the prior art, the sequence labeling method and device and the computer readable storage medium provided by the embodiments of the present invention introduce part-of-speech and/or syntactic features into the sequence labeling process; because richer part-of-speech and syntactic information is utilized, the embodiments of the present invention can achieve a better sequence labeling effect and improve the accuracy of sequence labeling.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained based on these drawings without inventive effort.
FIG. 1 is a flowchart illustrating a sequence tagging method according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram of a syntactic analysis according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of constructing part-of-speech and syntactic label based feature vectors in an embodiment of the present invention;
FIG. 4 is an exemplary diagram of training a Bi-LSTM model based on hidden states in an embodiment of the present invention;
FIG. 5 is an exemplary diagram of a join operation performed on a word-embedded vector and a one-hot vector according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a sequence labeling result obtained by a neural network model according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a sequence labeling apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a sequence labeling apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided only to help the full understanding of the embodiments of the present invention. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Referring to FIG. 1, a flow diagram of a sequence labeling method provided by an embodiment of the present invention is shown. The sequence labeling method can be applied to tasks including named entity recognition, Chinese word segmentation, and classification problems, and can improve the accuracy of sequence labeling. As shown in FIG. 1, the sequence labeling method provided in the embodiment of the present invention includes:
step 11, generating a first label of a word in a training sentence, wherein the first label comprises a part-of-speech label and/or a syntax label.
Here, the training sentence is a sentence in training data collected in advance, the training data including a plurality of training sentences. Specifically, a preset dictionary may be constructed in advance, where the dictionary includes a plurality of reference labeling results for the sequence labeling task; the training sentence is a sentence that has undergone data preprocessing and in which the reference labeling results have been pre-labeled manually, for training the subsequent model. For example, for sequence labeling in named entity recognition, the reference labeling results in the dictionary may be various reference named entity sequences obtained in advance, and each named entity sequence appearing in a training sentence may be labeled manually. In addition, the data preprocessing may include document slicing, text word segmentation, deletion of stop words and other noise (including punctuation, numbers, single characters, and other meaningless tokens), and the like.
Natural language has rich semantic features and strong structural regularities. Typically, each word has a corresponding part of speech (e.g., noun, verb, adjective, or another predefined part of speech), and specific syntactic relationships hold between words (e.g., subject-predicate structures, modifier-modified structures, etc.). Thus, embodiments of the present invention introduce part of speech and/or syntax into the semantic understanding performed in sequence labeling. In general, parts of speech and syntactic relations can each be grouped into a limited number of categories depending on the language to be analyzed, and part-of-speech tags and syntax tags can be attached to each word in a sentence by existing tools.
In step 11, part-of-speech tags and/or syntax tags may be generated for the training data by an NLP parser, yielding the part-of-speech tags and syntax tags of the words in each training sentence.
Taking the training sentence "London is the capital and most populous city" as an example, FIG. 2 shows an example of a syntax tree formed by syntactic analysis. Each box in FIG. 2 represents a word in the sentence, and the corresponding tag is shown above each word. In FIG. 2, S denotes the root of the tree structure; NP, VP, ADJP, and the like are part-of-speech tags; and NNP, VBZ, DT, NN, CC, RBS, JJ, and the like denote syntax tags. For example, NP represents a noun phrase, VP a verb phrase, NNP a place name, and VBZ a linking verb, and so on. In addition, the type and number of part-of-speech/syntax tags generated by different tools may differ, and the embodiment of the present invention does not limit the specific tool adopted.
In addition, to simplify processing, only the part-of-speech tags or only the syntax tags may be generated in step 11. Of course, the embodiment of the present invention may also generate both kinds of tags, namely part-of-speech tags and syntax tags. By introducing the part-of-speech and/or syntactic features described above into sequence labeling, the accuracy of sequence labeling (e.g., named entity recognition) can be improved.
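By way of illustration only (this sketch is not part of the original disclosure), the following minimal Python example shows how an off-the-shelf parser can produce such per-word tags. It assumes spaCy and its small English model; a dependency parse stands in here for the constituency tree of FIG. 2, since the embodiment does not limit the specific tool adopted.

```python
# Hedged sketch: per-word tag generation for step 11 using spaCy
# (an assumed tool choice; the patent does not prescribe a parser).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital and most populous city")

for token in doc:
    # token.tag_ : fine-grained tag, e.g., NNP, VBZ, DT, JJ
    # token.pos_ : coarse part of speech, e.g., PROPN, VERB, DET
    # token.dep_ : syntactic (dependency) relation label
    print(token.text, token.tag_, token.pos_, token.dep_)
```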
Step 12, constructing, for the training sentence, a first feature vector based on the first labels, and generating a first hidden state of the first feature vector through a neural network model.
Here, when constructing the first feature vector from the first labels, each word of the training sentence may be replaced, according to its position in the sentence, with the probability corresponding to the first label to which it belongs, so as to obtain the first feature vector. The probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, where the second-class words are the words in the training sentence under that first label, and the first-class words are those second-class words that belong to a reference labeling result in the preset dictionary. Taking sequence labeling for named entity recognition as an example, a first-class word belongs to a reference labeling result: it may be identical to a reference named entity sequence in the preset dictionary, or it may be part of one.
According to at least one embodiment of the present invention, the probability corresponding to the first label to which each word belongs is positively correlated with the proportion of the first class of words in the second class of words. That is, the larger the ratio, the larger the corresponding probability. A simpler implementation is that the probability corresponding to the first label to which the word belongs is equal to the proportion of the first class of words in the second class of words. Of course, the probability may also be determined according to the ratio in a more complex manner, and this is not specifically limited in the embodiment of the present invention.
Still taking the named entity recognition as an example, assuming that the first tag is a part-of-speech tag, and a certain training sentence includes 10 words, in the above step 11, part-of-speech tags of the words in the training sentence are generated, and it is assumed that the 10 words in the training sentence include 5 nouns, 1 verb, 2 adjectives and 2 conjunctions. It is assumed that 2 nouns (i.e. the first type words) in the above 5 nouns (i.e. the second type words) belong to the reference named entity sequence in the dictionary, specifically, the two nouns may both belong to the same reference named entity sequence, or may belong to different reference named entity sequences, which is not limited in the embodiment of the present invention. Under the label of the noun, the proportion of the first class word in the second class word is 2/5, so that the probability corresponding to the label of the noun can be determined to be 2/5, and further the noun in the training sentence is replaced by 2/5. Assuming that none of the 2 conjunctions belongs to the reference named entity sequence in the dictionary, the ratio of the first class word in the second class word is 0 under the label of the conjunctions, so that the probability corresponding to the label of the conjunctions can be determined to be 0, and then the conjunctions in the training sentence are replaced by 0 or a constant close to 0. Through the above manner, each word in the training sentence can be replaced by the probability corresponding to the label to which the word belongs, so that a first feature vector is obtained, and the part-of-speech feature is introduced into the first feature vector.
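The replacement just described can be sketched as follows (an illustrative sketch, not the patent's implementation; the dictionary-membership test `in_dictionary` is a hypothetical helper standing in for the lookup against the reference labeling results):

```python
from collections import defaultdict

def first_feature_vector(words, tags, in_dictionary):
    total = defaultdict(int)    # second-class counts: words under each label
    matched = defaultdict(int)  # first-class counts: those also in the dictionary
    for word, tag in zip(words, tags):
        total[tag] += 1
        if in_dictionary(word):
            matched[tag] += 1
    # Replace each word by its label's probability, here taken as equal to
    # the proportion of first-class words among second-class words.
    return [matched[tag] / total[tag] for tag in tags]

words = ["London", "is", "the", "capital"]
tags = ["NNP", "VBZ", "DT", "NN"]
dictionary = {"London"}
print(first_feature_vector(words, tags, dictionary.__contains__))
# -> [1.0, 0.0, 0.0, 0.0]; only the NNP word appears in the dictionary
```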
In addition, considering that the number of words in different training sentences may be different, the number of words of all training sentences may be normalized for the convenience of calculation processing. Specifically, a normalized number may be set according to the number of words of most training sentences, for example, when the number of words of 95% of the training sentences is 20 or less, the normalized number may be set to 20. Then, for training sentences that exceed the normalized number, one or more words at the end of the sentence may be deleted until the normalized number is satisfied. For training sentences smaller than the normalized number, after the first feature vector of the first label is generated in step 12, a padding operation (padding) may be performed on the first feature vector, such as padding 0, to make the dimension of the first feature vector reach the normalized number.
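The length normalization described above admits a very small sketch (the normalized length of 20 is the example value from the text; padding with 0 is the stated filling choice):

```python
def normalize_length(vec, target_len=20, pad_value=0.0):
    if len(vec) > target_len:
        return vec[:target_len]  # delete words at the end of the sentence
    return vec + [pad_value] * (target_len - len(vec))  # pad to target length
```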
When the part-of-speech tags and the syntax tags are considered at the same time, in step 12 the embodiment of the present invention may construct, for the training sentence, a feature vector based on the part-of-speech tags and a feature vector based on the syntax tags. FIG. 3 shows an example of constructing feature vectors based on the part-of-speech tags and the syntax tags, where $f_{pos}$ represents the feature vector based on the part-of-speech tags and $f_{syn}$ represents the feature vector based on the syntax tags. The two feature vectors are then combined to obtain a first feature vector based on both the part-of-speech features and the syntactic features. Specifically, the vectors may be combined by concatenation or by addition, where concatenation joins the two vectors end to end to produce a higher-dimensional vector; for example, when both feature vectors are 20-dimensional, a 40-dimensional vector is obtained after concatenation.
After obtaining the first feature vector of the first labels, the embodiment of the present invention may input the first feature vector into a neural network model; through transformation and calculation, a hidden-layer representation can be obtained, that is, the first hidden state (usually a vector) of the first feature vector, which can serve as a deep-learning representation of the part of speech and/or the syntactic structure.
According to at least one embodiment of the present invention, a Bi-directional Long Short-Term Memory network (Bi-LSTM) may be utilized to obtain the hidden state. The Bi-LSTM model is trained with the first feature vector of the first label as input, such that the first hidden state can be obtained by Equation 1 below. As shown in FIG. 4, the first feature vector $f_i$ generated in step 12 is input into the Bi-LSTM network, and the hidden state at the corresponding time is obtained through the forward-propagation and backward-propagation computations. As for the specific structure of the Bi-LSTM model, reference can be made to the description of the prior art, which is not repeated herein for brevity.

$$\overleftrightarrow{h_i^f} = \text{Bi-LSTM}(f_i) = \left[\overrightarrow{h_i^f}; \overleftarrow{h_i^f}\right] \qquad (1)$$

In the above formula, $\text{Bi-LSTM}(\cdot)$ represents the Bi-LSTM model; $\overleftrightarrow{h_i^f}$ is the hidden-layer state vector (i.e., the first hidden state) that stores all useful information at time $i$; $f_i$ is the first feature vector based on the first label at time $i$; and $\overrightarrow{h_i^f}$ and $\overleftarrow{h_i^f}$ respectively represent the forward-propagating and backward-propagating hidden-layer feature vectors at time $i$.
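For concreteness, a minimal PyTorch sketch of obtaining such a bidirectional hidden state is given below (the dimensions are illustrative assumptions, and this is not the patent's code):

```python
import torch
import torch.nn as nn

feature_dim, hidden_dim, seq_len, batch = 40, 64, 20, 1

bilstm = nn.LSTM(input_size=feature_dim, hidden_size=hidden_dim,
                 bidirectional=True, batch_first=True)

f = torch.randn(batch, seq_len, feature_dim)  # first feature vectors f_i
h_first, _ = bilstm(f)                        # (batch, seq_len, 2*hidden_dim)
# h_first[:, i, :] concatenates the forward and backward hidden states at
# time i, i.e., the first hidden state of Equation 1.
```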
Step 13, generating, for the training sentence, a second feature vector containing dictionary features of a preset dictionary, and generating a second hidden state of the second feature vector through the neural network model, wherein the preset dictionary comprises a plurality of reference labeling results.
Here, word embedding vectors are also generated based on the training sentences, and in order to improve the accuracy of sequence labeling, dictionary features are introduced into the word embedding vectors in the embodiment of the present invention. Specifically, step 13 may include:
1) Word embedding vectors of each word in the training sentence are obtained; pre-trained word embedding vectors can be used to improve the efficiency of model training. The pre-trained word embedding vectors may be produced by different word vector generation models (e.g., Word2Vec), such as the Continuous Bag-of-Words (CBOW) model, the Skip-gram model, or the C&W model.
2) For each word in the training sentence, a one-hot code corresponding to the word is generated according to whether a word context containing the word in the training sentence exists in the preset dictionary, and the one-hot vector corresponding to the training sentence is obtained, thereby converting dictionary features into a vector representation.
Here, it may be checked whether each word in the training sentence exists in the dictionary; if so, the code corresponding to the word is 1, and otherwise 0, so that a set of codes ordered by the words in the training sentence, that is, the one-hot vector corresponding to the training sentence, may be generated.
In order to improve the accuracy of sequence labeling, a window of word context may be set when generating the above one-hot vector, the window size being N words, which together constitute a word context. N is typically greater than 1, e.g., N = 3. The window slides through the training sentence with a step size of one word; after each slide, it is judged whether the word context in the current window exists in the dictionary, and the one-hot code of the word at a preset position in the window (for example, the first or the last word in the window) is generated, finally yielding the one-hot vector of the training sentence.
It should be noted that, to further improve the accuracy of sequence labeling, the embodiment of the present invention may also perform the above operations with several windows of different sizes, generating the one-hot vectors of the training sentence under the different windows separately. The one-hot vectors under the different windows are then merged, e.g., by vector addition or concatenation, and the merged vector is taken as the one-hot vector of the training sentence.
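A sketch of this windowed dictionary feature follows (assumptions: the preset dictionary is represented as a set of word-context tuples, the coded position is the first word of each window, and merging across windows uses element-wise addition; all of these are illustrative choices, not fixed by the patent):

```python
def dictionary_onehot(words, dictionary, window=3):
    codes = []
    for i in range(len(words)):
        context = tuple(words[i:i + window])  # slide by one word per step
        # Code the word at the preset position (here, the first word of the
        # window) as 1 if its context exists in the preset dictionary.
        codes.append(1 if context in dictionary else 0)
    return codes

def multi_window_onehot(words, dictionary, windows=(2, 3, 4)):
    # Generate the vectors under several window sizes and merge them by
    # element-wise addition (concatenation would also work, per the text).
    merged = [0] * len(words)
    for n in windows:
        for i, c in enumerate(dictionary_onehot(words, dictionary, n)):
            merged[i] += c
    return merged
```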
3) The word embedding vectors of the words in the training sentence and the one-hot vector corresponding to the training sentence are combined to obtain a second feature vector containing the dictionary features of the preset dictionary.
Here, the word embedding vectors of the words in the training sentence and the one-hot vector corresponding to the training sentence may be combined by vector addition or vector concatenation, which is not specifically limited in the embodiment of the present invention. FIG. 5 gives an example of a concatenation operation on a word embedding vector and a one-hot vector, where $e_w$ represents the word embedding vector and $f_{dic}$ represents the one-hot vector (only 3 bits of data are schematically shown in FIG. 5). The word embedding vector and the one-hot vector are connected end to end, e.g., the head of the one-hot vector is joined to the tail of the word embedding vector (of course, the order of the two vectors may be interchanged), yielding a new vector whose dimension is the sum of the dimensions of the two vectors.
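The concatenation of FIG. 5 can be sketched in a few lines (the dimensions are illustrative assumptions):

```python
import numpy as np

e_w = np.random.randn(100)          # pre-trained word embedding e_w
f_dic = np.array([1.0, 0.0, 1.0])   # dictionary one-hot features f_dic
second_feature = np.concatenate([e_w, f_dic])  # 103-dimensional result
```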
After the second feature vector is obtained, a hidden state (i.e., the second hidden state) of the second feature vector may be generated using the Bi-LSTM model. The Bi-LSTM model is trained with the second feature vector as input, so that the second hidden state can be obtained by Equation 2 below; the second hidden state contains the semantic features of the word embeddings enhanced by the dictionary features.

$$\overleftrightarrow{h_i^e} = \text{Bi-LSTM}(e_i) = \left[\overrightarrow{h_i^e}; \overleftarrow{h_i^e}\right] \qquad (2)$$

In the above formula, $\text{Bi-LSTM}(\cdot)$ represents the Bi-LSTM model; $\overleftrightarrow{h_i^e}$ is the hidden-layer state vector (i.e., the second hidden state) that stores all useful information at time $i$; $e_i$ is the second feature vector at time $i$; and $\overrightarrow{h_i^e}$ and $\overleftarrow{h_i^e}$ respectively represent the forward-propagating and backward-propagating hidden-layer feature vectors at time $i$.
Step 14, combining the first hidden state and the second hidden state to obtain a third hidden state.
Here, the third hidden state may be obtained by performing a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state. For example, the vector addition operation may be expressed in the form of Equation 3 below, resulting in the third hidden state $h_i$:

$$h_i = \overleftrightarrow{h_i^f} + \overleftrightarrow{h_i^e} \qquad (3)$$
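A minimal sketch of this merging step follows (shapes are illustrative assumptions; addition corresponds to Equation 3, and concatenation is the alternative named in the text):

```python
import torch

h_first = torch.randn(1, 20, 128)   # first hidden state (batch, time, dim)
h_second = torch.randn(1, 20, 128)  # second hidden state
h_third_add = h_first + h_second                      # vector addition
h_third_cat = torch.cat([h_first, h_second], dim=-1)  # vector concatenation
```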
Step 15, performing sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
Here, based on the third hidden state, short segment sequences of the training sentence may be generated and input to the output layer (a softmax layer) of the neural network model; the neural network model is trained, and the label of the category to which each segment sequence belongs, together with its probability, is obtained from the softmax layer. FIG. 6 gives an example of inputting the third hidden state into the neural network model to obtain the output of the softmax layer. For example, for named entity recognition, the softmax layer may output information such as the category and probability of each named entity present in the training sentence. As for the structure of the neural network model, reference may be made to the description of the prior art, which is not repeated herein for brevity.
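By way of illustration, the output layer of step 15 can be sketched as follows (the number of labels and the linear projection ahead of the softmax are assumptions; the patent does not fix these details):

```python
import torch
import torch.nn as nn

num_labels, hidden_dim = 5, 128           # e.g., BIO-style entity labels
projection = nn.Linear(hidden_dim, num_labels)

h_third = torch.randn(1, 20, hidden_dim)  # merged hidden states from step 14
logits = projection(h_third)
probs = torch.softmax(logits, dim=-1)     # per-position label probabilities
labels = probs.argmax(dim=-1)             # most probable label per position
```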
In the above manner, part-of-speech and/or syntactic features are introduced into the sequence labeling process; because richer part-of-speech and syntactic information is utilized, the accuracy of sequence labeling can be improved.
After step 15, the trained neural network model may be applied to a specific sequence labeling task, for example, to identify and label named entities in a sentence to be processed. Because the embodiment of the present invention introduces rich part-of-speech and syntactic information when training the neural network model, the trained model achieves a better sequence labeling effect, and labeling accuracy can be improved when it is applied to a sequence labeling task.
Based on the above method, an embodiment of the present invention further provides a device for implementing the method. Referring to FIG. 7, the sequence labeling device 70 provided in the embodiment of the present invention can be applied to various sequence labeling scenarios and can improve the accuracy of sequence labeling. As shown in FIG. 7, the sequence labeling device 70 specifically includes:
a label generating unit 71, configured to generate a first label of a word in a training sentence, where the first label includes a part-of-speech label and/or a syntax label;
a first hidden state generating unit 72, configured to construct, for the training sentence, a first feature vector based on the first label, and generate a first hidden state of the first feature vector through a neural network model;
a second hidden state generating unit 73, configured to generate, for the training sentence, a second feature vector including dictionary features of a preset dictionary, and generate a second hidden state of the second feature vector through the neural network model, where the preset dictionary includes multiple reference labeling results;
a state merging unit 74, configured to merge the first hidden state and the second hidden state to obtain a third hidden state;
and a first labeling processing unit 75, configured to perform sequence labeling according to the third hidden state, and obtain a sequence labeling result of the training sentence.
In addition, according to at least one embodiment of the present invention, the first hidden state generating unit 72 is further configured to replace each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector, wherein the probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under that first label, and the first-class words being those second-class words that belong to the reference labeling result.
Furthermore, according to at least one embodiment of the present invention, the second hidden state generating unit 73 is further configured to obtain a word embedding vector of each word in the training sentence; generate, for each word, a one-hot code according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtain a one-hot vector corresponding to the training sentence; and combine the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain a second feature vector containing the dictionary features of the preset dictionary.
Furthermore, according to at least one embodiment of the present invention, the state merging unit 74 is further configured to perform a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
Furthermore, according to at least one embodiment of the present invention, the first labeling processing unit 75 is further configured to generate segment sequences of the training sentence based on the third hidden state, input the segment sequences to the softmax output layer of the neural network model, train the neural network model, and obtain, for each segment sequence of the training sentence output by the softmax layer, the label of the class to which it belongs and its probability.
Furthermore, according to at least one embodiment of the present invention, the sequence labeling apparatus may further include the following units (not shown in fig. 7):
a second labeling processing unit, configured to perform sequence labeling on a sentence to be processed by using the trained neural network model.
Through the above units, the sequence labeling device provided by the embodiment of the present invention introduces part-of-speech information and syntactic information into sequence labeling, thereby improving the accuracy of sequence labeling.
Referring to FIG. 8, an embodiment of the present invention further provides a sequence labeling device, whose hardware structure is shown in the block diagram of FIG. 8. As shown in FIG. 8, the sequence labeling device 800 includes:
a processor 802; and
a memory 804, in which memory 804 computer program instructions are stored,
wherein the computer program instructions, when executed by the processor, cause the processor 802 to perform the steps of:
generating first labels of words in a training sentence, the first labels comprising part-of-speech labels and/or syntax labels;
constructing a first feature vector based on the first label aiming at the training statement, and generating a first hidden state of the first feature vector through a neural network model;
generating a second feature vector containing dictionary features of a preset dictionary aiming at the training sentences, and generating a second hidden state of the second feature vector through the neural network model, wherein the preset dictionary comprises a plurality of reference labeling results;
combining the first hidden state and the second hidden state to obtain a third hidden state;
and carrying out sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
Further, as shown in fig. 8, the sequence labeling apparatus 800 may further include a network interface 801, an input device 803, a hard disk 805, and a display device 806.
The various interfaces and devices described above may be interconnected by a bus architecture. The bus architecture may be any architecture that includes any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 802, and one or more memories, represented by memory 804, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface 801 may be connected to a network (e.g., the internet, a local area network, etc.), receive data (e.g., training sentences) from the network, and store the received data in the hard disk 805.
The input device 803 may receive various commands input by an operator and send the commands to the processor 802 for execution. The input device 803 may include a keyboard or a pointing device (e.g., a mouse, trackball, touch pad, touch screen, or the like).
The display device 806 may display results obtained by the processor 802 executing instructions, for example, the sequence labeling result.
The memory 804 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 802.
It is to be understood that the memory 804 in embodiments of the present invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 804 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 804 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 8041 and application programs 8042.
The operating system 8041 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program 8042 includes various application programs such as a Browser (Browser) and the like for implementing various application services. A program implementing a method according to an embodiment of the present invention may be included in application program 8042.
The sequence labeling method disclosed in the above embodiments of the present invention can be applied to the processor 802, or implemented by the processor 802. The processor 802 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above sequence labeling method may be implemented by hardware integrated logic circuits in the processor 802 or instructions in the form of software. The processor 802 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 804, and the processor 802 reads the information in the memory 804 and performs the steps of the above method in combination with the hardware thereof.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
In particular, the computer program, when executed by the processor 802, may further implement the steps of:
replacing each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector, wherein the probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under that first label, and the first-class words being those second-class words that belong to the reference labeling result.
In particular, the computer program, when executed by the processor 802, may further implement the steps of: obtaining word embedding vectors of all words in the training sentence; generating, for each word, a one-hot code according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtaining a one-hot vector corresponding to the training sentence; and combining the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain a second feature vector containing the dictionary features of the preset dictionary.
In particular, the computer program, when executed by the processor 802, may further implement the steps of: performing a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
In particular, the computer program, when executed by the processor 802, may further implement the steps of: generating segment sequences of the training sentence based on the third hidden state, inputting the segment sequences to the softmax output layer of the neural network model, training the neural network model, and obtaining, for each segment sequence of the training sentence output by the softmax layer, the label of the category to which it belongs and its probability.
In particular, the computer program, when executed by the processor 802, may further implement the steps of: performing sequence labeling on a sentence to be processed by using the trained neural network model.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof, which essentially contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the sequence labeling method described in the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for labeling a sequence, comprising:
generating first labels of words in a training sentence, the first labels comprising part-of-speech labels and/or syntax labels;
constructing, for the training sentence, a first feature vector based on the first labels, and generating a first hidden state of the first feature vector through a neural network model;
generating, for the training sentence, a second feature vector containing dictionary features of a preset dictionary, and generating a second hidden state of the second feature vector through the neural network model, wherein the preset dictionary comprises a plurality of reference labeling results;
combining the first hidden state and the second hidden state to obtain a third hidden state;
and performing sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
2. The method of claim 1, wherein the step of constructing, for the training sentence, a first feature vector based on the first label comprises:
replacing each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector, wherein the probability corresponding to each word's first label is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under that first label, and the first-class words being those second-class words that belong to the reference labeling result.
3. The method of claim 1, wherein the step of generating a second feature vector containing dictionary features of a predetermined dictionary for the training sentence comprises:
obtaining word embedding vectors of all words in the training sentence;
generating, for each word, a one-hot code according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtaining a one-hot vector corresponding to the training sentence;
and combining the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain a second feature vector containing the dictionary features of the preset dictionary.
4. The method of claim 1, wherein the step of merging the first hidden state with the second hidden state comprises:
and performing a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
5. The method of claim 1, wherein the step of performing sequence labeling according to the third hidden state of the training sentence comprises:
and generating segment sequences of the training sentence based on the third hidden state, inputting the segment sequences to the softmax output layer of the neural network model, training the neural network model, and obtaining, for each segment sequence of the training sentence output by the softmax layer, the label of the class to which it belongs and its probability.
6. The method of claim 5, wherein after training the neural network model, the method further comprises:
and performing sequence labeling on a sentence to be processed by using the trained neural network model.
7. A sequence annotation apparatus, comprising:
a label generating unit, configured to generate a first label of a word in a training sentence, where the first label includes a part-of-speech label and/or a syntax label;
a first hidden state generating unit, configured to construct, for the training sentence, a first feature vector based on the first label, and generate a first hidden state of the first feature vector through a neural network model;
a second hidden state generating unit, configured to generate, for the training sentence, a second feature vector including dictionary features of a preset dictionary, and generate a second hidden state of the second feature vector through the neural network model, where the preset dictionary includes multiple reference labeling results;
a state merging unit, configured to merge the first hidden state and the second hidden state to obtain a third hidden state;
a first labeling processing unit, configured to perform sequence labeling according to the third hidden state to obtain a sequence labeling result of the training sentence.
8. The sequence annotation apparatus of claim 7,
the first hidden state generating unit is further configured to replace each word of the training sentence with the probability corresponding to the first label to which the word belongs, so as to obtain the first feature vector; wherein the probability corresponding to the first label to which each word belongs is positively correlated with the proportion of first-class words among second-class words, the second-class words being the words in the training sentence under the first label to which the word belongs, and the first-class words being the second-class words that belong to a reference labeling result.
9. The sequence annotation apparatus of claim 7,
the second hidden state generating unit is further configured to obtain a word embedding vector for each word in the training sentence; generate a one-hot code for each word according to whether a word context containing the word in the training sentence exists in the preset dictionary, and obtain a one-hot vector corresponding to the training sentence; and combine the word embedding vectors of the words in the training sentence with the one-hot vector corresponding to the training sentence to obtain the second feature vector containing the dictionary features of the preset dictionary.
10. The sequence annotation apparatus of claim 7,
the state merging unit is further configured to perform a vector concatenation operation or a vector addition operation on the first hidden state and the second hidden state to obtain the third hidden state.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the sequence labeling method of any one of claims 1 to 6.
CN201910399055.7A 2019-05-14 2019-05-14 Sequence labeling method and device and computer readable storage medium Pending CN111950278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910399055.7A CN111950278A (en) 2019-05-14 2019-05-14 Sequence labeling method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111950278A 2020-11-17

Family

ID=73335620

Country Status (1)

Country Link
CN (1) CN111950278A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544242A (en) * 2013-09-29 2014-01-29 广东工业大学 Microblog-oriented emotion entity searching system
CN105260361A (en) * 2015-10-28 2016-01-20 南京邮电大学 Trigger word tagging system and method for biomedical events
US9311299B1 (en) * 2013-07-31 2016-04-12 Google Inc. Weakly supervised part-of-speech tagging with coupled token and type constraints
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN107908671A (en) * 2017-10-25 2018-04-13 南京擎盾信息科技有限公司 Knowledge mapping construction method and system based on law data
CN108280064A (en) * 2018-02-28 2018-07-13 北京理工大学 Participle, part-of-speech tagging, Entity recognition and the combination treatment method of syntactic analysis
CN108460013A (en) * 2018-01-30 2018-08-28 大连理工大学 A kind of sequence labelling model based on fine granularity vocabulary representation model
CN108628824A (en) * 2018-04-08 2018-10-09 上海熙业信息科技有限公司 A kind of entity recognition method based on Chinese electronic health record
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination