CN112487796B - Method and device for sequence labeling and electronic equipment - Google Patents

Method and device for sequence labeling and electronic equipment

Info

Publication number
CN112487796B
Authority
CN
China
Prior art keywords
feature
local context
word vector
feature map
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011351553.3A
Other languages
Chinese (zh)
Other versions
CN112487796A (en)
Inventor
孟茜 (Meng Qian)
唐杰 (Tang Jie)
刘德兵 (Liu Debing)
仇瑜 (Chou Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhipu Huazhang Technology Co ltd
Original Assignee
Beijing Zhipu Huazhang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhipu Huazhang Technology Co ltd
Priority to CN202011351553.3A
Publication of CN112487796A
Application granted
Publication of CN112487796B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The invention discloses a method and device for sequence labeling, and an electronic device. The method converts a sentence into word vectors, extracts local context features of the sentence from the word vectors, obtains an attention representation from the word vectors and the local context features, and finally inputs the attention representation into a sequence labeling layer to obtain a model for sequence labeling. By enhancing the feature representation of local context correlation, the resulting sequence labeling model learns the local context information of the text more effectively, yielding accurate recognition results and better completing sequence labeling tasks and the natural language processing tasks derived from them.

Description

Method and device for sequence labeling and electronic equipment
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for sequence annotation, and an electronic device.
Background
With the development of artificial intelligence technologies such as deep learning, text mining has an increasingly profound influence on industry: more and more services and applications rely on technologies such as knowledge extraction to provide better service.
At present, representative models for sequence labeling include LSTM models and attention models. LSTM models have low computational efficiency. Attention models greatly improve efficiency by processing inputs in parallel, but because each input is treated independently, they under-utilize local context information, which can lead to erroneous recognition results.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
One aspect of the present invention provides a method for sequence annotation, comprising:
converting the sentence into a word vector;
extracting local context characteristics of the sentence according to the word vector;
acquiring attention representation according to the word vector and the local context characteristics;
inputting the attention representation into a sequence labeling layer to obtain a model for sequence labeling.
Further, before converting the sentence into a word vector, the method further comprises: cleaning the sentence data and converting the format of the sentence using the BIO method.
Further, the extracting local context features of the sentence according to the word vector includes:
inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation;
converting the feature map into a feature representation of the local context correlation.
Further, the inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation includes:
based on the selected kernels, sliding a window from left to right along the sentence direction and performing a two-layer convolution operation on the sentence to obtain a feature map representing the local context correlation, wherein the sentence is represented as an L×V matrix, L being the preset sentence length and each row the word vector at the corresponding position; the feature map is represented as an N×L′×V tensor, N being the number of kernels and L′ ≤ L.
Further, the converting the feature map into a feature representation of the local context correlation includes:
averaging the N feature maps of the N×L′×V tensor to obtain an L′×V output;
converting this output into a feature matrix of the specified size, i.e., the feature representation of the local context correlation, using a feed-forward neural network layer.
Further, the obtaining an attention representation according to the word vector and the local context feature includes:
inputting the word vector into an improved multi-head attention model to calculate multi-head attention, wherein the improved multi-head attention model incorporates the local context features;
and concatenating the attention heads to obtain the hidden-layer state output, which serves as the attention representation.
Further, the sequence labeling layer is a conditional random field layer.
The second aspect of the present invention provides an apparatus for sequence annotation, comprising:
the word vector conversion module is used for converting the sentence into a word vector;
the local context feature extraction module is used for extracting local context features of the sentences according to the word vectors;
an attention representation obtaining module, configured to obtain an attention representation according to the word vector and the local context feature;
and the model generation module is used for inputting the attention representation into the sequence annotation layer to obtain a model for sequence annotation.
A third aspect of the invention provides a memory storing a plurality of instructions for implementing the method described above.
A fourth aspect of the present invention provides an electronic device, comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions are loaded and executed by the processor, so that the processor can execute the method.
The invention has the beneficial effects that: according to the technical scheme provided by the invention, local context features of a sentence are extracted from the word vectors obtained by converting the sentence, an attention representation is obtained from the word vectors and the local context features, and finally the attention representation is input into a sequence labeling layer to obtain a model for sequence labeling. By enhancing the feature representation of local context correlation, the resulting sequence labeling model learns the local context information of the text more effectively, yielding accurate recognition results and better completing sequence labeling tasks and the natural language processing tasks derived from them.
Drawings
FIG. 1 is a schematic flow chart of a method for sequence tagging according to the present invention;
FIG. 2 is a schematic structural diagram of a model for sequence annotation according to the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for sequence labeling according to the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The method provided by the invention can be implemented in the following terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. Wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments described below.
A processor may include one or more processing cores. The processor connects the various parts of the terminal using various interfaces and lines, and performs the functions of the terminal and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and calling the data stored in the memory.
The memory may include Random Access Memory (RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code sets, or instruction sets.
The display screen is used for displaying user interfaces of all the application programs.
In addition, those skilled in the art will appreciate that the above-described terminal configurations are not intended to be limiting, and that the terminal may include more or fewer components, or some components may be combined, or a different arrangement of components. For example, the terminal further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and other components, which are not described herein again.
Example one
As shown in fig. 1, an embodiment of the present invention provides a method for sequence annotation, including:
s101, converting a sentence into a word vector;
s102, extracting local context characteristics of the sentence according to the word vector;
s103, acquiring attention representation according to the word vector and the local context characteristics;
and S104, inputting the attention representation into a sequence labeling layer to obtain a model for sequence labeling.
In step S101, in order to avoid introducing wrong word boundary information through word segmentation errors, the embodiment of the present invention represents the semantics of a text based on a word vector model (the processing granularity is character level for Chinese and word level for English). Pre-training on the text yields the set $X$ of representation vectors, one per token:

$$X = \{x_0, x_1, \ldots, x_n\}$$
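For illustration, a minimal sketch of this lookup in PyTorch; the file name, vocabulary fragment, and use of `nn.Embedding` are assumptions, not taken from the patent:

```python
# A minimal sketch of the word-vector lookup; the file name, vocabulary
# fragment, and use of nn.Embedding are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

embeddings = np.load("pretrained_char_vectors.npy")  # hypothetical |vocab| x V matrix
char2id = {"华": 0, "盛": 1, "顿": 2}                 # hypothetical vocabulary fragment

embed = nn.Embedding.from_pretrained(torch.tensor(embeddings, dtype=torch.float))

def to_word_vectors(sentence: str) -> torch.Tensor:
    """Map each character to its pre-trained vector, giving X = {x_0, ..., x_n}."""
    ids = torch.tensor([char2id[c] for c in sentence])
    return embed(ids)                                 # shape: (n, V)
```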
before step S101, the method further includes: cleaning the sentence data and converting the format of the sentence using the BIO method. Different cleaning methods can be used depending on the specific text format. For example, unformatted data may undergo Unicode processing and removal of noise such as meaningless special symbols. During cleaning, paragraphs or sentences can be segmented according to actual needs.
In this embodiment, the BIO scheme marks the start of a span to be labeled with B, its continuation with I, and everything else with O, and a category identifier can be appended as the specific sequence labeling task requires. For example, in "华盛顿大学位于西雅图" ("Washington University is located in Seattle"), "华盛顿大学" (an organization name) is to be identified as ORG and "西雅图" (a place name) as LOC, so the sentence is tagged "华-B-ORG, 盛-I-ORG, 顿-I-ORG, 大-I-ORG, 学-I-ORG, 位-O, 于-O, 西-B-LOC, 雅-I-LOC, 图-I-LOC"; all remaining characters are tagged with a task-relevant category or as O.
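As a concrete illustration, a small sketch of such a BIO conversion; the `to_bio` helper and its `(start, end, category)` span format are hypothetical, not part of the patent:

```python
# A small sketch of BIO conversion for the example above; the to_bio helper and
# its (start, end, category) span format are illustrative, not from the patent.
def to_bio(sentence, spans):
    """spans: (start, end, category) with end exclusive, e.g. (0, 5, "ORG")."""
    tags = ["O"] * len(sentence)
    for start, end, cat in spans:
        tags[start] = f"B-{cat}"
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"
    return tags

sent = "华盛顿大学位于西雅图"
# "华盛顿大学" (ORG) covers characters 0-4, "西雅图" (LOC) covers characters 7-9
print(list(zip(sent, to_bio(sent, [(0, 5, "ORG"), (7, 10, "LOC")]))))
```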
Step S102 is performed, including:
inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation;
converting the feature map into a feature representation of the local context correlation.
Specifically, the following method can be adopted for implementation:
firstly, based on the selected kernels, a window slides from left to right along the sentence direction and a two-layer convolution operation is performed on the sentence to obtain a feature map representing the local context correlation, wherein the sentence is represented as an L×V matrix, L being the preset sentence length and each row the word vector at the corresponding position; the feature map is represented as an N×L′×V tensor, N being the number of kernels and L′ ≤ L;
then, the N feature maps of the N×L′×V tensor are averaged to obtain an L′×V output;
finally, this output is converted into a feature matrix of the specified size, i.e., the feature representation of the local context correlation, using a feed-forward neural network layer.
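A minimal PyTorch sketch of this extractor under the stated shapes; the kernel size, kernel count, and the reading of the convolution as a `Conv2d` with a (k, 1) kernel (which preserves the word-vector dimension V) are assumptions:

```python
# A PyTorch sketch of the local context extractor under the stated shapes;
# kernel size k, kernel count N, and the Conv2d-with-(k, 1)-kernel reading
# (which preserves the word-vector dimension V) are assumptions.
import torch
import torch.nn as nn

class LocalContextExtractor(nn.Module):
    def __init__(self, sent_len: int, emb_dim: int, n_kernels: int = 64, k: int = 3):
        super().__init__()
        # (k, 1) kernels slide only along the sentence direction
        self.conv1 = nn.Conv2d(1, n_kernels, kernel_size=(k, 1))
        self.conv2 = nn.Conv2d(n_kernels, n_kernels, kernel_size=(k, 1))
        reduced_len = sent_len - 2 * (k - 1)          # L' <= L after two convolutions
        self.ffn = nn.Linear(reduced_len * emb_dim, sent_len * emb_dim)
        self.sent_len, self.emb_dim = sent_len, emb_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv1(x.unsqueeze(1)))    # x: (batch, L, V) -> (batch, N, ., V)
        h = torch.relu(self.conv2(h))                 # feature map: (batch, N, L', V)
        h = h.mean(dim=1)                             # average the N maps -> (batch, L', V)
        out = self.ffn(h.flatten(1))                  # feed-forward to the specified size
        return out.view(-1, self.sent_len, self.emb_dim)  # F_local: (batch, L, V)
```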
Step S103 is performed, including:
inputting the word vector into an improved multi-head attention model to calculate multi-head attention, wherein the improved multi-head attention model incorporates the local context features;
and concatenating the attention heads to obtain the hidden-layer state output, which serves as the attention representation.
Inputting the word vector into the improved multi-head attention model, the multi-head attention is calculated by a formula of the following form (the original is given only as a figure; this reconstruction follows the definitions below):

$$\widetilde{H} = H + F_{\text{local}}$$

where $F_{\text{local}}$ is the local context feature matrix and $H$ is the multi-head attention calculated for the word vector input by an unmodified multi-head attention model.
By the method, the local context features are introduced into the multi-head attention model, so that the attention model can enhance the local context features while acquiring the global context features. Therefore, the model can better learn the text context information so as to more accurately recognize the text.
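The following PyTorch sketch illustrates one plausible reading of this design, assuming the additive combination reconstructed above; `LocalContextAttention` is a hypothetical name, not the patent's implementation:

```python
# A sketch of the improved multi-head attention, assuming the additive
# combination reconstructed above: standard multi-head self-attention over the
# word vectors, plus the local context feature matrix F_local.
import torch
import torch.nn as nn

class LocalContextAttention(nn.Module):
    def __init__(self, emb_dim: int, n_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, f_local: torch.Tensor) -> torch.Tensor:
        # x, f_local: (batch, L, V); heads are computed and concatenated
        # internally by nn.MultiheadAttention
        h, _ = self.mha(x, x, x)
        return h + f_local            # enhance global features with local context
```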
Step S104 is executed: the attention representation is input into the sequence labeling layer to obtain the model for sequence labeling.
In the embodiment of the present invention, the sequence labeling layer may be a conditional random field layer.
Specifically, the following method can be adopted for implementation:
the attention representation obtained in step S103 is input into a Conditional Random Field (CRF) layer for sequence label prediction.
For a given sentence $X = (x_1, x_2, \ldots, x_n)$ with predicted tag sequence $y = (y_1, y_2, \ldots, y_n)$, the prediction score can be defined as:

$$s(X, y) = \sum_{t=0}^{n} A_{y_t, y_{t+1}} + \sum_{t=1}^{n} P_{t, y_t}$$

where $A$ is the transition score matrix, $A_{i,j}$ representing the transition score from label $i$ to label $j$, and $P_{t, y_t}$ denotes the score of mapping $x_t$ to the label $y_t$. $P_t$ can be precisely defined as $P_t = W_s h_t + b_s$, where $h_t$ is the hidden-layer state output of the attention layer, and $W_s$ and $b_s$ are trainable parameters.

The probability that the sentence $X$ is labeled as the sequence $y$ can be calculated as:

$$p(y \mid X) = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)}$$

where $Y_X$ is the set of all possible annotation sequences for the given sentence $X$. Training maximizes the log-likelihood of the correct annotation sequence using maximum likelihood estimation:

$$\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} \exp\big(s(X, \tilde{y})\big)$$

When decoding, the annotation sequence with the highest conditional probability is output:

$$y^{*} = \operatorname{arg\,max}_{\tilde{y} \in Y_X} s(X, \tilde{y})$$
Finally, a conversion system outputs the result in the format required by the specific task.
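The CRF equations above can be illustrated with a small, batch-free sketch; `crf_score` and `viterbi_decode` are hypothetical helpers, and start/end transitions are omitted for brevity:

```python
# A minimal, batch-free sketch of the CRF equations above; crf_score and
# viterbi_decode are illustrative helpers, not the patent's implementation.
# Start/end transitions (y_0, y_{n+1}) are omitted for brevity.
import torch

def crf_score(P: torch.Tensor, A: torch.Tensor, y: list) -> torch.Tensor:
    """s(X, y): P is the (n, K) emission matrix with P_t = W_s h_t + b_s,
    A is the (K, K) transition matrix, y a label sequence of length n."""
    score = P[0, y[0]]
    for t in range(1, len(y)):
        score = score + A[y[t - 1], y[t]] + P[t, y[t]]
    return score

def viterbi_decode(P: torch.Tensor, A: torch.Tensor) -> list:
    """Return y* = argmax_y s(X, y) by dynamic programming."""
    n, K = P.shape
    score = P[0].clone()              # best score of a path ending in each label
    backpointers = []
    for t in range(1, n):
        # total[i, j] = best score ending in label i at t-1, then moving to j at t
        total = score.unsqueeze(1) + A + P[t]
        score, idx = total.max(dim=0)
        backpointers.append(idx)
    best = [int(score.argmax())]
    for idx in reversed(backpointers):
        best.append(int(idx[best[-1]]))
    return best[::-1]
```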
As an embodiment, fig. 2 shows a model for sequence labeling; the method provided by the present invention can be implemented with this model. The model shown in fig. 2 includes: convolutional layers, mapping layers, attention layers, and a CRF layer.
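Reusing the `LocalContextExtractor` and `LocalContextAttention` sketches above, the layers of fig. 2 might compose as follows; the `SequenceLabeler` wrapper and its wiring are assumptions for illustration:

```python
# A sketch composing the layers of fig. 2, reusing the LocalContextExtractor
# and LocalContextAttention sketches above; the SequenceLabeler wrapper and its
# wiring are assumptions. Its output is the emission matrix P for the CRF layer.
import torch
import torch.nn as nn

class SequenceLabeler(nn.Module):
    def __init__(self, sent_len: int, emb_dim: int, n_labels: int):
        super().__init__()
        self.local = LocalContextExtractor(sent_len, emb_dim)  # convolution + mapping layers
        self.attn = LocalContextAttention(emb_dim)             # attention layer
        self.emission = nn.Linear(emb_dim, n_labels)           # P_t = W_s h_t + b_s
        self.transitions = nn.Parameter(torch.zeros(n_labels, n_labels))  # A

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_local = self.local(x)       # (batch, L, V)
        h = self.attn(x, f_local)     # attention representation
        return self.emission(h)       # emission scores P for the CRF layer
```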
The method provided by the embodiment of the invention enhances the characteristic representation of the local context correlation, so that the model for sequence labeling can more effectively learn the local context information of the text, thereby obtaining an accurate identification result.
The method provided by the invention can be applied to any task that can be cast as a sequence labeling problem, where it performs well. The model for sequence labeling obtained by the training of the invention can be applied directly to specific sequence labeling tasks such as word segmentation, part-of-speech tagging, and information extraction, and can serve downstream natural language processing tasks such as translation, recommendation, reasoning, and dialogue.
Example two
As shown in fig. 3, another aspect of the present invention provides a functional module architecture corresponding exactly to the foregoing method flow; that is, an embodiment of the present invention further provides an apparatus for sequence labeling, including:
a word vector conversion module 301, configured to convert a sentence into a word vector;
a local context feature extraction module 302, configured to extract local context features of a sentence according to the word vector;
an attention representation obtaining module 303, configured to obtain an attention representation according to the word vector and the local context feature;
and a model generation module 304, configured to input the attention representation into a sequence annotation layer, so as to obtain a model for sequence annotation.
The device provided by the embodiment of the invention also comprises a data preprocessing module which is used for cleaning the sentence and converting the format of the sentence by using a BIO method before converting the sentence into the word vector.
The local context feature extraction module is specifically configured to:
inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation;
converting the feature map into a feature representation of the local context correlation.
Further, the inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation includes:
based on the selected kernels, sliding a window from left to right along the sentence direction and performing a two-layer convolution operation on the sentence to obtain a feature map representing the local context correlation, wherein the sentence is represented as an L×V matrix, L being the preset sentence length and each row the word vector at the corresponding position; the feature map is represented as an N×L′×V tensor, N being the number of kernels and L′ ≤ L.
Further, the converting the feature map into a feature representation of the local context correlation includes:
averaging the N feature maps of the N×L′×V tensor to obtain an L′×V output;
converting this output into a feature matrix of the specified size, i.e., the feature representation of the local context correlation, using a feed-forward neural network layer.
The attention representation obtaining module is specifically configured to:
inputting the word vector into an improved multi-head attention model to calculate multi-head attention, wherein the improved multi-head attention model incorporates the local context features;
and concatenating the attention heads to obtain the hidden-layer state output, which serves as the attention representation.
In the model generation module, the sequence annotation layer may be a conditional random field layer.
The apparatus can implement the method for sequence labeling provided in the first embodiment; for specific implementations, refer to the description in the first embodiment, which is not repeated here.
The invention also provides a memory storing a plurality of instructions for implementing the method according to the first embodiment.
The invention also provides an electronic device comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions can be loaded and executed by the processor to enable the processor to execute the method according to the first embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (5)

1. A method for sequence annotation, comprising:
converting the sentence into a word vector;
extracting local context characteristics of the sentence according to the word vector;
acquiring attention representation according to the word vector and the local context characteristics;
inputting the attention representation into a sequence labeling layer to obtain a model for sequence labeling;
before converting the sentence into a word vector, the method further comprises: cleaning the sentence and converting the format of the sentence using the BIO method;
the extracting of the local context feature of the sentence according to the word vector comprises:
inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation;
converting the feature map into a feature representation of the local context correlation;
the inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation includes:
based on the selected kernels, sliding a window from left to right along the sentence direction and performing a two-layer convolution operation on the sentence to obtain a feature map representing the local context correlation, wherein the sentence is represented as an L×V matrix, L being the preset sentence length and each row the word vector at the corresponding position; the feature map is represented as an N×L′×V tensor, N being the number of kernels and L′ ≤ L;
the converting the feature map into a feature representation of the local context correlation includes:
averaging the N feature maps of the N×L′×V tensor to obtain an L′×V output;
converting this output into a feature matrix of the specified size, i.e., the feature representation of the local context correlation, using a feed-forward neural network layer;
the obtaining an attention representation according to the word vector and the local context feature includes:
inputting the word vector into an improved multi-head attention model to calculate multi-head attention, wherein the improved multi-head attention model incorporates the local context features;
and concatenating the attention heads to obtain the hidden-layer state output, which serves as the attention representation.
2. The method for sequence annotation of claim 1, wherein the sequence annotation layer is a conditional random field layer.
3. An apparatus for sequence annotation, comprising:
the word vector conversion module is used for converting the sentence into a word vector;
the local context feature extraction module is used for extracting local context features of the sentences according to the word vectors;
an attention representation obtaining module, configured to obtain an attention representation according to the word vector and the local context feature;
the model generation module is used for inputting the attention representation into a sequence labeling layer to obtain a model for sequence labeling;
the system also comprises a data preprocessing module, a word vector generating module and a word vector generating module, wherein the data preprocessing module is used for cleaning sentences and converting the formats of the sentences by using a BIO method before converting the sentences into word vectors;
the local context feature extraction module is specifically configured to:
inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation;
converting the feature map into a feature representation of the local context correlation;
the inputting the word vector into a double-layer convolutional neural network model to obtain a feature map representing the local context correlation includes:
based on the selected kernels, sliding a window from left to right along the sentence direction and performing a two-layer convolution operation on the sentence to obtain a feature map representing the local context correlation, wherein the sentence is represented as an L×V matrix, L being the preset sentence length and each row the word vector at the corresponding position; the feature map is represented as an N×L′×V tensor, N being the number of kernels and L′ ≤ L;
the converting the feature map into a feature representation of the local context correlation includes:
averaging the N feature maps of the N×L′×V tensor to obtain an L′×V output;
converting this output into a feature matrix of the specified size, i.e., the feature representation of the local context correlation, using a feed-forward neural network layer;
the attention representation obtaining module is specifically configured to:
inputting the word vector into an improved multi-head attention model to calculate multi-head attention, wherein the improved multi-head attention model incorporates the local context features;
and concatenating the attention heads to obtain the hidden-layer state output, which serves as the attention representation.
4. A memory storing a plurality of instructions for implementing the method of any one of claims 1-2.
5. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method according to any of claims 1-2.
CN202011351553.3A 2020-11-27 2020-11-27 Method and device for sequence labeling and electronic equipment Active CN112487796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011351553.3A CN112487796B (en) 2020-11-27 2020-11-27 Method and device for sequence labeling and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011351553.3A CN112487796B (en) 2020-11-27 2020-11-27 Method and device for sequence labeling and electronic equipment

Publications (2)

Publication Number Publication Date
CN112487796A CN112487796A (en) 2021-03-12
CN112487796B (en) 2022-02-18

Family

ID=74935586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011351553.3A Active CN112487796B (en) 2020-11-27 2020-11-27 Method and device for sequence labeling and electronic equipment

Country Status (1)

Country Link
CN (1) CN112487796B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112967112B (en) * 2021-03-24 2022-04-29 武汉大学 Electronic commerce recommendation method for self-attention mechanism and graph neural network
CN113901210B (en) * 2021-09-15 2022-12-13 昆明理工大学 Method for marking verbosity of Thai and Burma characters by using local multi-head attention to mechanism fused word-syllable pair

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220231A (en) * 2016-03-22 2017-09-29 索尼公司 Electronic equipment and method and training method for natural language processing
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN109635109B (en) * 2018-11-28 2022-12-16 华南理工大学 Sentence classification method based on LSTM and combined with part-of-speech and multi-attention mechanism
US11544259B2 (en) * 2018-11-29 2023-01-03 Koninklijke Philips N.V. CRF-based span prediction for fine machine learning comprehension
GB201904167D0 (en) * 2019-03-26 2019-05-08 Benevolentai Tech Limited Name entity recognition with deep learning
CN110083705B (en) * 2019-05-06 2021-11-02 电子科技大学 Multi-hop attention depth model, method, storage medium and terminal for target emotion classification
CN110472229B (en) * 2019-07-11 2022-09-09 新华三大数据技术有限公司 Sequence labeling model training method, electronic medical record processing method and related device
CN110674305B (en) * 2019-10-10 2023-05-12 天津师范大学 Commodity information classification method based on deep feature fusion model
CN111078895B (en) * 2019-12-18 2023-04-18 江南大学 Remote supervision entity relation extraction method based on denoising convolutional neural network
CN111783462B (en) * 2020-06-30 2023-07-04 大连民族大学 Chinese named entity recognition model and method based on double neural network fusion
CN112926323B (en) * 2021-01-26 2024-02-02 江南大学 Chinese named entity recognition method based on multistage residual convolution and attention mechanism

Also Published As

Publication number Publication date
CN112487796A (en) 2021-03-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210324

Address after: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant after: Beijing innovation Zhiyuan Technology Co.,Ltd.

Address before: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant before: Beijing Zhiyuan Artificial Intelligence Research Institute

TA01 Transfer of patent application right

Effective date of registration: 20210621

Address after: 603a, 6th floor, building 6, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant after: Beijing Zhipu Huazhang Technology Co.,Ltd.

Address before: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant before: Beijing innovation Zhiyuan Technology Co.,Ltd.

GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Meng Qian

Inventor after: Liu Debing

Inventor after: Chou Yu

Inventor before: Meng Qian

Inventor before: Tang Jie

Inventor before: Liu Debing

Inventor before: Chou Yu
