CN111160009B - Sequence feature extraction method based on tree-shaped grid memory neural network - Google Patents

Sequence feature extraction method based on tree-shaped grid memory neural network

Info

Publication number
CN111160009B
Authority
CN
China
Prior art keywords
vector
memory
character
interval
word
Prior art date
Legal status
Active
Application number
CN201911398270.1A
Other languages
Chinese (zh)
Other versions
CN111160009A (en)
Inventor
辛欣
王睿
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201911398270.1A priority Critical patent/CN111160009B/en
Publication of CN111160009A publication Critical patent/CN111160009A/en
Application granted granted Critical
Publication of CN111160009B publication Critical patent/CN111160009B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, and belongs to the technical field of natural language processing. First, each character in a sentence is represented as a character-level embedding vector by an embedding technique; then, for each character interval, a memory vector and a feature vector of the interval are extracted through a recursive tree neural network; next, for each position in the sentence, a memory vector and a feature vector of the position are extracted based on all character intervals ending at that position, so that the feature vectors realize fully recursive extraction of text sequence features; finally, the feature vectors of all positions are spliced together. The method can better extract the context features of a sentence; it can screen and fuse features based on the recursive structure of natural language and extract features useful for specific tasks; and, by exploiting the inherent recursive structure of language, it can complete sequence labeling tasks in various fields of natural language processing.

Description

Sequence feature extraction method based on tree-shaped grid memory neural network
Technical Field
The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, and belongs to the technical field of natural language processing.
Background
In the field of natural language processing, many tasks reduce to sequence labeling problems and are modeled by machine learning methods. In a machine-learning-based sequence labeling model, a key problem is how to extract the sequence features of natural language sentences.
In natural language processing, tasks that rely on sequence labeling models include named entity recognition, Chinese word segmentation and part-of-speech tagging, all of which are important tasks. The named entity recognition task aims to recognize the named entities, such as place names, person names and organization names, in a given natural language sentence; by determining whether each character in the sentence is the beginning, middle or end of a named entity, named entity recognition can be cast as a sequence labeling problem. The Chinese word segmentation task aims to determine the boundaries of words in a Chinese sentence, and can likewise be converted into a sequence labeling problem by determining whether each character is the beginning, middle or end of a word. The part-of-speech tagging task judges the part of speech of each word in a sentence and is also a sequence labeling task. Therefore, the sequence labeling problem is of great significance in the technical field of natural language processing.
Sequence labeling problems in the field of natural language processing are typically modeled by machine learning methods: a large data set is labeled manually, and the labeled data set is then given to a machine learning model for training, yielding an automatic sequence labeler. A large amount of labeled data has already been published in a variety of different fields. To exploit these labeled data, a good machine learning model for sequence labeling is required.
In the machine learning model of the sequence labeling problem, a key step is to extract the sequence feature representation of the sentence. The sequence feature representation is that for each basic unit in a sentence, a feature vector capable of reflecting the context information of the basic unit is extracted. In an English sentence, the unit is a word; in a Chinese sentence, the unit may be a word or a character.
Current sequence feature extraction methods are usually recurrent neural networks and their variants, such as the long short-term memory network. A recurrent neural network reads each element of the sequence in turn and updates its state vector, and the state vector at each unit can be used as the context feature vector of that unit. The drawback of this approach is that the recurrent neural network processes the units linearly along the sentence and ignores the inherent recursive structure of natural language sentences. Natural language sentences usually have a recursive structure: simple phrases are formed from basic characters or words, further components are added, and finally complete sentences are formed through grammatical structures such as subject-predicate-object. Although the sentences humans write and speak are linear in form, the human brain actually understands a sentence in this recursive way. Therefore, introducing the inherent recursion of natural language when extracting the context feature vectors of sequence units allows the meaning of the sentence to be understood better than a linear extraction method does, which benefits sequence labeling tasks such as Chinese word segmentation and named entity recognition.
Disclosure of Invention
The invention aims to provide a sequence feature extraction method based on a tree-shaped grid memory neural network for sequence labeling problems in the field of natural language processing, addressing the defect that the long short-term memory network cannot exploit the inherent recursive structure of language.
The core idea of the invention is as follows: first, each character in a sentence is represented as a character-level embedding vector through an embedding technique; on the basis of the character-level embedding vectors, for each character interval, a memory vector and a feature vector of the interval are extracted through a recursive tree neural network, where a character interval refers to several consecutive characters and the smallest character interval consists of a single character; then, for each position in the sentence, a memory vector and a feature vector of the position are extracted based on all character intervals ending at that position; this feature extraction realizes recursive text sequence feature extraction and exploits the inherent recursive structure of language; finally, the feature vectors of all positions are spliced together to obtain the sequence features of the sentence, and based on these sequence features a multi-class classification model judges the label at each position, thereby solving the sequence labeling problem.
The sequence feature extraction method based on the tree-shaped grid memory neural network comprises the following steps:
Step 1: embed each character of the initially input natural language sentence, specifically as follows: before entering the tree-shaped grid memory network, each character is represented as a character vector e_i by the embedding function of formula (1):
e_i = embed(x_i)  (1)
where x denotes the initially input natural language sentence, formally a character sequence, i.e., x = [x_1, x_2, …, x_M]; x_i denotes the i-th character in x, and i ranges from 1 to M; embed(·) is the embedding function, which looks up the corresponding character vector for each input character; the character vectors are part of the model parameters, and their specific values are obtained through training;
Step 2: generate the memory vector and feature vector of each character interval, specifically comprising the following substeps:
Step 2.1: generate the memory vector of each initial character interval; an initial character interval has length 1 and consists of a single character; for the initial character interval corresponding to the i-th character, the embedding vector e_i of that character is used as the memory vector c_i of the initial character interval, where i ranges from 1 to M;
Step 2.2: generate the feature vector of each initial character interval; the memory vector is multiplied by the output gate vector to obtain the feature vector h_i of the initial character interval corresponding to the i-th character, specifically comprising the following steps:
Step 2.2A: calculate the output gate vector o_i through formula (2):
o_i = σ(W_co c_i + b_o)  (2)
where W_co and b_o are respectively the mapping matrix and mapping bias from the memory vector to the output gate; both are parameters of the model, and their specific values are obtained through the training process; σ(·) is the sigmoid function;
Step 2.2B: calculate the feature vector h_i of position i through formula (3):
h_i = o_i ⊙ tanh(c_i)  (3)
where tanh(·) is the hyperbolic tangent function;
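By way of illustration, the following minimal Python sketch shows one possible way to compute step 1 and steps 2.1 to 2.2 on the example sentence used later in the embodiment; the embedding size, the random initialization of the embedding table and of the parameters W_co and b_o, and all helper names are assumptions for demonstration only, since in the method these values are obtained through training.

import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # assumed embedding / hidden size
sentence = "小明在国图专心读书"            # example sentence from the embodiment
vocab = {ch: idx for idx, ch in enumerate(sentence)}
E = rng.normal(size=(len(vocab), D))     # embedding table for eq. (1); trainable in practice
W_co = rng.normal(size=(D, D))           # memory-to-output-gate mapping of eq. (2)
b_o = np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initial_interval(ch):
    e_i = E[vocab[ch]]                   # eq. (1): e_i = embed(x_i)
    c_i = e_i                            # step 2.1: initial memory = embedding vector
    o_i = sigmoid(W_co @ c_i + b_o)      # eq. (2): output gate vector
    h_i = o_i * np.tanh(c_i)             # eq. (3): feature vector of the unit interval
    return c_i, h_i

# spans[(i, j)] holds the (memory, feature) pair of the character interval (i, j)
spans = {(i, i): initial_interval(ch) for i, ch in enumerate(sentence, 1)}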
step 2.3: combining the character intervals;
the character interval merging refers to merging two small character intervals to obtain a memory vector of a large character interval, and specifically includes: for all the two small character intervals with the forms of (i, j-1) and (i +1, j), combining to obtain a large character interval (i, j); wherein i and j are variables representing the positions of characters, and the value range depends on the length of a sentence;
step 2.4: by (4) and (5)Calculating a forgetting gate vector in the character interval combination; in the feature extraction, the features required by the corresponding tasks need to be concentrated, and all information is not required when the content of the interval is represented; in order to judge which contents are used for representing a large interval between two cells, forgetting gate vectors are calculated for the left and right cells respectively through (4) and (5)
Figure GDA0002649554720000041
And
Figure GDA0002649554720000042
to indicate how much content in both cells should be merged into a large text interval:
Figure GDA0002649554720000051
Figure GDA0002649554720000052
wherein t is a subscript of a currently processed large interval; in calculating the forgetting gate vector for the left text interval,
Figure GDA0002649554720000053
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector and a right memory vector to a left forgetting gate vector, bflIs the offset vector of the left forget gate vector; when calculating the forgetting gate vector of the right text interval,
Figure GDA0002649554720000054
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector, a right memory vector to a right forgetting gate vector, bfrIs the offset vector of the right forget gate vector; σ (-) is a sigmoid function;
step 2.5: calculating the memory vector in the character interval combination, and respectively connecting the memory vectors of two cells with their own forgetting gate vectorsCombining to obtain a large-range memory vector ct
Figure GDA0002649554720000055
Wherein,
Figure GDA0002649554720000056
and
Figure GDA0002649554720000057
memory vectors between two cells in the text interval combination are respectively; element-level multiplication of an all-vector;
step 2.6: generating a feature vector h of the merged text intervaltThe method specifically comprises the following steps:
step 2.6A calculate output Gate coefficient otSpecifically, by (7), calculating:
Figure GDA0002649554720000058
wherein,
Figure GDA0002649554720000059
Wcorespectively a mapping matrix of the left eigenvector, a mapping matrix of the right eigenvector and a mapping matrix of the memory vector; boIs a bias vector for the mapping process;
Figure GDA00026495547200000510
Figure GDA00026495547200000511
Wco、boall are parameters of the model, and specific values are obtained through a training process;
step 2.6B: calculating the characteristic vector h of the t position by (8)t
ht=ot⊙tanh(ct) (8)
Wherein, tan h () is a hyperbolic tangent function;
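Continuing the illustrative sketch above, a possible implementation of the interval merging of steps 2.3 to 2.6 is shown below; the parameter names mirror the notation of formulas (4) to (8), the randomly initialized matrices stand in for trained parameters, and the two forgetting gates correspond to the transverse and longitudinal gates mentioned in the advantageous effects.

W_fl = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "cl", "cr")}   # left forget gate, eq. (4)
W_fr = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "cl", "cr")}   # right forget gate, eq. (5)
b_fl, b_fr = np.zeros(D), np.zeros(D)
W_out = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "c")}         # output gate, eq. (7)
b_out = np.zeros(D)

def merge(left, right):
    """Merge the small intervals (i, j-1) and (i+1, j) into (i, j), eqs. (4)-(8)."""
    (c_l, h_l), (c_r, h_r) = left, right
    f_l = sigmoid(W_fl["hl"] @ h_l + W_fl["hr"] @ h_r +
                  W_fl["cl"] @ c_l + W_fl["cr"] @ c_r + b_fl)           # eq. (4)
    f_r = sigmoid(W_fr["hl"] @ h_l + W_fr["hr"] @ h_r +
                  W_fr["cl"] @ c_l + W_fr["cr"] @ c_r + b_fr)           # eq. (5)
    c_t = f_l * c_l + f_r * c_r                                         # eq. (6)
    o_t = sigmoid(W_out["hl"] @ h_l + W_out["hr"] @ h_r +
                  W_out["c"] @ c_t + b_out)                             # eq. (7)
    h_t = o_t * np.tanh(c_t)                                            # eq. (8)
    return c_t, h_t

# e.g. merge the unit intervals of characters 1 and 2 into the interval (1, 2)
spans[(1, 2)] = merge(spans[(1, 1)], spans[(2, 2)])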
step 2.7: setting the maximum length L of the interval;
the value range of L is 1-N, and N is the length of the current sentence;
step 2.8: repeating the step 2.3-step 2.6L-1 times, and generating memory and feature vectors of all intervals according to the lengths of the intervals from short to long;
and step 3: generating a sequence feature vector of a sentence;
the sequence feature vector is that for each word in a sentence, a feature vector reflecting the context feature of the word is generated; in the tree-shaped mesh memory network, in order to fully utilize the recursive property of a language, a plurality of possible paths are considered when the memory and the characteristics of a certain word are generated; the method specifically comprises the following substeps:
step 3.1: matching a character interval corresponding to each character; for the b-th character in the sentence, finding all character intervals taking the b-th character as the tail, wherein each character interval taking the b-th character as the tail is a path for generating a memory vector of the current character;
step 3.2: generating a path memory vector c by (9)a,b(ii) a For a path from a to b, fusing character interval features on a memory vector corresponding to the position a to generate a fused path memory vector:
ca,b=tanh(Whcha,b+Weceb+bc) (9)
wherein, Whc、WecMapping matrixes from the characteristic vector and the embedded vector to the memory vector are respectively; h isa,bFeature vectors representing paths from a to b, bcIs an offset vector of the memory vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.3: by (10), an input gate vector of a path from a to b in the memory fusion of the plurality of paths is calculated:
ia,b=tanh(Whiha,b+Weieb+bi) (10)
wherein, Whi、WeiMapping matrixes from the characteristic vector and the embedded vector to the input gate vector are respectively; biIs an offset vector of the input gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.4: calculating an attention coefficient vector alpha of each path (a, b) through (11)a,b
Figure GDA0002649554720000071
Wherein exp is an exponential function, and a' means traversing a across all text regions ending with b;
step 3.5; the memory cell amount c of the current word is obtained by weighted averaging (12) of the memory vectors of each pathb
cb=∑a′αa′,b⊙ca′,b (12)
Due to the memory vector cbFrom text intervals including but not limited to the current word, and the memory of the text intervals comes from a recursive method, so that the memory vector of the current word takes into account the recursive nature inherent in natural language when summarizing the context features;
attention coefficient vector ca’,bAnd alphaa’,bThe meaning of a' in (a) is that the traversal is performed on all text regions ending with b;
step 3.6: by (13), for the b position, on the basis of the memory vector, the output gate vector o of the sequence is calculatedb
ob=σ(Whohb-1+Weoeb-1+bo) (13)
Wherein, Who、WeoMapping matrixes from the characteristic vector and the embedded vector to the output gate vector are respectively; h isb-1And eb-1A feature vector and a b-1 word vector respectively representing the b-1 position; boIs an offset vector of the output gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.7: by (14), a feature vector h of the position e is generatede
he=oe⊙tanh(ce) (14)
Wherein for the e position, an output gate vector o of the sequence is calculated on the basis of the memory vectore(ii) a Memory cell amount c of current worde
Step 3.8: and (3) circularly executing the steps from 3.1 to 3.7 from the starting position to the ending position of the sentence, and generating a feature vector for the position of each word, namely extracting the sequence feature of the sentence.
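Continuing the illustrative sketch, the per-position fusion of step 3 can be written as below; the parameter names follow formulas (9) to (14), and the randomly initialized matrices are placeholders for trained values, so this is a demonstration of the computation rather than a reference implementation.

W_hc, W_ec = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (9)
W_hi, W_ei = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (10)
W_ho, W_eo = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (13)
b_c, b_i, b_ob = np.zeros(D), np.zeros(D), np.zeros(D)

def position_feature(b, spans, E_seq, h_prev):
    """Fuse all character intervals (a, b) ending at position b into c_b and h_b."""
    paths = {a: spans[(a, b)] for a in range(1, b + 1) if (a, b) in spans}
    e_b = E_seq[b - 1]
    c_path, i_path = {}, {}
    for a, (c_ab, h_ab) in paths.items():
        c_path[a] = np.tanh(W_hc @ h_ab + W_ec @ e_b + b_c)     # eq. (9): path memory
        i_path[a] = np.tanh(W_hi @ h_ab + W_ei @ e_b + b_i)     # eq. (10): input gate
    exp_i = {a: np.exp(v) for a, v in i_path.items()}
    Z = sum(exp_i.values())
    alpha = {a: v / Z for a, v in exp_i.items()}                # eq. (11): attention coefficients
    c_b = sum(alpha[a] * c_path[a] for a in paths)              # eq. (12): fused memory
    e_prev = E_seq[b - 2] if b > 1 else np.zeros(D)
    o_b = sigmoid(W_ho @ h_prev + W_eo @ e_prev + b_ob)         # eq. (13): output gate
    h_b = o_b * np.tanh(c_b)                                    # eq. (14): position feature
    return c_b, h_b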
Advantageous effects
The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, which has the following beneficial effects compared with the prior art:
1. The method exploits the inherent recursive property of natural language and can better extract the context features of sentences: in the tree-shaped grid memory network, the feature vector of each position in the sequence is extracted according to the inherent recursive structure of the language;
2. The method has transverse and longitudinal forgetting gate vectors, can screen and fuse features based on the recursive structure of natural language, and extracts features useful for specific tasks, which is more favorable for representing sentence features than existing methods;
3. Compared with existing methods based on recurrent neural networks, all of the feature vector extraction in the method realizes recursive extraction of text sequence features; by exploiting the inherent recursive structure of language, the method can be used for sequence labeling tasks in various natural language processing fields.
Drawings
FIG. 1 is a flow chart of a sequence feature extraction method based on a tree-shaped grid memory neural network according to the present invention.
Detailed Description
The following describes the sequence feature extraction method based on a tree-shaped grid memory neural network in detail with reference to specific Embodiment 1 and FIG. 1.
Example 1
This embodiment describes a specific implementation of the method for extracting sequence features based on the tree-shaped mesh memory neural network according to the present invention.
FIG. 1 is a flow chart of the method.
Step A: input a sentence, for example "小明在国图专心读书" ("Xiaoming reads attentively in the National Library"); then x = [x_1, x_2, …, x_M] = [小, 明, 在, 国, 图, 专, 心, 读, 书], and M = 9;
Step B: using an embedding technique of the prior art, convert each character of the sentence in step A into its embedding vector; the method of the invention additionally converts whole phrases into embedding vectors;
That is: existing methods analyze the sequence character by character, so when analyzing the action "读书" (reading) they cannot accurately retain the information related to "小明" (Xiaoming), because "小明" is far away from "读书" in the sentence, at a distance of 5;
In the method of the invention, "在国图" (in the National Library) and "专心" (attentively) are treated as two phrases, so the distance from "小明" to "读书" is shortened from 5 to 2; this makes it easier for the model to remember the information related to "小明" and is more favorable for representing the sentence features.
C, generating a memory vector and a feature vector of the character interval;
when generating the feature vector, the method utilizes the inherent recursion property in the natural language, can better extract the context feature of the sentence, namely in the tree-shaped grid memory network, the feature vector of each position in the sequence is extracted according to the inherent recursion structure of the language;
the method specifically comprises the following substeps:
c.1, generating a memory vector of each initial character interval; directly taking the embedded vector of each character as a memory vector corresponding to the initial character interval;
c.2, generating a feature vector of each initial character interval; the method specifically comprises the following substeps:
step C.2.1, calculating an output gate vector for each initial character interval;
c.2.2, calculating a characteristic vector for each initial character interval according to the output gate vector and the memory vector;
step C.3, setting the maximum interval length L;
c.4, circularly generating memory vectors and feature vectors of all character intervals; the method specifically comprises the following substeps:
step c.4.1, the current merging length l is 2;
step C.4.2, respectively calculating forgetting gate vectors of two cell intervals (i, j-1) and (i +1, j) for each character interval (i, j) with the length of l; wherein the value range of i is 1-N, and N is the sentence length; j is i + l and j is less than or equal to N;
in the step C.4.2, forgetting gate vectors of (i, j-1) and (i +1, j) between two cells are respectively horizontal and longitudinal forgetting gate vectors, so that the features can be screened and fused based on a recursive structure of natural language, and useful features for specific tasks are extracted;
c.4.3, calculating the memory vector of each merging interval according to the memory vectors and the forgetting gate vectors between the two cells;
c.4.4, calculating an output gate vector of each merging interval;
c.4.5, calculating a feature vector of each merging interval;
step c.4.6 modify l ═ l + 1;
c.4.7 repeating C.4.2 to C.4.6 until L is more than or equal to L;
step D, generating a sequence feature vector of each word; the method specifically comprises the following substeps:
step D.1, setting the current processing position b as 1;
d.2, calculating an input gate vector of each character interval with the right boundary b; the method specifically comprises the following substeps:
step d.2.1, the left boundary a of the current text interval is b-L;
step D.2.2, calculating the path memory of the character intervals (a, b);
step D.2.3, calculating an input gate vector of the character interval (a, b);
step d.2.4 a ═ a + 1;
step d.2.5 repeats steps d.2.2 to d.2.4 until a ═ b;
d.3, normalizing the input gate vectors of all the character intervals with the right boundary b;
d.4, generating a memory vector of the position b according to the normalized coefficient vector and the memory vector of the corresponding interval;
d.5, calculating an output gate vector of the position b;
d.6 calculating the characteristic vector according to the memory vector and the output gate vector of the b position
Step d.7 b ═ b + 1;
step d.8 repeats steps d.2 to d.7 until b > ═ N;
step C and step D correspond to step 2 of the invention, namely:
when step 2.7 is implemented, the value of L should be small, and the empirical value range is L ═ 10, due to the consideration of time complexity and the reason that when the text interval is too long, the internal recursion cannot be well preserved; where L corresponds to L in step C.
And E, splicing the feature vectors of each position together and outputting the feature vectors as sequence feature vectors.
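For illustration, steps C, D and E can be sketched as a small driver built on the toy helpers shown in the summary of the invention above; the indexing convention that an interval (i, j) of length l has j = i + l - 1, the small value of L, and all function names are assumptions of the sketch, with untrained random parameters throughout.

def build_spans(sentence, L):
    N = len(sentence)
    spans = {(i, i): initial_interval(ch) for i, ch in enumerate(sentence, 1)}   # steps C.1-C.2
    for l in range(2, min(L, N) + 1):                                            # steps C.4.1, C.4.6, C.4.7
        for i in range(1, N - l + 2):                                            # step C.4.2
            j = i + l - 1
            spans[(i, j)] = merge(spans[(i, j - 1)], spans[(i + 1, j)])           # steps C.4.3-C.4.5
    return spans

def sequence_features(sentence, spans, E_seq):
    feats, h_prev = [], np.zeros(D)
    for b in range(1, len(sentence) + 1):                    # steps D.1, D.7, D.8
        _, h_b = position_feature(b, spans, E_seq, h_prev)   # steps D.2-D.6
        feats.append(h_b)
        h_prev = h_b
    return np.stack(feats)                                   # step E: one feature vector per position

spans = build_spans("小明在国图专心读书", L=4)
E_seq = np.array([E[vocab[ch]] for ch in "小明在国图专心读书"])
features = sequence_features("小明在国图专心读书", spans, E_seq)   # shape (9, D)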
The feature vectors in the invention realize fully recursive extraction of text sequence features; they exploit the inherent recursive structure of language and can be used for sequence labeling tasks in various fields of natural language processing.
A concrete illustration of the recursive structure in language, using the sentence of step A: when the recursion height, i.e. the merging length l in the summary of the invention, is 1, the result is [小明, 在, 国图, 专心, 读书] ([Xiaoming, in, the National Library, attentively, reads]); when the merging length l is 2, the result is [小明, 在国图, 专心读书] ([Xiaoming, in the National Library, reads attentively]).
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (4)

1. A sequence feature extraction method based on a tree-shaped grid memory neural network, characterized in that the method comprises the following steps:
Step 1: embed each character of the initially input natural language sentence, specifically as follows: before entering the tree-shaped grid memory network, each character is represented as a character vector e_i by the embedding function of formula (1):
e_i = embed(x_i)  (1)
where x denotes the initially input natural language sentence, formally a character sequence, i.e., x = [x_1, x_2, ..., x_M]; x_i denotes the i-th character in x; embed(·) is an embedding function;
Step 2: generate the memory vector and feature vector of each character interval, specifically comprising the following substeps:
Step 2.1: generate the memory vector of each initial character interval; for the initial character interval corresponding to the i-th character, the embedding vector e_i of that character is used as the memory vector c_i of the initial character interval, where i ranges from 1 to M;
Step 2.2: generate the feature vector of each initial character interval; the memory vector is multiplied by the output gate vector to obtain the feature vector h_i of the initial character interval corresponding to the i-th character, specifically comprising the following steps:
Step 2.2A: calculate the output gate vector o_i through formula (2):
o_i = σ(W_co c_i + b_o)  (2)
where W_co and b_o are respectively the mapping matrix and mapping bias from the memory vector to the output gate; both are parameters of the model, and their specific values are obtained through the training process; σ(·) is the sigmoid function;
Step 2.2B: calculate the feature vector h_i of position i through formula (3):
h_i = o_i ⊙ tanh(c_i)  (3)
where tanh(·) is the hyperbolic tangent function;
step 2.3: combining the character intervals;
the character interval merging refers to merging two small character intervals to obtain a memory vector of a large character interval, and specifically includes: for all the two small character intervals with the forms of (i, j-1) and (i +1, j), combining to obtain a large character interval (i, j);
wherein i and j are variables representing the positions of characters, and the value range depends on the length of a sentence;
step 2.4: calculating a forgetting gate vector in the character interval combination through the steps (4) and (5); in the feature extraction, the features required by the corresponding tasks need to be concentrated, and all information is not required when the content of the interval is represented; in order to judge which contents are used for representing a large interval between two cells, forgetting gate vectors are calculated for the left and right cells respectively through (4) and (5)
Figure FDA0002649554710000021
And
Figure FDA0002649554710000022
to indicate how much content in both cells should be merged into a large text interval:
Figure FDA0002649554710000023
Figure FDA0002649554710000024
wherein t is a subscript of a currently processed large interval; in calculating the forgetting gate vector for the left text interval,
Figure FDA0002649554710000025
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector and a right memory vector to a left forgetting gate vector, bflIs the offset vector of the left forget gate vector; when calculating the forgetting gate vector of the right text interval,
Figure FDA0002649554710000026
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector, a right memory vector to a right forgetting gate vector, bfrIs the offset vector of the right forget gate vector; σ (-) is a sigmoid function;
step 2.5: calculating the memory vector in the character interval combination, combining the memory vectors of two small intervals with the respective forgetting gate vector to obtain the memory vector c of large intervalt
Figure FDA0002649554710000027
Wherein,
Figure FDA0002649554710000028
and
Figure FDA0002649554710000029
memory vectors between two cells in the text interval combination are respectively; element-level multiplication of an all-vector;
step 2.6: generating a feature vector h of the merged text intervaltThe method specifically comprises the following steps:
step 2.6A calculate output Gate coefficient otSpecifically, by (7), calculating:
Figure FDA0002649554710000031
wherein,
Figure FDA0002649554710000032
Wcorespectively a mapping matrix of the left eigenvector, a mapping matrix of the right eigenvector and a mapping matrix of the memory vector; boIs a bias vector for the mapping process;
Figure FDA0002649554710000033
Wco、boall are parameters of the model, and specific values are obtained through a training process;
step 2.6B: calculating the characteristic vector h of the t position by (8)t
ht=ot⊙tanh(ct) (8)
Wherein, tan h () is a hyperbolic tangent function;
step 2.7: setting the maximum length L of the interval;
step 2.8: repeating the step 2.3-step 2.6L-1 times, and generating memory and feature vectors of all intervals according to the lengths of the intervals from short to long;
and step 3: generating a sequence feature vector of a sentence;
the sequence feature vector is that for each word in a sentence, a feature vector reflecting the context feature of the word is generated; in the tree-shaped mesh memory network, in order to fully utilize the recursive property of a language, a plurality of possible paths are considered when the memory and the characteristics of a certain word are generated; the method specifically comprises the following substeps:
step 3.1: matching a character interval corresponding to each character; for the b-th character in the sentence, finding all character intervals taking the b-th character as the tail, wherein each character interval taking the b-th character as the tail is a path for generating a memory vector of the current character;
step 3.2: generating a path memory vector c by (9)a,b(ii) a For a path from a to b, fusing character interval features on a memory vector corresponding to the position a to generate a fused path memory vector:
ca,b=tanh(Whcha,b+Weceb+bc) (9)
wherein, Whc、WecMapping matrixes from the characteristic vector and the embedded vector to the memory vector are respectively; h isa,bFeature vectors representing paths from a to b, bcIs an offset vector of the memory vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.3: by (10), an input gate vector of a path from a to b in the memory fusion of the plurality of paths is calculated:
ia,b=tanh(Whiha,b+Weieb+bi) (10)
wherein, Whi、WeiMapping matrixes from the characteristic vector and the embedded vector to the input gate vector are respectively; biIs an offset vector of the input gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.4: calculating an attention coefficient vector alpha of each path (a, b) through (11)a,b
Figure FDA0002649554710000041
Wherein exp is an exponential function, and a' means traversing a across all text regions ending with b;
step 3.5; the memory cell amount c of the current word is obtained by weighted averaging (12) of the memory vectors of each pathb
cb=∑a′αa′,b⊙ca′,b (12)
Due to the memory vector cbFrom text intervals including but not limited to the current word, and the memory of the text intervals comes from a recursive method, so that the memory vector of the current word takes into account the recursive nature inherent in natural language when summarizing the context features; attention coefficient vector ca’,bAnd alphaa’,bThe meaning of a 'in (a') is to traverse through all the words ending with bA performed over the interval;
step 3.6: by (13), for the b position, on the basis of the memory vector, the output gate vector o of the sequence is calculatedb
ob=σ(Whohb-1+Weoeb-1+bo) (13)
Wherein, Who、WeoMapping matrixes from the characteristic vector and the embedded vector to the output gate vector are respectively; h isb-1And eb-1A feature vector and a b-1 word vector respectively representing the b-1 position; boIs an offset vector of the output gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.7: by (14), a feature vector h of the position e is generatede
he=oe⊙tanh(ce) (14)
Wherein for the e position, an output gate vector o of the sequence is calculated on the basis of the memory vectore(ii) a Memory cell amount c of current worde
Step 3.8: and (3) circularly executing the steps from 3.1 to 3.7 from the starting position to the ending position of the sentence, and generating a feature vector for the position of each word, namely extracting the sequence feature of the sentence.
2. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: in step 1, the value range of i is 1 to M; for each input character, the corresponding character vector is looked up; the character vector is a part of the model parameters, and its specific value is obtained by training.
3. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: the step 2.1 specifically comprises the following steps: the length of the initial character interval is 1 and only consists of one character.
4. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: in step 2.7, the value range of L is 1-N, and N is the length of the current sentence.
CN201911398270.1A 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network Active CN111160009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911398270.1A CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911398270.1A CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Publications (2)

Publication Number Publication Date
CN111160009A CN111160009A (en) 2020-05-15
CN111160009B true CN111160009B (en) 2020-12-08

Family

ID=70559526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911398270.1A Active CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Country Status (1)

Country Link
CN (1) CN111160009B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system
CN109815474A (en) * 2017-11-20 2019-05-28 深圳市腾讯计算机系统有限公司 A kind of word order column vector determines method, apparatus, server and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740226A (en) * 2016-01-15 2016-07-06 南京大学 Method for implementing Chinese segmentation by using tree neural network and bilateral neural network
KR20180001889A (en) * 2016-06-28 2018-01-05 삼성전자주식회사 Language processing method and apparatus
CN109086267B (en) * 2018-07-11 2022-07-26 南京邮电大学 Chinese word segmentation method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815474A (en) * 2017-11-20 2019-05-28 深圳市腾讯计算机系统有限公司 A kind of word order column vector determines method, apparatus, server and storage medium
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system

Also Published As

Publication number Publication date
CN111160009A (en) 2020-05-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant