CN111160009B - Sequence feature extraction method based on tree-shaped grid memory neural network - Google Patents

Sequence feature extraction method based on tree-shaped grid memory neural network

Info

Publication number
CN111160009B
Authority
CN
China
Prior art keywords
vector
memory
character
interval
word
Prior art date
Legal status
Active
Application number
CN201911398270.1A
Other languages
Chinese (zh)
Other versions
CN111160009A (en)
Inventor
辛欣
王睿
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201911398270.1A priority Critical patent/CN111160009B/en
Publication of CN111160009A publication Critical patent/CN111160009A/en
Application granted granted Critical
Publication of CN111160009B publication Critical patent/CN111160009B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, and belongs to the technical field of natural language processing. First, each character in a sentence is represented as a character-level embedding vector by an embedding technique; then, for each character interval, a memory vector and a feature vector of the interval are extracted through a recursive tree neural network; next, for each position in the sentence, a memory vector and a feature vector of the position are extracted based on all character intervals ending at that position, so that the feature vectors realize fully recursive extraction of text sequence features; finally, the feature vectors of all positions are spliced together. The method can better extract the context features of a sentence; it can screen and fuse features based on the recursive structure of natural language and extract features useful for specific tasks; and, by exploiting the inherent recursive structure of language, it can complete sequence labeling tasks in various fields of natural language processing.

Description

Sequence feature extraction method based on tree-shaped grid memory neural network
Technical Field
The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, and belongs to the technical field of natural language processing.
Background
In the field of natural language processing, many tasks reduce to sequence labeling problems and are modeled by machine learning methods. In a machine-learning-based sequence labeling model, a key problem is how to extract the sequence features of natural language sentences.
In natural language processing, tasks that rely on sequence labeling models include named entity recognition, Chinese word segmentation and part-of-speech tagging, all of which are important tasks. The named entity recognition task aims to recognize the named entities, such as place names, person names and organization names, in a given natural language sentence; by determining whether each character in the sentence is the beginning, middle or end of a named entity, named entity recognition can be cast as a sequence labeling problem. The Chinese word segmentation task aims to determine the boundaries of words in a Chinese sentence, and can likewise be converted into a sequence labeling problem by determining whether each character is the beginning, middle or end of a word. The part-of-speech tagging task judges the part of speech of each word in a sentence and is also a sequence labeling task. Therefore, the sequence labeling problem is of great significance in the technical field of natural language processing.
Sequence labeling problems in the field of natural language processing are typically modeled by machine learning methods: a large data set is labeled manually, and the labeled data set is then given to a machine learning model for training, yielding an automatic sequence labeler. A large amount of labeled data has already been published in a variety of different fields. To exploit these labeled data, a good machine learning model for sequence labeling is required.
In the machine learning model of the sequence labeling problem, a key step is to extract the sequence feature representation of the sentence. The sequence feature representation is that for each basic unit in a sentence, a feature vector capable of reflecting the context information of the basic unit is extracted. In an English sentence, the unit is a word; in a Chinese sentence, the unit may be a word or a character.
Current sequence feature extraction methods are usually recurrent neural networks and their variants, such as the long short-term memory network. A recurrent neural network reads each element of the sequence in turn and updates its state vector, and the state vector at each unit can be used as the context feature vector of that unit. The drawback of this approach is that the recurrent neural network processes the units linearly along the sentence and ignores the inherent recursive structure of natural language sentences. Natural language sentences usually have a recursive structure: simple phrases are formed from basic characters or words, further components are added, and finally complete sentences are formed through grammatical structures such as subject-predicate-object. Although the sentences humans write and speak are linear in form, the human brain actually understands a sentence in this recursive way. Therefore, introducing the inherent recursion of natural language when extracting the context feature vectors of sequence units allows the meaning of the sentence to be understood better than a linear extraction method does, which benefits sequence labeling tasks such as Chinese word segmentation and named entity recognition.
Disclosure of Invention
The invention aims to provide a sequence feature extraction method based on a tree-shaped grid memory neural network for sequence labeling problems in the field of natural language processing, addressing the defect that the long short-term memory network cannot exploit the inherent recursive structure of language.
The core idea of the invention is as follows: first, each character in a sentence is represented as a character-level embedding vector through an embedding technique; on the basis of the character-level embedding vectors, for each character interval, a memory vector and a feature vector of the interval are extracted through a recursive tree neural network, where a character interval refers to several consecutive characters and the smallest character interval consists of a single character; then, for each position in the sentence, a memory vector and a feature vector of the position are extracted based on all character intervals ending at that position; this feature extraction realizes recursive text sequence feature extraction and exploits the inherent recursive structure of language; finally, the feature vectors of all positions are spliced together to obtain the sequence features of the sentence, and based on these sequence features a multi-class classification model judges the label at each position, thereby solving the sequence labeling problem.
The sequence feature extraction method based on the tree-shaped grid memory neural network comprises the following steps:
Step 1: embed each character of the initially input natural language sentence, specifically as follows: before entering the tree-shaped grid memory network, each character is represented as a character vector e_i by the embedding function of formula (1):
e_i = embed(x_i)  (1)
where x denotes the initially input natural language sentence, formally a character sequence, i.e., x = [x_1, x_2, …, x_M]; x_i denotes the i-th character in x, and i ranges from 1 to M; embed(·) is the embedding function, which looks up the corresponding character vector for each input character; the character vectors are part of the model parameters, and their specific values are obtained through training;
Step 2: generate the memory vector and feature vector of each character interval, specifically comprising the following substeps:
Step 2.1: generate the memory vector of each initial character interval; an initial character interval has length 1 and consists of a single character; for the initial character interval corresponding to the i-th character, the embedding vector e_i of that character is used as the memory vector c_i of the initial character interval, where i ranges from 1 to M;
Step 2.2: generate the feature vector of each initial character interval; the memory vector is multiplied by the output gate vector to obtain the feature vector h_i of the initial character interval corresponding to the i-th character, specifically comprising the following steps:
Step 2.2A: calculate the output gate vector o_i through formula (2):
o_i = σ(W_co c_i + b_o)  (2)
where W_co and b_o are respectively the mapping matrix and mapping bias from the memory vector to the output gate; both are parameters of the model, and their specific values are obtained through the training process; σ(·) is the sigmoid function;
Step 2.2B: calculate the feature vector h_i of position i through formula (3):
h_i = o_i ⊙ tanh(c_i)  (3)
where tanh(·) is the hyperbolic tangent function;
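By way of illustration, the following minimal Python sketch shows one possible way to compute step 1 and steps 2.1 to 2.2 on the example sentence used later in the embodiment; the embedding size, the random initialization of the embedding table and of the parameters W_co and b_o, and all helper names are assumptions for demonstration only, since in the method these values are obtained through training.

import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # assumed embedding / hidden size
sentence = "小明在国图专心读书"            # example sentence from the embodiment
vocab = {ch: idx for idx, ch in enumerate(sentence)}
E = rng.normal(size=(len(vocab), D))     # embedding table for eq. (1); trainable in practice
W_co = rng.normal(size=(D, D))           # memory-to-output-gate mapping of eq. (2)
b_o = np.zeros(D)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initial_interval(ch):
    e_i = E[vocab[ch]]                   # eq. (1): e_i = embed(x_i)
    c_i = e_i                            # step 2.1: initial memory = embedding vector
    o_i = sigmoid(W_co @ c_i + b_o)      # eq. (2): output gate vector
    h_i = o_i * np.tanh(c_i)             # eq. (3): feature vector of the unit interval
    return c_i, h_i

# spans[(i, j)] holds the (memory, feature) pair of the character interval (i, j)
spans = {(i, i): initial_interval(ch) for i, ch in enumerate(sentence, 1)}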
step 2.3: combining the character intervals;
the character interval merging refers to merging two small character intervals to obtain a memory vector of a large character interval, and specifically includes: for all the two small character intervals with the forms of (i, j-1) and (i +1, j), combining to obtain a large character interval (i, j); wherein i and j are variables representing the positions of characters, and the value range depends on the length of a sentence;
step 2.4: by (4) and (5)Calculating a forgetting gate vector in the character interval combination; in the feature extraction, the features required by the corresponding tasks need to be concentrated, and all information is not required when the content of the interval is represented; in order to judge which contents are used for representing a large interval between two cells, forgetting gate vectors are calculated for the left and right cells respectively through (4) and (5)
Figure GDA0002649554720000041
And
Figure GDA0002649554720000042
to indicate how much content in both cells should be merged into a large text interval:
Figure GDA0002649554720000051
Figure GDA0002649554720000052
wherein t is a subscript of a currently processed large interval; in calculating the forgetting gate vector for the left text interval,
Figure GDA0002649554720000053
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector and a right memory vector to a left forgetting gate vector, bflIs the offset vector of the left forget gate vector; when calculating the forgetting gate vector of the right text interval,
Figure GDA0002649554720000054
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector, a right memory vector to a right forgetting gate vector, bfrIs the offset vector of the right forget gate vector; σ (-) is a sigmoid function;
step 2.5: calculating the memory vector in the character interval combination, and respectively connecting the memory vectors of two cells with their own forgetting gate vectorsCombining to obtain a large-range memory vector ct
Figure GDA0002649554720000055
Wherein,
Figure GDA0002649554720000056
and
Figure GDA0002649554720000057
memory vectors between two cells in the text interval combination are respectively; element-level multiplication of an all-vector;
step 2.6: generating a feature vector h of the merged text intervaltThe method specifically comprises the following steps:
step 2.6A calculate output Gate coefficient otSpecifically, by (7), calculating:
Figure GDA0002649554720000058
wherein,
Figure GDA0002649554720000059
Wcorespectively a mapping matrix of the left eigenvector, a mapping matrix of the right eigenvector and a mapping matrix of the memory vector; boIs a bias vector for the mapping process;
Figure GDA00026495547200000510
Figure GDA00026495547200000511
Wco、boall are parameters of the model, and specific values are obtained through a training process;
step 2.6B: calculating the characteristic vector h of the t position by (8)t
ht=ot⊙tanh(ct) (8)
Wherein, tan h () is a hyperbolic tangent function;
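Continuing the illustrative sketch above, a possible implementation of the interval merging of steps 2.3 to 2.6 is shown below; the parameter names mirror the notation of formulas (4) to (8), the randomly initialized matrices stand in for trained parameters, and the two forgetting gates correspond to the transverse and longitudinal gates mentioned in the advantageous effects.

W_fl = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "cl", "cr")}   # left forget gate, eq. (4)
W_fr = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "cl", "cr")}   # right forget gate, eq. (5)
b_fl, b_fr = np.zeros(D), np.zeros(D)
W_out = {k: rng.normal(size=(D, D)) for k in ("hl", "hr", "c")}         # output gate, eq. (7)
b_out = np.zeros(D)

def merge(left, right):
    """Merge the small intervals (i, j-1) and (i+1, j) into (i, j), eqs. (4)-(8)."""
    (c_l, h_l), (c_r, h_r) = left, right
    f_l = sigmoid(W_fl["hl"] @ h_l + W_fl["hr"] @ h_r +
                  W_fl["cl"] @ c_l + W_fl["cr"] @ c_r + b_fl)           # eq. (4)
    f_r = sigmoid(W_fr["hl"] @ h_l + W_fr["hr"] @ h_r +
                  W_fr["cl"] @ c_l + W_fr["cr"] @ c_r + b_fr)           # eq. (5)
    c_t = f_l * c_l + f_r * c_r                                         # eq. (6)
    o_t = sigmoid(W_out["hl"] @ h_l + W_out["hr"] @ h_r +
                  W_out["c"] @ c_t + b_out)                             # eq. (7)
    h_t = o_t * np.tanh(c_t)                                            # eq. (8)
    return c_t, h_t

# e.g. merge the unit intervals of characters 1 and 2 into the interval (1, 2)
spans[(1, 2)] = merge(spans[(1, 1)], spans[(2, 2)])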
step 2.7: setting the maximum length L of the interval;
the value range of L is 1-N, and N is the length of the current sentence;
step 2.8: repeating the step 2.3-step 2.6L-1 times, and generating memory and feature vectors of all intervals according to the lengths of the intervals from short to long;
and step 3: generating a sequence feature vector of a sentence;
the sequence feature vector is that for each word in a sentence, a feature vector reflecting the context feature of the word is generated; in the tree-shaped mesh memory network, in order to fully utilize the recursive property of a language, a plurality of possible paths are considered when the memory and the characteristics of a certain word are generated; the method specifically comprises the following substeps:
step 3.1: matching a character interval corresponding to each character; for the b-th character in the sentence, finding all character intervals taking the b-th character as the tail, wherein each character interval taking the b-th character as the tail is a path for generating a memory vector of the current character;
step 3.2: generating a path memory vector c by (9)a,b(ii) a For a path from a to b, fusing character interval features on a memory vector corresponding to the position a to generate a fused path memory vector:
ca,b=tanh(Whcha,b+Weceb+bc) (9)
wherein, Whc、WecMapping matrixes from the characteristic vector and the embedded vector to the memory vector are respectively; h isa,bFeature vectors representing paths from a to b, bcIs an offset vector of the memory vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.3: by (10), an input gate vector of a path from a to b in the memory fusion of the plurality of paths is calculated:
ia,b=tanh(Whiha,b+Weieb+bi) (10)
wherein, Whi、WeiMapping matrixes from the characteristic vector and the embedded vector to the input gate vector are respectively; biIs an offset vector of the input gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.4: calculating an attention coefficient vector alpha of each path (a, b) through (11)a,b
Figure GDA0002649554720000071
Wherein exp is an exponential function, and a' means traversing a across all text regions ending with b;
step 3.5; the memory cell amount c of the current word is obtained by weighted averaging (12) of the memory vectors of each pathb
cb=∑a′αa′,b⊙ca′,b (12)
Due to the memory vector cbFrom text intervals including but not limited to the current word, and the memory of the text intervals comes from a recursive method, so that the memory vector of the current word takes into account the recursive nature inherent in natural language when summarizing the context features;
attention coefficient vector ca’,bAnd alphaa’,bThe meaning of a' in (a) is that the traversal is performed on all text regions ending with b;
step 3.6: by (13), for the b position, on the basis of the memory vector, the output gate vector o of the sequence is calculatedb
ob=σ(Whohb-1+Weoeb-1+bo) (13)
Wherein, Who、WeoMapping matrixes from the characteristic vector and the embedded vector to the output gate vector are respectively; h isb-1And eb-1A feature vector and a b-1 word vector respectively representing the b-1 position; boIs an offset vector of the output gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.7: by (14), a feature vector h of the position e is generatede
he=oe⊙tanh(ce) (14)
Wherein for the e position, an output gate vector o of the sequence is calculated on the basis of the memory vectore(ii) a Memory cell amount c of current worde
Step 3.8: and (3) circularly executing the steps from 3.1 to 3.7 from the starting position to the ending position of the sentence, and generating a feature vector for the position of each word, namely extracting the sequence feature of the sentence.
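Continuing the illustrative sketch, the per-position fusion of step 3 can be written as below; the parameter names follow formulas (9) to (14), and the randomly initialized matrices are placeholders for trained values, so this is a demonstration of the computation rather than a reference implementation.

W_hc, W_ec = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (9)
W_hi, W_ei = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (10)
W_ho, W_eo = rng.normal(size=(D, D)), rng.normal(size=(D, D))   # eq. (13)
b_c, b_i, b_ob = np.zeros(D), np.zeros(D), np.zeros(D)

def position_feature(b, spans, E_seq, h_prev):
    """Fuse all character intervals (a, b) ending at position b into c_b and h_b."""
    paths = {a: spans[(a, b)] for a in range(1, b + 1) if (a, b) in spans}
    e_b = E_seq[b - 1]
    c_path, i_path = {}, {}
    for a, (c_ab, h_ab) in paths.items():
        c_path[a] = np.tanh(W_hc @ h_ab + W_ec @ e_b + b_c)     # eq. (9): path memory
        i_path[a] = np.tanh(W_hi @ h_ab + W_ei @ e_b + b_i)     # eq. (10): input gate
    exp_i = {a: np.exp(v) for a, v in i_path.items()}
    Z = sum(exp_i.values())
    alpha = {a: v / Z for a, v in exp_i.items()}                # eq. (11): attention coefficients
    c_b = sum(alpha[a] * c_path[a] for a in paths)              # eq. (12): fused memory
    e_prev = E_seq[b - 2] if b > 1 else np.zeros(D)
    o_b = sigmoid(W_ho @ h_prev + W_eo @ e_prev + b_ob)         # eq. (13): output gate
    h_b = o_b * np.tanh(c_b)                                    # eq. (14): position feature
    return c_b, h_b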
Advantageous effects
The invention relates to a sequence feature extraction method based on a tree-shaped grid memory neural network, which has the following beneficial effects compared with the prior art:
1. The method exploits the inherent recursive property of natural language and can better extract the context features of sentences: in the tree-shaped grid memory network, the feature vector of each position in the sequence is extracted according to the inherent recursive structure of the language;
2. The method has transverse and longitudinal forgetting gate vectors, can screen and fuse features based on the recursive structure of natural language, and extracts features useful for specific tasks, which is more favorable for representing sentence features than existing methods;
3. Compared with existing methods based on recurrent neural networks, all of the feature vector extraction in the method realizes recursive extraction of text sequence features; by exploiting the inherent recursive structure of language, the method can be used for sequence labeling tasks in various natural language processing fields.
Drawings
FIG. 1 is a flow chart of a sequence feature extraction method based on a tree-shaped grid memory neural network according to the present invention.
Detailed Description
The following describes the sequence feature extraction method based on a tree-shaped grid memory neural network in detail with reference to specific Embodiment 1 and FIG. 1.
Example 1
This embodiment describes a specific implementation of the method for extracting sequence features based on the tree-shaped mesh memory neural network according to the present invention.
FIG. 1 is a flow chart of the method.
Step A: input a sentence, for example "小明在国图专心读书" ("Xiaoming reads attentively in the National Library"); then x = [x_1, x_2, …, x_M] = [小, 明, 在, 国, 图, 专, 心, 读, 书], and M = 9;
Step B: using an embedding technique of the prior art, convert each character of the sentence in step A into its embedding vector; the method of the invention additionally converts whole phrases into embedding vectors;
That is: existing methods analyze the sequence character by character, so when analyzing the action "读书" (reading) they cannot accurately retain the information related to "小明" (Xiaoming), because "小明" is far away from "读书" in the sentence, at a distance of 5;
In the method of the invention, "在国图" (in the National Library) and "专心" (attentively) are treated as two phrases, so the distance from "小明" to "读书" is shortened from 5 to 2; this makes it easier for the model to remember the information related to "小明" and is more favorable for representing the sentence features.
C, generating a memory vector and a feature vector of the character interval;
when generating the feature vector, the method utilizes the inherent recursion property in the natural language, can better extract the context feature of the sentence, namely in the tree-shaped grid memory network, the feature vector of each position in the sequence is extracted according to the inherent recursion structure of the language;
the method specifically comprises the following substeps:
c.1, generating a memory vector of each initial character interval; directly taking the embedded vector of each character as a memory vector corresponding to the initial character interval;
c.2, generating a feature vector of each initial character interval; the method specifically comprises the following substeps:
step C.2.1, calculating an output gate vector for each initial character interval;
c.2.2, calculating a characteristic vector for each initial character interval according to the output gate vector and the memory vector;
step C.3, setting the maximum interval length L;
c.4, circularly generating memory vectors and feature vectors of all character intervals; the method specifically comprises the following substeps:
step c.4.1, the current merging length l is 2;
step C.4.2, respectively calculating forgetting gate vectors of two cell intervals (i, j-1) and (i +1, j) for each character interval (i, j) with the length of l; wherein the value range of i is 1-N, and N is the sentence length; j is i + l and j is less than or equal to N;
in the step C.4.2, forgetting gate vectors of (i, j-1) and (i +1, j) between two cells are respectively horizontal and longitudinal forgetting gate vectors, so that the features can be screened and fused based on a recursive structure of natural language, and useful features for specific tasks are extracted;
c.4.3, calculating the memory vector of each merging interval according to the memory vectors and the forgetting gate vectors between the two cells;
c.4.4, calculating an output gate vector of each merging interval;
c.4.5, calculating a feature vector of each merging interval;
step c.4.6 modify l ═ l + 1;
c.4.7 repeating C.4.2 to C.4.6 until L is more than or equal to L;
step D, generating a sequence feature vector of each word; the method specifically comprises the following substeps:
step D.1, setting the current processing position b as 1;
d.2, calculating an input gate vector of each character interval with the right boundary b; the method specifically comprises the following substeps:
step d.2.1, the left boundary a of the current text interval is b-L;
step D.2.2, calculating the path memory of the character intervals (a, b);
step D.2.3, calculating an input gate vector of the character interval (a, b);
step d.2.4 a ═ a + 1;
step d.2.5 repeats steps d.2.2 to d.2.4 until a ═ b;
d.3, normalizing the input gate vectors of all the character intervals with the right boundary b;
d.4, generating a memory vector of the position b according to the normalized coefficient vector and the memory vector of the corresponding interval;
d.5, calculating an output gate vector of the position b;
d.6 calculating the characteristic vector according to the memory vector and the output gate vector of the b position
Step d.7 b ═ b + 1;
step d.8 repeats steps d.2 to d.7 until b > ═ N;
step C and step D correspond to step 2 of the invention, namely:
when step 2.7 is implemented, the value of L should be small, and the empirical value range is L ═ 10, due to the consideration of time complexity and the reason that when the text interval is too long, the internal recursion cannot be well preserved; where L corresponds to L in step C.
And E, splicing the feature vectors of each position together and outputting the feature vectors as sequence feature vectors.
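For illustration, steps C, D and E can be sketched as a small driver built on the toy helpers shown in the summary of the invention above; the indexing convention that an interval (i, j) of length l has j = i + l - 1, the small value of L, and all function names are assumptions of the sketch, with untrained random parameters throughout.

def build_spans(sentence, L):
    N = len(sentence)
    spans = {(i, i): initial_interval(ch) for i, ch in enumerate(sentence, 1)}   # steps C.1-C.2
    for l in range(2, min(L, N) + 1):                                            # steps C.4.1, C.4.6, C.4.7
        for i in range(1, N - l + 2):                                            # step C.4.2
            j = i + l - 1
            spans[(i, j)] = merge(spans[(i, j - 1)], spans[(i + 1, j)])           # steps C.4.3-C.4.5
    return spans

def sequence_features(sentence, spans, E_seq):
    feats, h_prev = [], np.zeros(D)
    for b in range(1, len(sentence) + 1):                    # steps D.1, D.7, D.8
        _, h_b = position_feature(b, spans, E_seq, h_prev)   # steps D.2-D.6
        feats.append(h_b)
        h_prev = h_b
    return np.stack(feats)                                   # step E: one feature vector per position

spans = build_spans("小明在国图专心读书", L=4)
E_seq = np.array([E[vocab[ch]] for ch in "小明在国图专心读书"])
features = sequence_features("小明在国图专心读书", spans, E_seq)   # shape (9, D)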
The feature vectors in the invention realize fully recursive extraction of text sequence features; they exploit the inherent recursive structure of language and can be used for sequence labeling tasks in various fields of natural language processing.
A concrete illustration of the recursive structure in language, using the sentence of step A: when the recursion height, i.e. the merging length l in the summary of the invention, is 1, the result is [小明, 在, 国图, 专心, 读书] ([Xiaoming, in, the National Library, attentively, reads]); when the merging length l is 2, the result is [小明, 在国图, 专心读书] ([Xiaoming, in the National Library, reads attentively]).
While the foregoing is directed to the preferred embodiment of the present invention, it is not intended that the invention be limited to the embodiment and the drawings disclosed herein. Equivalents and modifications may be made without departing from the spirit of the disclosure, which is to be considered as within the scope of the invention.

Claims (4)

1. A sequence feature extraction method based on a tree-shaped grid memory neural network, characterized in that the method comprises the following steps:
Step 1: embed each character of the initially input natural language sentence, specifically as follows: before entering the tree-shaped grid memory network, each character is represented as a character vector e_i by the embedding function of formula (1):
e_i = embed(x_i)  (1)
where x denotes the initially input natural language sentence, formally a character sequence, i.e., x = [x_1, x_2, ..., x_M]; x_i denotes the i-th character in x; embed(·) is an embedding function;
Step 2: generate the memory vector and feature vector of each character interval, specifically comprising the following substeps:
Step 2.1: generate the memory vector of each initial character interval; for the initial character interval corresponding to the i-th character, the embedding vector e_i of that character is used as the memory vector c_i of the initial character interval, where i ranges from 1 to M;
Step 2.2: generate the feature vector of each initial character interval; the memory vector is multiplied by the output gate vector to obtain the feature vector h_i of the initial character interval corresponding to the i-th character, specifically comprising the following steps:
Step 2.2A: calculate the output gate vector o_i through formula (2):
o_i = σ(W_co c_i + b_o)  (2)
where W_co and b_o are respectively the mapping matrix and mapping bias from the memory vector to the output gate; both are parameters of the model, and their specific values are obtained through the training process; σ(·) is the sigmoid function;
Step 2.2B: calculate the feature vector h_i of position i through formula (3):
h_i = o_i ⊙ tanh(c_i)  (3)
where tanh(·) is the hyperbolic tangent function;
step 2.3: combining the character intervals;
the character interval merging refers to merging two small character intervals to obtain a memory vector of a large character interval, and specifically includes: for all the two small character intervals with the forms of (i, j-1) and (i +1, j), combining to obtain a large character interval (i, j);
wherein i and j are variables representing the positions of characters, and the value range depends on the length of a sentence;
step 2.4: calculating a forgetting gate vector in the character interval combination through the steps (4) and (5); in the feature extraction, the features required by the corresponding tasks need to be concentrated, and all information is not required when the content of the interval is represented; in order to judge which contents are used for representing a large interval between two cells, forgetting gate vectors are calculated for the left and right cells respectively through (4) and (5)
Figure FDA0002649554710000021
And
Figure FDA0002649554710000022
to indicate how much content in both cells should be merged into a large text interval:
Figure FDA0002649554710000023
Figure FDA0002649554710000024
wherein t is a subscript of a currently processed large interval; in calculating the forgetting gate vector for the left text interval,
Figure FDA0002649554710000025
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector and a right memory vector to a left forgetting gate vector, bflIs the offset vector of the left forget gate vector; when calculating the forgetting gate vector of the right text interval,
Figure FDA0002649554710000026
respectively mapping matrixes from a left eigenvector, a right eigenvector, a left memory vector, a right memory vector to a right forgetting gate vector, bfrIs the offset vector of the right forget gate vector; σ (-) is a sigmoid function;
step 2.5: calculating the memory vector in the character interval combination, combining the memory vectors of two small intervals with the respective forgetting gate vector to obtain the memory vector c of large intervalt
Figure FDA0002649554710000027
Wherein,
Figure FDA0002649554710000028
and
Figure FDA0002649554710000029
memory vectors between two cells in the text interval combination are respectively; element-level multiplication of an all-vector;
step 2.6: generating a feature vector h of the merged text intervaltThe method specifically comprises the following steps:
step 2.6A calculate output Gate coefficient otSpecifically, by (7), calculating:
Figure FDA0002649554710000031
wherein,
Figure FDA0002649554710000032
Wcorespectively a mapping matrix of the left eigenvector, a mapping matrix of the right eigenvector and a mapping matrix of the memory vector; boIs a bias vector for the mapping process;
Figure FDA0002649554710000033
Wco、boall are parameters of the model, and specific values are obtained through a training process;
step 2.6B: calculating the characteristic vector h of the t position by (8)t
ht=ot⊙tanh(ct) (8)
Wherein, tan h () is a hyperbolic tangent function;
step 2.7: setting the maximum length L of the interval;
step 2.8: repeating the step 2.3-step 2.6L-1 times, and generating memory and feature vectors of all intervals according to the lengths of the intervals from short to long;
and step 3: generating a sequence feature vector of a sentence;
the sequence feature vector is that for each word in a sentence, a feature vector reflecting the context feature of the word is generated; in the tree-shaped mesh memory network, in order to fully utilize the recursive property of a language, a plurality of possible paths are considered when the memory and the characteristics of a certain word are generated; the method specifically comprises the following substeps:
step 3.1: matching a character interval corresponding to each character; for the b-th character in the sentence, finding all character intervals taking the b-th character as the tail, wherein each character interval taking the b-th character as the tail is a path for generating a memory vector of the current character;
step 3.2: generating a path memory vector c by (9)a,b(ii) a For a path from a to b, fusing character interval features on a memory vector corresponding to the position a to generate a fused path memory vector:
ca,b=tanh(Whcha,b+Weceb+bc) (9)
wherein, Whc、WecMapping matrixes from the characteristic vector and the embedded vector to the memory vector are respectively; h isa,bFeature vectors representing paths from a to b, bcIs an offset vector of the memory vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.3: by (10), an input gate vector of a path from a to b in the memory fusion of the plurality of paths is calculated:
ia,b=tanh(Whiha,b+Weieb+bi) (10)
wherein, Whi、WeiMapping matrixes from the characteristic vector and the embedded vector to the input gate vector are respectively; biIs an offset vector of the input gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.4: calculating an attention coefficient vector alpha of each path (a, b) through (11)a,b
Figure FDA0002649554710000041
Wherein exp is an exponential function, and a' means traversing a across all text regions ending with b;
step 3.5; the memory cell amount c of the current word is obtained by weighted averaging (12) of the memory vectors of each pathb
cb=∑a′αa′,b⊙ca′,b (12)
Due to the memory vector cbFrom text intervals including but not limited to the current word, and the memory of the text intervals comes from a recursive method, so that the memory vector of the current word takes into account the recursive nature inherent in natural language when summarizing the context features; attention coefficient vector ca’,bAnd alphaa’,bThe meaning of a 'in (a') is to traverse through all the words ending with bA performed over the interval;
step 3.6: by (13), for the b position, on the basis of the memory vector, the output gate vector o of the sequence is calculatedb
ob=σ(Whohb-1+Weoeb-1+bo) (13)
Wherein, Who、WeoMapping matrixes from the characteristic vector and the embedded vector to the output gate vector are respectively; h isb-1And eb-1A feature vector and a b-1 word vector respectively representing the b-1 position; boIs an offset vector of the output gate vector; specific values of the mapping matrix and the memory vector are obtained through training;
step 3.7: by (14), a feature vector h of the position e is generatede
he=oe⊙tanh(ce) (14)
Wherein for the e position, an output gate vector o of the sequence is calculated on the basis of the memory vectore(ii) a Memory cell amount c of current worde
Step 3.8: and (3) circularly executing the steps from 3.1 to 3.7 from the starting position to the ending position of the sentence, and generating a feature vector for the position of each word, namely extracting the sequence feature of the sentence.
2. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: in step 1, the value range of i is 1 to M; for each input character, the corresponding character vector is looked up; the character vector is a part of the model parameters, and its specific value is obtained by training.
3. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: the step 2.1 specifically comprises the following steps: the length of the initial character interval is 1 and only consists of one character.
4. The method for extracting sequence features based on the tree-shaped grid memory neural network as claimed in claim 1, wherein: in step 2.7, the value range of L is 1-N, and N is the length of the current sentence.
CN201911398270.1A 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network Active CN111160009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911398270.1A CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911398270.1A CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Publications (2)

Publication Number Publication Date
CN111160009A CN111160009A (en) 2020-05-15
CN111160009B true CN111160009B (en) 2020-12-08

Family

ID=70559526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911398270.1A Active CN111160009B (en) 2019-12-30 2019-12-30 Sequence feature extraction method based on tree-shaped grid memory neural network

Country Status (1)

Country Link
CN (1) CN111160009B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system
CN109815474A (en) * 2017-11-20 2019-05-28 深圳市腾讯计算机系统有限公司 A kind of word order column vector determines method, apparatus, server and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740226A (en) * 2016-01-15 2016-07-06 南京大学 Method for implementing Chinese segmentation by using tree neural network and bilateral neural network
KR20180001889A (en) * 2016-06-28 2018-01-05 삼성전자주식회사 Language processing method and apparatus
CN109086267B (en) * 2018-07-11 2022-07-26 南京邮电大学 Chinese word segmentation method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815474A (en) * 2017-11-20 2019-05-28 深圳市腾讯计算机系统有限公司 A kind of word order column vector determines method, apparatus, server and storage medium
CN108717409A (en) * 2018-05-16 2018-10-30 联动优势科技有限公司 A kind of sequence labelling method and device
CN109471895A (en) * 2018-10-29 2019-03-15 清华大学 The extraction of electronic health record phenotype, phenotype name authority method and system

Also Published As

Publication number Publication date
CN111160009A (en) 2020-05-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant