CN108717434A - Text ranking method mixing a pointwise strategy and a pairwise strategy - Google Patents

Text ranking method mixing a pointwise strategy and a pairwise strategy

Info

Publication number
CN108717434A
CN108717434A (application CN201810460253.5A; granted as CN108717434B)
Authority
CN
China
Prior art keywords
sample
point
score
node
vector
Prior art date
Legal status: Granted
Application number
CN201810460253.5A
Other languages
Chinese (zh)
Other versions
CN108717434B (en)
Inventor
黄书剑
王琦
戴新宇
张建兵
尹存燕
陈家骏
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201810460253.5A priority Critical patent/CN108717434B/en
Publication of CN108717434A publication Critical patent/CN108717434A/en
Application granted granted Critical
Publication of CN108717434B publication Critical patent/CN108717434B/en
Active legal-status Critical Current


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text ranking method that mixes a pointwise strategy and a pairwise strategy. It concerns ranking samples such as sentences and syntax trees while better exploiting the differential information between samples, and comprises a pointwise ranking stage and a pairwise ranking stage. The pointwise stage scores the samples, performs a first ranking, and filters out the higher-scoring candidates; on this candidate set, a pairwise ranking stage is then applied, in which the pairwise strategy follows an encode, compare (reference-vector generation), re-encode, and score pipeline. For the comparison step, a span-based node weighting method and a method combining spans with an attention mechanism are devised. Finally, the samples are ranked comprehensively according to their base score, pointwise ranking score, and pairwise-stage ranking score.

Description

Text ranking method mixing a pointwise strategy and a pairwise strategy
Technical field
The invention belongs to the field of learning to rank by computer, and relates to a text ranking method that mixes a pointwise strategy and a pairwise strategy.
Background technology
Learning to rank applies machine learning methods to train models for ranking problems. It can be applied in information retrieval, natural language processing, data mining, and other areas, and has long been a hot and difficult topic in information retrieval research.
Learning to rank divides the task into three strategies: the pointwise strategy, the pairwise strategy, and the listwise strategy. In the pointwise strategy, the ranking problem is converted into a classification, regression, or ordinal classification problem; the structure of the ranking list is therefore ignored. In the pairwise strategy, samples are formed into pairs; instead of considering single documents, the relationship within each sample pair is considered. Pointwise methods treat each sample in the training set as a training example, whereas pairwise methods treat each sample pair as a training example.
In machine translation and syntactic analysis tasks, the translation result or syntax tree is obtained through stepwise search decisions, usually decoded with beam search. The candidate set left after the search often contains results better than the model's own top prediction. Selecting a better sample from the final candidate set is therefore a promising direction for optimization. Unlike optimizing the model itself, the candidate set contains many complete samples, so a model can rank them according to their global information rather than being limited to local features.
Invention content
Purpose of the invention: the present invention provides a text ranking method mixing a pointwise strategy and a pairwise strategy. The pointwise strategy can score a sample according to its own structural features and models the structure of a single sample well; the pairwise strategy instead focuses on the differences and connections between two samples. Combining the advantages of both improves the performance of the ranker.
The invention discloses a text ranking method mixing a pointwise strategy and a pairwise strategy, comprising the following steps:
Step 1: sentences or syntax trees are to be ranked; the nodes they contain are words or tree nodes respectively. Initially each sample has a base score, which comes from the original model. The samples are ranked with a pointwise ranking model; from this pointwise ranking, each sample obtains a pointwise-stage score. The K highest-scoring samples form the candidate set, and a pairwise ranking model then scores and ranks this candidate set;
Step 2: pairwise ranking is applied to the candidate set of K samples, giving each sample a pairwise-stage score;
Step 3: the pointwise-stage score, the pairwise-stage score, and the base score of each sample are combined to produce a weighted ranking.
Step 1 includes:
Step 1-1: the samples are encoded with an encoder to obtain a continuous representation of each node, i.e., the node set x = {x1, …, xi, …, xn} is converted into a hidden representation h = {h1, …, hi, …, hn}, where xn denotes the n-th node (also called a unit) of the sample and hn the hidden representation of xn obtained by encoding. The encoder is a recursive neural network (Recursive Neural Network), a tree long short-term memory network (Tree-LSTM), or a bidirectional long short-term memory network (Bi-LSTM).
Step 1-2: a linear transformation of each hidden representation gives the score of each unit in the sample; summing the scores of all units yields the sample score S_pointwise:
S_pointwise = Σ_{i=1}^{n} V1 · h_i,
where V1 is a model parameter used in the pointwise ranking stage to convert node representations into scores via a linear transformation;
Step 1-3: the pointwise-stage score S1 of each sample is calculated as
S1 = α * S_pointwise + (1 - α) * S_base,
where α is a hyperparameter tuned on development set data and S_base is the base score of the sample, coming from the baseline system that generated it;
Step 1-4: the samples are ranked by the scores obtained in step 1-3, and the K highest-scoring samples form the candidate set; K is a hyperparameter, typically set to 8. The K samples are combined into sample pairs, K*(K-1)/2 pairs in total.
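The pointwise stage (steps 1-1 to 1-4) can be sketched as follows. This is a minimal illustration, assuming NumPy and dense per-node hidden vectors; the function names, array shapes, and helper structure are not part of the patent.

```python
import numpy as np
from itertools import combinations

def pointwise_score(H, v1):
    # S_pointwise = sum_i V1 . h_i : a linear transform of each node's
    # hidden vector h_i (rows of H), summed over the sample (step 1-2).
    return float(np.sum(H @ v1))

def top_k_candidates(pointwise, base, alpha, k):
    # S1 = alpha * S_pointwise + (1 - alpha) * S_base (step 1-3),
    # then keep the k highest-scoring samples and form all
    # k*(k-1)/2 sample pairs (step 1-4).
    s1 = alpha * np.asarray(pointwise) + (1.0 - alpha) * np.asarray(base)
    idx = [int(i) for i in np.argsort(-s1)[:k]]
    pairs = list(combinations(idx, 2))
    return idx, pairs
```

With three candidate samples and K = 2, the two highest-weighted samples survive and form a single pair, matching the K*(K-1)/2 count.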
Step 2 includes:
Step 2-1: each two samples A and B among the K samples form a sample pair <A, B>; an encoder encodes the nodes of the pair into hidden vector representations;
Step 2-2: the pair is compared to obtain a reference vector for each node;
Step 2-3: a new post-comparison vector representation is obtained from each node's hidden vector and its reference vector;
Step 2-4: the vectors obtained in step 2-3 are encoded again by an encoder. This amounts to letting the neural network encode, once more, the representations produced by comparing the two samples, so that the network can capture and summarize the differential information between them.
Step 2-5: the relative score of the pair <A, B> is obtained; combining the relative scores of all pairs among the K samples yields each sample's pairwise-stage score.
In step 2-1, the encoder is a recursive neural network (Recursive Neural Network), a tree long short-term memory network (Tree-LSTM), or a bidirectional long short-term memory network (Bi-LSTM).
Step 2-2 includes: based on an attention mechanism (Attention Mechanism), the reference vector of node a_i in sample A is computed as
ref(a_i) = Σ_j α_ij * b_j,  with  α_ij = exp(e_ij) / Σ_k exp(e_ik),
where b_j denotes the j-th node in sample B and e_ij the relevance of nodes a_i and b_j.
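A minimal sketch of this attention-based reference vector, assuming a dot product for the relevance e_ij (the patent does not specify how e_ij is computed, so that choice is illustrative):

```python
import numpy as np

def attention_reference(A, B):
    # For each node a_i of sample A (rows of A), the reference vector is
    # an attention-weighted sum of sample B's node vectors:
    # alpha_ij = softmax_j(e_ij), ref(a_i) = sum_j alpha_ij * b_j.
    E = A @ B.T                              # relevance scores e_ij, (nA, nB)
    E = E - E.max(axis=1, keepdims=True)     # stabilise the softmax
    W = np.exp(E)
    W = W / W.sum(axis=1, keepdims=True)     # attention weights alpha_ij
    return W @ B                             # reference vectors, (nA, d)
```

When one node of B is far more relevant than the others, the reference vector is dominated by that node, which is the intended behaviour of the soft alignment.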
In step 2-2, the reference vector of node a_i in sample A can also be computed as
ref(a_i) = (1 / len(span(a_i))) * Σ_{b ∈ B, b within span(a_i)} b,
where span(x) returns the range of leaf nodes covered by node x, len() returns the number of leaf nodes a node covers, and b ranges over the nodes of sample B contained in the interval span(a_i). This is the span-based node weighting method, suited to reranking syntax trees: in a syntax tree, the span of each internal node, i.e., the range of leaf nodes it covers, is used to establish the correspondence between the two samples. For a pair <A, B>, the reference vector of node a in A is obtained by weighting all nodes of B that fall within the span of a.
In step 2-2, the reference vector of node a_i in sample A can also be computed as
ref(a_i) = Σ_{b ∈ B, b within span(a_i)} softmax(e(a_i, b)) * b,
where span() returns the range of leaf nodes covered by a node, b ranges over the nodes of sample B contained in the interval span(a_i), and e denotes the relevance of a_i and b. This method combines spans with the attention mechanism and likewise applies only when the samples are syntax trees: for a pair <A, B>, when the attention mechanism generates the reference vector of node a_i in A, its scope is restricted to the span interval of a_i, avoiding irrelevant information.
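The two span-based variants above can be sketched as follows, assuming the in-span nodes of B have already been collected into a matrix (how spans are extracted from the tree is left out, and the dot-product relevance is an illustrative assumption):

```python
import numpy as np

def span_average_reference(in_span):
    # Span-based node weighting: uniform average (weight 1/len(span))
    # over B's nodes whose leaves fall inside span(a_i).
    return in_span.mean(axis=0)

def span_attention_reference(a_vec, in_span):
    # Span + attention: attention weights are computed only over B's
    # in-span nodes, so nodes outside span(a_i) are never attended to.
    e = in_span @ a_vec                  # relevance of each in-span node
    w = np.exp(e - e.max())
    w = w / w.sum()                      # softmax restricted to the span
    return w @ in_span
```

Restricting the softmax to the span is what distinguishes the second variant from the global attention of the previous formula.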
Step 2-3 includes: the new vector representation is obtained by
h~_i = g1 ⊙ h_i + g2 ⊙ ref(a_i),
or alternatively by concatenating the hidden vector, the reference vector, their difference, and their element-wise product, where g1 and g2 denote the output vectors of two gate units controlling the information flow, ⊙ denotes element-wise multiplication of two vectors of the same dimension, W1 and b1, W2 and b2 are the parameters of the two gates, and σ is the activation function. All model parameters are learned by stochastic gradient descent (SGD); the training objective usually uses the max-margin structured prediction objective (reference: Socher, Richard, et al. "Parsing with Compositional Vector Grammars." Meeting of the Association for Computational Linguistics, 2013: 455-465).
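A sketch of the gated combination in step 2-3. The patent does not spell out the gate inputs, so feeding each gate the concatenation [h; ref] is an assumption made here for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_compare(h, ref, W1, b1, W2, b2):
    # out = g1 (.) h + g2 (.) ref, with gates g_k = sigmoid(W_k [h; ref] + b_k)
    # controlling how much of the hidden vector vs. the reference vector
    # flows into the new representation.
    x = np.concatenate([h, ref])
    g1 = sigmoid(W1 @ x + b1)
    g2 = sigmoid(W2 @ x + b2)
    return g1 * h + g2 * ref
```

With zero-initialised gate parameters both gates output 0.5, so the result is the plain average of the two vectors, a useful sanity check.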
Step 2-4 includes: step 2-3 yields n vector representations in total, denoted h~_1, …, h~_l, …, h~_n, where l ranges from 1 to n; these are encoded again with an encoder to obtain new vectors m_1, …, m_i, …, m_n, where m_n denotes the re-encoded representation of h~_n.
Step 2-5 includes: a linear transformation of the encoding result of step 2-4 gives the score of each node in the sample; summing the scores of all nodes yields the relative score of the pair, S(A|B) = Σ_i V2 · m_i, where V2 is a model parameter used in the pairwise ranking stage to turn hidden representations into node scores by a linear transformation, and m_i is the vector obtained by re-encoding the reference-combined representation h~_i. Summing the relative scores of A against every other sample gives sample A's pairwise-stage score; each sample's pairwise-stage score is denoted S_pairwise.
Step 3 includes: the pointwise-stage score, the pairwise-stage score, and the base score of each sample are combined by the weighted sum below, and the tree with the highest final score S2 is output, where α and β are hyperparameters obtained by tuning on a small held-out data set and taking the best result:
S2 = α * S_pointwise + β * S_pairwise + (1 - α - β) * S_base.
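The final combination is a plain weighted sum, sketched here directly from the formula above:

```python
def final_score(s_pointwise, s_pairwise, s_base, alpha, beta):
    # S2 = alpha*S_pointwise + beta*S_pairwise + (1 - alpha - beta)*S_base
    return alpha * s_pointwise + beta * s_pairwise + (1.0 - alpha - beta) * s_base
```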
In the reranking task the present invention combines the pointwise and pairwise strategies, uses a neural network as the encoder (a tree long short-term memory network for tree-structured samples) to improve the expressive power of the model, reduces the candidate set size through the first-layer ranking so that the pairwise strategy can be used efficiently, and incorporates the attention mechanism in the second-layer ranking so that the model can better capture the connections and differences between samples. This method effectively improves ranking performance.
Advantageous effects: by combining the pointwise and pairwise strategies, the present invention first scores the samples with a pointwise ranking model for an initial ranking and filters out the higher-scoring candidates; on this basis it further performs pairwise ranking, in which the pairwise strategy follows an encode, compare, re-encode, score pipeline, choosing a comparison method during the comparison step according to the characteristics of the samples; the summed comparison scores are output as the score of this stage. Finally the samples are ranked comprehensively from their base scores, pointwise ranking scores, and pairwise-stage scores. The method combines the advantages of the pointwise and pairwise strategies, better models the difference between two samples, exploits the characteristics of the samples, and improves the performance of the ranker.
Description of the drawings
The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments; the above and other advantages of the invention will become clearer.
Fig. 1 is the flow chart of the present invention.
Fig. 2 is a structural diagram of the pairwise ranking model of the present invention.
Fig. 3 is an example of a syntax tree to be ranked by the present invention.
Fig. 4 illustrates the method of the present invention that combines the attention mechanism with node spans in the reference-generation stage.
Fig. 5 is a flow chart of generating and then ranking the candidate set for the sentence "I do like eating fish".
Detailed description of embodiments
To enable those skilled in the art to better understand the solution of the present invention, the invention is further described below with reference to the drawings and embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment
The present invention proposes a text ranking method mixing a pointwise strategy and a pairwise strategy (as shown in Fig. 1 and Fig. 2). Ranking is divided into two stages: the first stage performs a preliminary screening with the pointwise strategy, and the second performs a finer pairwise ranking on the reduced candidate set, improving the performance of the ranker. The steps of the reranking method are as follows:
In this embodiment the objects to be ranked are syntax trees, as in Fig. 3. The baseline system uses the Stanford Parser; for each sentence, the 50 highest-scoring syntax trees are generated. The parser's score for each tree is taken as its base score, and for each sentence in the training set the correct syntax tree is known.
The samples are encoded first, concretely with a recursive neural network, computing a hidden vector representation for each node of the tree structure. A linear transformation of the hidden representations gives the score of each unit in the sample, and summing the scores of all units gives the sample score, i.e., S_pointwise = Σ_i V1 · h_i, where V1 is a model parameter.
The 50 syntax trees are ranked by the weighted score
S1 = α * S_pointwise + (1 - α) * S_base,
where S_base is the base score and S_pointwise the pointwise score.
According to this weighted ranking, the top 8 trees enter the pairwise ranking procedure; the 8 samples are combined into sample pairs, 28 pairs in total.
In the pairwise encoding stage, the pairwise ranking model encodes the samples in the same way as the pointwise model, only with different encoder parameters. Within a pair <A, B>, a reference vector representation is computed for each node: for each node of A it expresses, as a vector, the information in B corresponding to that node, and the same method is also applied to B. The method combining spans with the attention mechanism is used: with a_i and b_j denoting nodes of samples A and B respectively, the reference vector of a_i is obtained by traversing all nodes of B within the span of a_i, computing similarities, and weighting by them, giving ref(a_i). Here span(a_i) denotes the range of leaf nodes covered by a_i; as shown in Fig. 4, the span of the gray node is (eating fish), i.e., the 4th to the 5th word, so similarity is computed only against the nodes of B contained in this interval.
The new vector representation is obtained by comparing each node's hidden vector with its reference vector. The comparison uses difference and element-wise product: the hidden vector and the reference vector are subtracted and multiplied element-wise, and the results are concatenated into the new representation, which is passed into the next network layer.
The encoder then re-encodes the compared vectors. This in effect lets the neural network encode, once more, the representations produced by comparing the two samples, so that the network can capture and summarize the differential information between them.
The compared vectors are input to the encoder, and the encoding result is multiplied by a parameter vector to obtain each node's score; summing all node scores gives the sample's relative score. Each sample thus receives two scores per pair: A's score relative to B and B's score relative to A. The sum of A's relative scores against the other 7 samples is A's pairwise-stage score.
In the comprehensive ranking stage, after all the preceding steps, each of the top 8 samples has a pointwise ranking score, a pairwise ranking score, and a base score. The overall flow of this embodiment is depicted in Fig. 5: the baseline system produces 50 syntax trees, i.e., 50 samples, for the sentence "I do like eating fish", among which Tree4 is the correct sample. Every tree has a base score S_base, and Tree1's base score is the highest. The pointwise ranking model yields the pointwise score S_pointwise; the two scores are weighted to give a first ranking, the top 4 are taken into pairwise ranking, producing S_pairwise, and finally the three scores are weighted and the tree with the highest weighted score is selected as output. In the ideal case, the correct sample Tree4, whose original score is not the highest, is ranked first.
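The end-to-end flow of this embodiment can be sketched as one function. The pairwise scorer is passed in as a callback because its internals (encode, compare, re-encode, score) were sketched above; the hyperparameter values and score arrays here are illustrative:

```python
import numpy as np

def rerank(base, pointwise, pairwise_fn, alpha, beta, k):
    # Two-stage reranking: weight base and pointwise scores (S1), keep
    # the top-k candidates, score them pairwise, then output the index
    # of the candidate with the highest combined score
    # S2 = alpha*S_pointwise + beta*S_pairwise + (1-alpha-beta)*S_base.
    base, pointwise = np.asarray(base), np.asarray(pointwise)
    s1 = alpha * pointwise + (1.0 - alpha) * base
    cand = np.argsort(-s1)[:k]
    s_pair = np.asarray(pairwise_fn(cand))   # one pairwise score per candidate
    s2 = (alpha * pointwise[cand] + beta * s_pair
          + (1.0 - alpha - beta) * base[cand])
    return int(cand[np.argmax(s2)])
```

In the toy call below, sample 3 wins the pointwise-weighted first stage, but a strong pairwise score lets sample 0 come out on top after the comprehensive ranking, mirroring how the pairwise stage can reorder the pointwise candidates.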
In the reranking task the present invention combines the pointwise and pairwise strategies, uses a neural network as the encoder (a tree long short-term memory network for tree-structured samples) to improve the model's expressive power, reduces the candidate set through first-layer ranking so that the pairwise strategy can be used efficiently, and incorporates the attention mechanism in the second-layer ranking so that the model can better capture the connections and differences between samples, effectively improving ranking performance. By introducing a pointwise ranking model optimized with a tree long short-term memory network, a high-quality candidate set is screened out, reducing the computation of the pairwise strategy; in the pairwise stage, candidates in the reduced set are compared two by two, and using the correspondence of the samples' intermediate nodes makes the attention computation more efficient and more accurate, optimizing the reranking result.
The present invention provides a text ranking method mixing a pointwise strategy and a pairwise strategy. There are many methods and approaches for implementing this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as within the protection scope of the invention. Components not specified in this embodiment can be implemented with the available prior art.

Claims (10)

1. A text ranking method mixing a pointwise strategy and a pairwise strategy, characterized by comprising the following steps:
Step 1: the samples are ranked with a pointwise ranking model; from this pointwise ranking each sample obtains a pointwise-stage score; the K highest-scoring samples form the candidate set, K being a preset hyperparameter; a pairwise ranking model then scores and ranks the candidate set;
Step 2: pairwise ranking is applied to the candidate set of K samples, giving each sample a pairwise-stage score;
Step 3: the pointwise-stage score, the pairwise-stage score, and the base score of each sample are combined to produce a weighted ranking.
2. The method according to claim 1, characterized in that step 1 comprises:
Step 1-1: the samples are encoded with an encoder to obtain a continuous representation of each node, i.e., the sample x = {x1, …, xi, …, xn} is converted into a hidden representation h = {h1, …, hi, …, hn}, where xi denotes the i-th node (also called a unit) of the sample and hi the hidden representation of xi obtained by encoding; the encoder is a recursive neural network (Recursive Neural Network), a tree long short-term memory network (Tree-LSTM), or a bidirectional long short-term memory network (Bi-LSTM);
Step 1-2, apply a linear transformation to the hidden-layer representations to obtain a score for each unit in the sample; summing the scores of all nodes yields the sample score Spointwise, where n denotes the number of nodes in a single sample:
Spointwise = Σ_{i=1..n} V1·hi,
wherein V1 is a model parameter used in the pointwise ranking stage to convert a node representation into a score by linear transformation;
Step 1-3, compute the weighted score S1 of the sample for the pointwise ranking stage:
S1 = α*Spointwise + (1-α)*Sbase,
wherein α is a hyperparameter tuned on development-set data, and Sbase is the sample's base score, which comes from the baseline system that generated the sample;
Step 1-4, rank the samples by the weighted scores obtained in step 1-3 and select the K highest-scoring samples to form the candidate set, K being a hyperparameter; the K samples are combined into sample pairs, K*(K-1)/2 pairs in total.
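Steps 1-2 to 1-4 can be sketched in a few lines of numpy. The shapes are illustrative (each sample's hidden representation is taken as an `(n_nodes, dim)` matrix and `V1` as a `dim`-vector); only the score formulas follow the claim.

```python
import numpy as np

def pointwise_stage(hidden_reps, base_scores, V1, alpha, K):
    """Steps 1-2 to 1-4: sum per-node linear scores into Spointwise,
    blend with the base score as S1 = alpha*Spointwise + (1-alpha)*Sbase,
    and keep the K highest-scoring samples as the candidate set."""
    # Step 1-2: per-sample score is the sum of per-node linear scores.
    s_point = np.array([float(np.sum(H @ V1)) for H in hidden_reps])
    # Step 1-3: blend with the base score from the baseline system.
    s1 = alpha * s_point + (1.0 - alpha) * np.asarray(base_scores)
    # Step 1-4: indices of the K top-scoring samples form the candidate set.
    candidates = [int(i) for i in np.argsort(-s1)[:K]]
    return s_point, s1, candidates
```

The K*(K-1)/2 unordered sample pairs of step 1-4 are then simply all index pairs drawn from `candidates`.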
3. The method according to claim 2, characterized in that step 2 comprises:
Step 2-1, for samples A and B among the K samples, form the sample pair <A, B>, and encode the nodes in the pair <A, B> into hidden-layer vector representations with an encoder;
Step 2-2, compare the pair to obtain a reference vector for each node in the sample pair;
Step 2-3, obtain the new, post-comparison vector representation of each node from its hidden-layer vector and its reference vector;
Step 2-4, re-encode the vector representations obtained in step 2-3 with an encoder;
Step 2-5, obtain the relative score of the sample pair <A, B>; combining the relative scores of all sample pairs among the K samples yields each sample's score for the pairwise ranking stage.
4. The method according to claim 3, characterized in that, in step 2-1, the encoder is a recursive neural network (Recursive Neural Network), a tree-structured long short-term memory network (Tree-LSTM), or a bidirectional long short-term memory network (Bi-LSTM).
5. The method according to claim 4, characterized in that step 2-2 comprises: based on an attention mechanism (Attention Mechanism), computing the reference vector āi of node ai in sample A by the following formula:
āi = Σ_{j=1..n} [exp(eij) / Σ_{k=1..n} exp(eik)]·bj,
wherein n denotes the number of nodes in sample B, bj denotes the j-th node in sample B, and eij denotes the relevance of nodes ai and bj.
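The attention-based reference vector of claim 5 is a softmax-weighted sum of sample B's node vectors, which can be sketched as follows. The relevance e_ij is instantiated here as a dot product — one common choice; the claim leaves the exact relevance function open.

```python
import numpy as np

def attention_reference(A_nodes, B_nodes):
    """Claim 5: for each node a_i of sample A, a reference vector formed
    as a softmax-weighted sum of the node vectors of sample B.
    A_nodes: (nA, dim); B_nodes: (nB, dim). Dot-product relevance assumed."""
    e = A_nodes @ B_nodes.T                       # e_ij = relevance(a_i, b_j)
    w = np.exp(e - e.max(axis=1, keepdims=True))  # numerically stable softmax
    w = w / w.sum(axis=1, keepdims=True)
    return w @ B_nodes                            # one reference vector per a_i
```

When B has a single node, every reference vector degenerates to that node, as the softmax weight is 1.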
6. The method according to claim 4, characterized in that step 2-2 comprises: computing the reference vector āi of node ai in sample A by the following formula:
āi = Σ_{b: span(b)⊆span(ai)} [len(b) / len(ai)]·b,
wherein the function span(·) returns the interval of leaf nodes covered by a node, b denotes a node of sample B whose span is contained in the interval span(ai), and the function len(·) returns the number of leaf nodes covered by a node.
7. The method according to claim 4, characterized in that step 2-2 comprises: computing the reference vector āi of node ai in sample A by the following formula:
āi = Σ_{b: span(b)⊆span(ai)} [exp(e) / Σ_{b'} exp(e')]·b,
wherein the function span(·) returns the interval of leaf nodes covered by a node, b denotes a node of sample B whose span is contained in the interval span(ai), and e denotes the relevance of node ai and b.
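The span-plus-attention combination of claim 7 restricts the attention competition to B-nodes whose leaf span lies inside span(a_i). A minimal sketch, in which the dot-product relevance and the zero vector returned for an empty candidate set are both assumptions (the claim fixes neither):

```python
import numpy as np

def span_attention_reference(a_vec, a_span, B_nodes, B_spans):
    """Claim 7: attention restricted by spans. Only nodes of sample B whose
    leaf span (lo, hi) lies inside span(a_i) enter the softmax."""
    inside = [j for j, (lo, hi) in enumerate(B_spans)
              if lo >= a_span[0] and hi <= a_span[1]]  # span(b) within span(a_i)
    if not inside:
        return np.zeros_like(a_vec)  # assumed fallback for an empty set
    cand = B_nodes[inside]
    e = cand @ a_vec                 # relevance of a_i and each candidate b
    w = np.exp(e - e.max())
    w = w / w.sum()
    return w @ cand
```

Compared with claim 5, the span restriction keeps a node from attending to B-material outside the source region it covers, which matters for tree-structured samples such as syntax trees.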
8. The method according to any one of claims 5 to 7, characterized in that step 2-3 comprises: obtaining the new vector representation ãi by the following formula:
ãi = g1 ⊙ hi + g2 ⊙ āi,
or obtaining the new vector representation by the following formula:
ãi = σ(W1[hi; āi] + b1) ⊙ hi + σ(W2[hi; āi] + b2) ⊙ āi,
wherein g1 and g2 denote the output vectors of two gate units used to control information flow, ⊙ denotes the elementwise product of two vectors of the same dimension, W1 with b1 and W2 with b2 are the parameters of the two gates, and σ is the activation function.
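The gated combination of claim 8 can be sketched as below. Conditioning both gates on the concatenation [h; ref] is an assumed wiring — the claim names the gate parameters W1, b1, W2, b2 and the activation σ but not the gates' inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_compare(h, ref, W1, b1, W2, b2):
    """Claim 8, gated variant: two gate units decide how much of the node's
    hidden vector h and of its reference vector ref flows into the compared
    representation. Gate input [h; ref] is an assumption."""
    hr = np.concatenate([h, ref])
    g1 = sigmoid(W1 @ hr + b1)   # gate on the hidden vector
    g2 = sigmoid(W2 @ hr + b2)   # gate on the reference vector
    return g1 * h + g2 * ref     # elementwise (⊙) gated combination
```

With both gates saturated open the output is simply h + ref; in general the gates learn how much comparison information to let through per dimension.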
9. The method according to claim 8, characterized in that step 2-4 comprises: denoting the n vector representations obtained in step 2-3 as ã1,…,ãl,…,ãn, where ãl is the l-th vector representation and l ranges from 1 to n; the vector representations obtained in step 2-3 are re-encoded with an encoder to obtain new vectors m1,…,mi,…,mn, where mn denotes the new vector representation of ãn after encoding.
10. The method according to claim 9, characterized in that step 2-5 comprises: applying a linear transformation to the encoding result of step 2-4 to obtain a score for each node in the sample, the relative score of the sample being the sum of the scores of all its nodes:
S = Σ_{i=1..n} V2·mi,
wherein V2 is a model parameter used in the pairwise ranking stage to linearly transform the hidden-layer representations into node scores, and mi is the vector representation obtained after re-encoding the reference vector combination; the pairwise-stage score of sample A is then obtained by summing the relative scores of A against every other sample, and each sample's pairwise-stage score is denoted Spairwise.
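The pairwise-stage aggregation of claim 10 can be sketched as follows. The data layout is illustrative: `reencoded` is assumed to map an ordered pair of candidate indices (i, j) to the `(n_nodes, dim)` matrix of re-encoded vectors m for sample i compared against sample j.

```python
import numpy as np

def pairwise_stage_scores(reencoded, V2, K):
    """Claim 10: each pair (i, j) contributes a relative score
    sum_i V2·m_i from its re-encoded node vectors; a sample's
    pairwise-stage score Spairwise is the sum of its relative scores
    against all other candidates."""
    s = np.zeros(K)
    for (i, _j), M in reencoded.items():
        s[i] += float(np.sum(M @ V2))  # relative score of sample i in this pair
    return s
```

The resulting vector `s` supplies the pairwise-stage term that the weighted ranking of claim 1, step 3 combines with the pointwise-stage and base scores.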
CN201810460253.5A 2018-05-15 2018-05-15 Text ordering method for mixed point-by-point strategy and paired strategy Active CN108717434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810460253.5A CN108717434B (en) 2018-05-15 2018-05-15 Text ordering method for mixed point-by-point strategy and paired strategy

Publications (2)

Publication Number Publication Date
CN108717434A true CN108717434A (en) 2018-10-30
CN108717434B CN108717434B (en) 2020-07-31

Family

ID=63899996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810460253.5A Active CN108717434B (en) 2018-05-15 2018-05-15 Text ordering method for mixed point-by-point strategy and paired strategy

Country Status (1)

Country Link
CN (1) CN108717434B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336793A (en) * 2013-06-09 2013-10-02 中国科学院计算技术研究所 Personalized paper recommendation method and system thereof
US20160041982A1 (en) * 2014-08-05 2016-02-11 Facebook, Inc. Conditioned Search Ranking Models on Online Social Networks
CN106156135A (en) * 2015-04-10 2016-11-23 华为技术有限公司 The method and device of inquiry data
US20170193387A1 (en) * 2015-12-31 2017-07-06 Nuance Communications, Inc. Probabilistic Ranking for Natural Language Understanding
CN107329960A (en) * 2017-06-29 2017-11-07 哈尔滨工业大学 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HUI ZHANG ET AL.: "K-Best Combination of Syntactic Parsers", Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing *
Liang Jinlian et al.: "Coarse-to-Fine Syntactic Parsing of Kazakh Phrase Structures" (由粗到精的哈萨克语短语结构句法分析研究), Journal of Chinese Information Processing (《中文信息学报》) *
Yuan Lichi: "Statistics-Based Syntactic Parsing Methods" (基于统计的句法分析方法), Journal of Central South University (Science and Technology) (《中南大学学报(自然科学版)》) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant