CN107194422A - A convolutional neural network relation classification method combining forward and reverse examples - Google Patents
A convolutional neural network relation classification method combining forward and reverse examples

- Publication number: CN107194422A (application CN201710354990.2A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415 — Pattern recognition; classification techniques relating to the classification model based on parametric or probabilistic models
- G06N3/02 — Computing arrangements based on biological models; neural networks
- G06N3/08 — Learning methods
Abstract
The invention discloses a relation classification method based on convolutional neural networks that combines forward and reverse examples, in the field of relation extraction and classification. The method comprises the following steps: S1. for the sentence text and entity pair to be classified, construct a forward example and a reverse example according to the linear order of the words in the sentence; S2. encode the forward example and the reverse example separately with a CNN sentence encoder, obtaining the encoded feature vector z⁺ of the forward example and z⁻ of the reverse example; S3. from the encoded feature vectors z⁺ and z⁻, perform relation classification with a softmax layer to obtain the classification result ri. Compared with other methods, the invention combines the forward and reverse information of the same entity pair on top of a CNN, improving the final classification performance.
Description
Technical field
The present invention relates to the field of relation extraction and classification, and in particular to a relation classification method based on convolutional neural networks that combines forward and reverse examples.
Background technology
Existing relation extraction techniques worldwide fall largely into three kinds: methods based on pattern matching, relation extraction methods based on machine learning, and open-domain information extraction methods. Pattern-matching methods match relations with manually constructed patterns; the patterns must be engineered by hand and transfer poorly. Open-domain information extraction takes the predicate of a sentence as the relation string between subject and object and then clusters the relation strings into relations; its extraction accuracy is poor and the extracted relations are difficult to map onto the relations a database needs. Machine-learning-based relation extraction converts the extraction problem into a relation classification problem over known, predefined relation types, which guarantees extraction accuracy with only a little manual intervention.
Machine-learning-based relation extraction is in turn broadly divided into three kinds: feature-based relation classification, tree-kernel-based relation classification, and neural-network relation classification. Feature-based methods extract a large number of linguistic (lexical and syntactic) features, combine them into feature vectors, and classify with various classifiers (such as maximum entropy models and support vector machines) to obtain the target relation; they require expert-designed features and are difficult to transfer across domains. Tree-kernel-based methods represent the text by its syntax tree and design a kernel function whose inner product of two sentences in a high-dimensional sparse space serves as a structural feature; the kernel function strongly limits the features that can be extracted, which hurts classification.
In addition, representative work on neural-network relation classification at home and abroad mainly includes the convolutional neural network method (CNN) [1], the ranking convolutional neural network method (CR-CNN) [2], the recursive neural network method (RNN) [3], and the attention-based bidirectional long short-term memory method (ATT-BLSTM) [4]. All of these methods feed the sentence and entities of the relation to be extracted into a neural network, obtain features with the network, and then classify into predefined relation types to obtain the target relation.
Closest to the present invention is the method based on convolutional neural networks [1] (see Fig. 1). It combines externally trained word embeddings with position vectors representing the distance to each entity to obtain a vector for each word; all word vectors in the sentence together form the sentence vector, which is then fed into the convolutional neural network. Local features are obtained with a convolutional layer, salient features with a pooling layer, and the relation class is finally obtained through a softmax layer.
Prior-art relation classification methods mainly have the following deficiencies. Feature-based methods need hand-designed features and transfer poorly; tree-kernel methods can only obtain features by defining a kernel function, so the features are limited. Among neural methods, the CNN is easy to implement, efficient to train, and classifies well, while more complex methods train less efficiently and struggle to reach a classification performance comparable to the CNN. But the CNN method still has a problem in the following respect: when the entities of the same sentence are fed to the neural network in different orders, the classification results may differ. For example, in "Financial stress is one of the main causes of divorce.", taking entity "stress" as e1 and entity "divorce" as e2 yields the result Cause-Effect; taking "stress" as e2 and "divorce" as e1 ought to yield the result Effect-Cause; but in actual classification the two results sometimes fail to correspond.
[1] Zeng D, Liu K, Lai S, Zhou G, Zhao J. Relation Classification via Convolutional Deep Neural Network [C] // COLING. 2014: 2335-2344.
[2] Santos C N, Xiang B, Zhou B. Classifying Relations by Ranking with Convolutional Neural Networks [C] // ACL (1). 2015: 626-634.
[3] Hashimoto K, Miwa M, Tsuruoka Y, Chikayama T. Simple Customization of Recursive Neural Networks for Semantic Relation Classification [C] // EMNLP. 2013: 1372-1376.
[4] Zhou P, Shi W, Tian J, Qi Z, Li B, Hao H, Xu B. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification [C] // ACL (2). 2016: 207.
Content of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by proposing a relation classification method based on convolutional neural networks that combines forward and reverse examples.
To achieve this object, the technical solution of the present invention is as follows:
A relation classification method based on convolutional neural networks combining a forward example and a reverse example, comprising the following steps:
S1. for the sentence text and entity pair to be classified, construct a forward example and a reverse example according to the linear order of the words in the sentence;
S2. encode the forward example and the reverse example separately with a CNN sentence encoder, obtaining the encoded feature vector z⁺ of the forward example and z⁻ of the reverse example;
S3. from the encoded feature vectors z⁺ and z⁻, perform relation classification with a softmax layer to obtain the classification result ri.
As a further improvement of the technical solution of the present invention: given a sentence with two marked entities, according to the linear order of the words in the sentence, the example that takes the entity appearing earlier as e1 and the entity appearing later as e2 is called the forward example; the example that takes the entity appearing later as e1 and the entity appearing earlier as e2 is called the reverse example.
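The forward/reverse split can be sketched as follows. This is an illustrative Python sketch, not the patent's code; the dictionary layout and function name are assumptions.

```python
# Sketch: build the forward and reverse instances for one annotated sentence.
def make_instances(tokens, first_idx, second_idx):
    """first_idx < second_idx in linear order. The forward instance takes
    the earlier entity as e1; the reverse instance takes the later one."""
    forward = {"tokens": tokens, "e1": first_idx, "e2": second_idx}
    reverse = {"tokens": tokens, "e1": second_idx, "e2": first_idx}
    return forward, reverse

tokens = "Financial stress is one of the main causes of divorce .".split()
fwd, rev = make_instances(tokens, tokens.index("stress"), tokens.index("divorce"))
print(fwd["tokens"][fwd["e1"]], rev["tokens"][rev["e1"]])  # stress divorce
```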
As a further improvement of the technical solution of the present invention: for any sentence, define the encoded feature vector of the forward example as z⁺ and that of the reverse example as z⁻, the forward relation as r⁺ and the reverse relation as r⁻.
Assume that with probability h the forward example is correct and with probability 1−h the reverse example is correct; the cross-entropy objective function is defined as:

$$J(\theta)=\sum_{i=1}^{n}\left[h\log p\left(r_i^{+}\mid z_i^{+},\theta\right)+(1-h)\log p\left(r_i^{-}\mid z_i^{-},\theta'\right)\right]$$

where n is the number of sentences, and θ and θ′ are the mapping and bias parameters of the convolutional and hidden layers of the sentence-encoder neural network in the forward-example and reverse-example models, respectively.
Further, the objective function is optimised with stochastic gradient descent: mini-batches of samples are randomly selected from the training set and trained until convergence. The forward-example class-probability vector is C⁺ = [c1, c2, …, cr] and the reverse-example class-probability vector is C⁻ = [c1, c2, …, cr], where ci denotes the probability that relation ri holds between entities e1 and e2 in the sentence.
The classification result is obtained as:
C = ωC⁺ + (1−ω)C⁻
Finally, the classification result ri is obtained from the index of the maximum component, i = argmax(C).
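The weighted combination in step S3 can be sketched as follows; the probability values and the weight are illustrative, not from the patent.

```python
import numpy as np

# Sketch: combine the class-probability vectors of the forward instance (C+)
# and the reverse instance (C-) with weight w, then take argmax.
def combine(c_plus, c_minus, w=0.5):
    c = w * np.asarray(c_plus) + (1 - w) * np.asarray(c_minus)
    return c, int(np.argmax(c))

c_plus = [0.7, 0.2, 0.1]   # forward-instance softmax output (illustrative)
c_minus = [0.5, 0.4, 0.1]  # reverse-instance softmax output (illustrative)
c, r = combine(c_plus, c_minus, w=0.6)
print(r)  # 0
```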
As a further improvement of the technical solution of the present invention, the CNN sentence encoder comprises a three-layer structure, in order: an encoding layer, a convolutional layer, and a pooling and nonlinear layer.
The encoding layer is used to convert the words in the sentence into low-dimensional real-valued vectors; the convolutional layer is used to obtain high-level features for each word; the pooling and nonlinear layer is used to construct the encoded representation of the sentence.
As a further improvement of the technical solution of the present invention, the encoded representation of a word in the encoding layer comprises a word encoding, a position encoding, and a dependency encoding.
The word encoding specifically comprises: a known sentence x contains n words and is written x = [x1, x2, …, xn], where xi denotes the i-th word in the sequence and n is a preset padded/truncated length. Each word xi obtains its vector representation ei by looking up the word-embedding table W, i.e. ei = Wxi.
The position encoding specifically comprises: generating a position-feature vector for each word from its distance to each entity. For each word xi, the distances i−i1 and i−i2 to the two entities in the sentence index vectors in the position-feature table D, which serve as the position encodings, denoted d¹ᵢ and d²ᵢ; the position-feature table is initialised with random values.
The dependency encoding specifically comprises: generating a dependency-direction vector from the distance between a word and its parent node, and a dependency-feature vector from the dependency label between words.
The word encoding, position encoding, and dependency encoding of each word are concatenated as the encoded representation of that word.
As a further improvement of the technical solution of the present invention, the convolutional layer is used to merge all local features and extracts local features through a sliding window of size w.
Specifically, a convolution kernel is a matrix f = [f1, f2, …, fw]; after convolution the feature sequence s = [s1, s2, …, sn] is obtained, where

$$s_i = g\left(\sum_{j=1}^{w} f_j \cdot X_{i+j-1} + b\right)$$

b is a bias term and g is a nonlinear function. Different features can be obtained by using different convolution kernels and window sizes.
As a further improvement of the technical solution of the present invention, the pooling and nonlinear layer obtains the most important feature with the max function; for each convolution kernel its pooled score is

p_f = max{s_i}

The pooled scores of all kernels are concatenated into the feature vector z = [p1, p2, …, pm] representing the sentence, where m is the number of convolution kernels.
Finally, a nonlinear function is applied to the feature vector as the output, which is the encoded representation of the input sentence.
Compared with the prior art, the present invention has the following advantages:
1. The invention proposes a relation classification method based on convolutional neural networks combining a forward example and a reverse example; by considering both the forward and the reverse case, the relation classification is robust and the classification performance is better.
2. The method is easy to implement and fast to train. It first encodes the forward example and the reverse example with a convolutional neural network to obtain their different encoded representations, considers the same sentence's entities fed into the neural network in different orders, and judges the relation between the entity pair comprehensively by combining the two cases. Compared with other methods, the invention combines the forward and reverse information of the same entity pair on top of a CNN, improving the final classification performance.
Brief description of the drawings
Fig. 1 is the structure of relation classification based on convolutional neural networks in the background art.
Fig. 2 is the flow chart of the relation classification method of the present invention.
Fig. 3 is the model of relation classification combining forward and reverse examples in the embodiment of the present invention.
Fig. 4 is the dependency-parse tree structure in the embodiment of the present invention.
Embodiment
The present invention relates to relation extraction in information extraction, and in particular to machine-learning-based relation classification; mainstream relation extraction is realised through relation classification. The invention uses existing word-embedding training techniques and syntactic analysis tools to represent the text, and on this basis we perform neural-network relation classification. The invention mainly comprises a neural-network feature extraction and representation module and a classification module combining multiple representations; it realises relation extraction through relation classification. Relation extraction recognises and generates the semantic relations between entities from unformatted text. For example, given the input text "Financial stress is one of the main causes of divorce." with marked entities e1 = "stress" and e2 = "divorce", the relation classification task automatically identifies that the Cause-Effect relation holds between e1 and e2 and represents it as Cause-Effect(e1, e2).
A specific embodiment of the relation classification method based on convolutional neural networks combining a forward example and a reverse example is described in further detail below with reference to the accompanying drawings. Clearly, the described embodiment is only a part of the embodiments of the invention rather than all of them; all other embodiments obtained by one of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of this application.
The flow of the method of this embodiment is shown in Fig. 2 and comprises the following steps:
S1. for the sentence text and entity pair to be classified, construct a forward example and a reverse example according to the linear order of the words in the sentence;
S2. encode the forward example and the reverse example separately with a CNN sentence encoder, obtaining the encoded feature vector z⁺ of the forward example and z⁻ of the reverse example;
S3. from the encoded feature vectors z⁺ and z⁻, perform relation classification with a softmax layer to obtain the classification result ri.
The structure of the relation classification framework is shown in Fig. 3, comprising the forward example, the reverse example, the CNN sentence encoder, the softmax layer, and the causal relation.
Given a sentence with two marked entities, according to the linear order of the words in the sentence, the example that takes the entity appearing earlier as e1 and the entity appearing later as e2 is called the forward example; the example that takes the entity appearing later as e1 and the entity appearing earlier as e2 is called the reverse example.
For example, in the sentence "Financial stress is one of the main causes of divorce.", taking "stress" as e1 and "divorce" as e2 gives the forward example, which has the Cause-Effect relation; taking "divorce" as e1 and "stress" as e2 gives the reverse example, which has the Effect-Cause relation. Research finds that the semantic relation of the forward example and that of the reverse example correspond to each other, and an excellent classification system should ensure that the classification results of the forward and reverse examples also correspond.
For any sentence, define the encoded feature vector of its forward example as z⁺ and that of the reverse example as z⁻, the forward relation as r⁺ and the reverse relation as r⁻. Because the forward and reverse examples sometimes fail to correspond, assume that with probability h the forward example is correct and with probability 1−h the reverse example is correct. The cross-entropy objective function is

$$J(\theta)=\sum_{i=1}^{n}\left[h\log p\left(r_i^{+}\mid z_i^{+},\theta\right)+(1-h)\log p\left(r_i^{-}\mid z_i^{-},\theta'\right)\right]$$

where n is the number of sentences, and θ and θ′ are all the parameters of the neural network in the forward-example and reverse-example models, respectively.
To solve the optimisation problem of the objective J(θ), stochastic gradient descent is used: mini-batches of samples are randomly selected from the training set and trained until convergence. At test time, the forward-example class-probability vector is C⁺ = [c1, c2, …, cr] and the reverse-example class-probability vector is C⁻ = [c1, c2, …, cr], where ci denotes the probability that relation ri holds between entities e1 and e2 in the sentence. The classification result is therefore:
C = ωC⁺ + (1−ω)C⁻
Finally, the classification result ri is obtained from the index of the maximum component, i = argmax(C).
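The mixed objective can be sketched per sentence as follows. This is an assumed reading of the formula, not the patent's reference code; h, the probability vectors, and the labels are illustrative.

```python
import numpy as np

# Sketch: with probability h the forward instance is taken as correct, with
# 1-h the reverse one; the per-sentence loss mixes the two negative
# log-likelihoods accordingly (the negative of the objective's summand).
def mixed_nll(p_fwd, p_rev, label_fwd, label_rev, h=0.9):
    return -(h * np.log(p_fwd[label_fwd]) + (1 - h) * np.log(p_rev[label_rev]))

loss = mixed_nll(np.array([0.8, 0.1, 0.1]),  # forward softmax output
                 np.array([0.1, 0.7, 0.2]),  # reverse softmax output
                 label_fwd=0, label_rev=1, h=0.9)
print(round(loss, 4))  # 0.2365
```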
Further, the CNN sentence encoder in this embodiment specifically comprises an encoding layer, a convolutional layer, and a pooling and nonlinear layer. First, the encoding layer converts the words in the sentence into low-dimensional real-valued vectors, and the convolutional layer on top of the encoding layer obtains high-level features for each word; then the sentence vector representation is constructed by the pooling and nonlinear layer, and the encoded vector is denoted s.
Each layer of the CNN sentence encoder structure is described in detail below:
Encoding layer
The input of the CNN sentence encoder is the original sentence text. Because a CNN can only handle fixed-length input, the original sentence is padded to a word sequence of uniform length before input; in this embodiment the target length n is set to the length of the longest sentence in the dataset, and the padding word is "NaN". In the encoding layer, each word is converted to a low-dimensional vector through a word-embedding matrix; to mark the entity positions, a position-feature vector is added to each word in this embodiment. In addition, to improve the system's understanding of the sentence's dependency structure, a dependency-direction vector and a dependency-feature vector are added to each word in this embodiment.
In the encoding layer, the encoded representation of a word comprises the word encoding, the position encoding, and the dependency encoding.
The word encoding: a known sentence x contains n words and is written x = [x1, x2, …, xn], where xi denotes the i-th word in the sequence and n is the preset padded/truncated length. Each word xi obtains its vector representation ei by looking up the word-embedding table W, i.e. ei = Wxi. In this embodiment, the word-embedding table is obtained by training on offline Wikipedia data with the open-source embedding training tool Google word2vec.
The position encoding: the position of an entity in the sentence influences the relation between entities; without position-feature vectors, the CNN cannot identify which words in the sentence are the entities, leading to poor classification. To address this technical deficiency of the CNN, this embodiment generates a position-feature vector for each word from its distance to each entity. For example, in the sentence "Financial stress is one of the main causes of divorce.", the distance of the word "main" to the entity "stress" is 5, and to the entity "divorce" is −3. Specifically, for each word xi, the distances i−i1 and i−i2 to the two entities in the sentence index vectors in the position-feature table D, which serve as the position encodings, denoted d¹ᵢ and d²ᵢ; the position-feature table is initialised with random values.
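The position-feature lookup can be sketched as follows; table sizes, the clamping-free indexing scheme, and all names are illustrative assumptions, not from the patent.

```python
import numpy as np

# Sketch: each word maps to its signed distances from the two entities,
# which index rows of a randomly initialised position-embedding table D.
rng = np.random.default_rng(0)
MAX_LEN, POS_DIM = 50, 5
D = rng.standard_normal((2 * MAX_LEN + 1, POS_DIM))  # rows for -MAX_LEN..MAX_LEN

def position_vectors(n_words, i1, i2):
    """Return position embeddings d1_i, d2_i for each word index i."""
    d1 = np.stack([D[(i - i1) + MAX_LEN] for i in range(n_words)])
    d2 = np.stack([D[(i - i2) + MAX_LEN] for i in range(n_words)])
    return d1, d2

tokens = "Financial stress is one of the main causes of divorce .".split()
d1, d2 = position_vectors(len(tokens), tokens.index("stress"), tokens.index("divorce"))
# distances of "main" match the example in the text:
print(tokens.index("main") - tokens.index("stress"),
      tokens.index("main") - tokens.index("divorce"))  # 5 -3
```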
The dependency encoding: the dependency encoding based on the dependency-parse tree comprises the dependency-direction vector and the dependency-feature vector. The dependency-parse tree is the tree formed by analysing the sentence structure according to the dependency relations between words, and is a basic tool for language understanding. As shown in Fig. 4 (the analysis result of the Stanford parser), each node except the root has a dependency relation with its parent node; a dependency relation comprises not only the parent node but also a dependency label. In this embodiment, the dependency-direction vector is generated from the distance between a word and its parent node, and the dependency-feature vector from the label of the dependency relation between words.
For example, "city" has distance 3 to its parent "go" with feature label "nmod"; "go" has distance 2 to its parent "intends" with feature label "xcomp". Borrowing the idea of the position encoding, the distance from each word to its parent word indexes a real-valued vector in the dependency-direction table P, taken as pi, and the dependency label indexes a vector in the dependency-feature table F, taken as fi; the dependency-direction table and the dependency-feature table are initialised with random values.
So far, the word encoding, position encoding, and dependency encoding of each word are concatenated as the encoded representation of that word, and a unique vector is set to identify padding words. Specifically, for each word, the word vector ei, the two entity-position vectors d¹ᵢ and d²ᵢ, the dependency-direction vector pi, and the dependency-feature vector fi are concatenated to obtain the representation vector Xi of each word, i.e. Xi = [ei; d¹ᵢ; d²ᵢ; pi; fi], and the encoded representation of the sentence is then:
X = [X1, X2, …, Xn].
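The per-word concatenation can be sketched minimally as follows; every dimension here is an illustrative assumption, not a value from the patent.

```python
import numpy as np

# Sketch: X_i = [e_i; d1_i; d2_i; p_i; f_i] - concatenating word, the two
# position, dependency-direction, and dependency-label vectors.
WORD_DIM, POS_DIM, DEP_DIM, LBL_DIM = 8, 3, 2, 2

def encode_word(e, d1, d2, p, f):
    return np.concatenate([e, d1, d2, p, f])

x = encode_word(np.zeros(WORD_DIM), np.ones(POS_DIM), np.ones(POS_DIM),
                np.zeros(DEP_DIM), np.zeros(LBL_DIM))
print(x.shape)  # (18,)
```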
Convolutional layer
The greatest challenge of relation classification comes from the diversity of semantic expression: the position of the important information in the sentence is not fixed. Therefore, a convolutional layer is used in this embodiment to merge all local features; it extracts local features through a sliding window of size w. Where the window would cross the sentence boundary, zero vectors are padded on both sides of the sentence so that the dimension after convolution is unchanged.
Specifically, a convolution kernel is a matrix f = [f1, f2, …, fw]; after convolution the feature sequence s = [s1, s2, …, sn] is obtained, where

$$s_i = g\left(\sum_{j=1}^{w} f_j \cdot X_{i+j-1} + b\right)$$

b is a bias term and g is a nonlinear function. Different features can be obtained by using different convolution kernels and window sizes.
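The sliding-window convolution can be sketched as follows. The choice of tanh for g, the one-sided padding, and all sizes are illustrative assumptions; the patent only says g is "a nonlinear function".

```python
import numpy as np

# Sketch: a window of width w slides over the word encodings; the sentence
# is zero-padded so the output keeps length n.
def conv1d(X, f, b, w):
    n = len(X)
    pad = np.zeros((w - 1, X.shape[1]))
    Xp = np.vstack([X, pad])  # zero-pad so every window is full
    return np.array([np.tanh(np.sum(f * Xp[i:i + w]) + b) for i in range(n)])

X = np.ones((4, 3))        # 4 words, 3-dim encodings (illustrative)
f = np.full((2, 3), 0.1)   # one convolution kernel, window w = 2
s = conv1d(X, f, b=0.0, w=2)
print(s.shape)  # (4,)
```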
Pooling and nonlinear layer
In the pooling layer, the most important feature is obtained with the max function; for each convolution kernel its pooled score is
p_f = max{s_i}
The pooled scores of all kernels are concatenated into the feature vector z = [p1, p2, …, pm] representing the sentence, where m is the number of convolution kernels.
Finally, a nonlinear function is applied to the feature vector as the output, which is the encoded representation of the input sentence.
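The pooling step can be sketched as follows; tanh as the final nonlinearity and the sample feature sequences are illustrative assumptions.

```python
import numpy as np

# Sketch: max over each kernel's feature sequence (p_f = max{s_i}),
# concatenated into the sentence vector z, with a final nonlinearity.
def pool(feature_sequences):
    z = np.array([np.max(s) for s in feature_sequences])
    return np.tanh(z)

z = pool([np.array([0.1, 0.9, 0.3]),    # kernel 1 feature sequence
          np.array([-0.2, 0.4, 0.0])])  # kernel 2 feature sequence
print(z.shape)  # (2,)
```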
For the relation classification problem, the present invention provides a more efficient and comprehensive feature extraction and encoding scheme and considers the classification relation of the entity pair under different orders, improving the performance of relation classification.
It will be clear to those skilled in the art that the scope of the present invention is not restricted to the examples discussed above, and that some changes and modifications may be made to them without departing from the scope of the invention as defined by the appended claims. Although the invention has been illustrated and described in detail in the drawings and the description, such illustration and description are only explanatory or schematic and not restrictive. The invention is not limited to the disclosed embodiments.
Claims (7)
1. A relation classification method based on convolutional neural networks combining a forward example and a reverse example, characterised by:
S1. for the sentence text and entity pair to be classified, constructing a forward example and a reverse example according to the linear order of the words in the sentence;
S2. encoding the forward example and the reverse example separately with a CNN sentence encoder, obtaining the encoded feature vector z⁺ of the forward example and z⁻ of the reverse example;
S3. from the encoded feature vectors z⁺ and z⁻, performing relation classification with a softmax layer to obtain the classification result ri.
2. The relation classification method based on convolutional neural networks combining a forward example and a reverse example according to claim 1, characterised in that:
given a sentence with two marked entities, according to the linear order of the words in the sentence, the example that takes the entity appearing earlier as e1 and the entity appearing later as e2 is called the forward example; the example that takes the entity appearing later as e1 and the entity appearing earlier as e2 is called the reverse example.
3. The relation classification method based on convolutional neural networks combining a forward example and a reverse example according to claim 2, characterised in that:
for any sentence, the encoded feature vector of the forward example is defined as z⁺ and that of the reverse example as z⁻; the forward relation is r⁺ and the reverse relation is r⁻;
assuming that with probability h the forward example is correct and with probability 1−h the reverse example is correct, the cross-entropy objective function is defined as:
$$J(\theta)=\sum_{i=1}^{n}\left[h\log p\left(r_i^{+}\mid z_i^{+},\theta\right)+(1-h)\log p\left(r_i^{-}\mid z_i^{-},\theta'\right)\right]$$
Wherein, n is sentence quantity, and θ and θ ' are respectively in positive example and reverse instance model in sentence encoder neutral net
The mapping of convolutional layer and hidden layer and bigoted parameter;
Further, the objective function is optimized by stochastic gradient descent: a mini-batch of samples is randomly selected from the training set and trained until convergence. The forward-example classification probability vector is C+ = [c1, c2, …, cr] and the reverse-example classification probability vector is C− = [c1, c2, …, cr], where ci denotes the probability that relation ri holds between entities e1 and e2 in the sentence.
The classification result is obtained as:
C = ωC+ + (1 − ω)C−
Finally, the corresponding classification result ri is obtained via the argmax function: i = argmax(C).
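The combination and argmax step can be sketched as follows (a minimal numpy illustration; the probability vectors and the weight ω are hypothetical values, not from the patent):

```python
import numpy as np

def combine_predictions(c_forward, c_reverse, omega):
    """C = omega * C+ + (1 - omega) * C-, then pick i = argmax(C)."""
    c = omega * np.asarray(c_forward) + (1 - omega) * np.asarray(c_reverse)
    return c, int(np.argmax(c))

# Hypothetical class-probability vectors over three relation types.
c_plus = [0.7, 0.2, 0.1]
c_minus = [0.1, 0.6, 0.3]
combined, best = combine_predictions(c_plus, c_minus, omega=0.6)
print(combined)  # [0.46 0.36 0.18]
print(best)      # 0
```

With ω = 0.6 the forward example dominates, so the combined vector still selects the forward example's top relation (index 0).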
4. The convolutional neural network relation classification method combining forward and reverse examples according to claim 1, characterized in that the CNN sentence encoder comprises four layers, which are, in order: an encoding layer, a convolutional layer, a selective attention layer, and a pooling and non-linear layer;
wherein the encoding layer converts the words in a sentence into low-dimensional real-valued vectors; the convolutional layer obtains the high-level features of each word; the selective attention layer finds, via the shortest dependency path, the words most closely related to the semantics of the two entities, represented by a weight matrix; and the pooling and non-linear layer constructs the encoded representation of the sentence.
5. The convolutional neural network relation classification method combining forward and reverse examples according to claim 4, characterized in that the encoded representation of each word produced by the encoding layer comprises word encoding, position encoding, and dependency encoding;
wherein the word encoding specifically comprises: a known sentence x containing n words is expressed as x = [x1, x2, …, xn], where xi denotes the i-th word in the sequence and n is a preset padding/truncation length; the vector representation ei of each word xi is obtained by looking it up in the word-vector table W, i.e., ei = Wxi;
wherein the position encoding specifically comprises: a position feature vector is generated for each word from its distance to each entity; for each word xi, the distances i − i1 and i − i2 to the two entities in the sentence are mapped to the corresponding vectors in the position feature encoding table D as the position encoding; the position feature encoding table is initialized with random values;
wherein the dependency encoding specifically comprises: a dependency direction vector is generated from the distance between a word and its parent node, and a dependency feature vector is generated from the label of the dependency relation between words;
the word encoding, position encoding, and dependency encoding of each word are concatenated as the encoded representation of that word.
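The three encodings described above can be sketched as follows (a minimal numpy illustration; all table sizes, dimensions, distance ranges, and indices are hypothetical, not specified by the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical table sizes: 100-word vocabulary, distances -100..100,
# 20 dependency labels; 50-dim word vectors, 5-dim feature vectors.
W = rng.normal(size=(100, 50))    # word-vector table
D = rng.normal(size=(201, 5))     # position feature encoding table (random init)
DEP = rng.normal(size=(20, 5))    # dependency-label feature table

def encode_word(word_id, dist_e1, dist_e2, dep_label):
    """Concatenate word, position (to both entities), and dependency encodings."""
    e = W[word_id]                # e_i = W x_i (table lookup)
    d1 = D[dist_e1 + 100]         # shift distance i - i1 into table index range
    d2 = D[dist_e2 + 100]         # shift distance i - i2 into table index range
    dep = DEP[dep_label]
    return np.concatenate([e, d1, d2, dep])

vec = encode_word(word_id=7, dist_e1=-2, dist_e2=3, dep_label=4)
print(vec.shape)  # (65,)
```

The per-word vector is the concatenation of all four lookups, so its length is the sum of the individual embedding dimensions (50 + 5 + 5 + 5 = 65 here).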
6. The convolutional neural network relation classification method combining forward and reverse examples according to claim 4, characterized in that the convolutional layer is used to merge all local features, the convolutional layer extracting local features with a sliding window of size w;
specifically, the convolution kernel is a matrix f = [f1, f2, …, fw], and convolution yields the feature sequence s = [s1, s2, …, sn];
Wherein,
s_i = g( Σ_j f_{j+1}^T x_{j+1} + b )
where b is a bias term and g is a non-linear function; different features can be obtained by using different convolution kernels and window sizes.
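A plain sliding-window convolution of this kind can be sketched as follows (a minimal numpy illustration; tanh is assumed for g, all sizes are hypothetical, and no padding is applied here, so the output has n − w + 1 entries rather than n):

```python
import numpy as np

def conv_features(X, kernel, b, w):
    """Score each window of w consecutive word encodings with one kernel,
    then apply the non-linear function g (tanh assumed here)."""
    n, d = X.shape
    s = []
    for i in range(n - w + 1):            # no padding: n - w + 1 positions
        window = X[i:i + w].ravel()       # local features inside the window
        s.append(np.tanh(kernel @ window + b))
    return np.array(s)

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))              # 10 words, 6-dim encodings (hypothetical)
kernel = rng.normal(size=6 * 3)           # flattened kernel for window size w = 3
s = conv_features(X, kernel, b=0.1, w=3)
print(s.shape)  # (8,)
```

Each output s_i is one kernel's response to one window; using several kernels and window sizes, as the claim notes, yields several such feature sequences.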
7. The convolutional neural network relation classification method combining forward and reverse examples according to claim 4, characterized in that in the pooling and softmax layer, the most important features are obtained with the max function, so that for each convolution kernel the convolution score is:
pf = max{s}
The pooling scores obtained from all convolution kernels are concatenated into the feature vector z = [p1, p2, …, pm] representing the sentence, where m is the number of convolution kernels;
finally, a non-linear function is applied to the feature vector as the output, and this output is the encoded representation of the input sentence.
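The pooling step can be sketched as follows (a minimal illustration; tanh is assumed for the final non-linearity, and the feature sequences are hypothetical values):

```python
import numpy as np

def sentence_encoding(feature_seqs):
    """p_f = max{s} for each kernel's feature sequence; concatenate the
    pooled scores into z and apply a final non-linearity (tanh assumed)."""
    z = np.array([seq.max() for seq in feature_seqs])
    return np.tanh(z)

# Hypothetical feature sequences from m = 3 convolution kernels.
seqs = [np.array([0.2, 0.9, -0.1]),
        np.array([-0.5, 0.3, 0.4]),
        np.array([1.2, 0.0, 0.7])]
z = sentence_encoding(seqs)
print(z.shape)  # (3,)
```

Max pooling collapses each variable-length feature sequence to a single score, so the sentence encoding z has a fixed length m regardless of sentence length.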
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710354990.2A CN107194422A (en) | 2017-06-19 | 2017-06-19 | A kind of convolutional neural networks relation sorting technique of the forward and reverse example of combination |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107194422A true CN107194422A (en) | 2017-09-22 |
Family
ID=59874194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710354990.2A Pending CN107194422A (en) | 2017-06-19 | 2017-06-19 | A kind of convolutional neural networks relation sorting technique of the forward and reverse example of combination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107194422A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354710A (en) * | 2016-08-18 | 2017-01-25 | 清华大学 | Neural network relation extracting method |
Non-Patent Citations (1)
Title |
---|
LI Bo et al.: "Research on improved convolutional neural network relation classification methods", 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019085328A1 (en) * | 2017-11-02 | 2019-05-09 | 平安科技(深圳)有限公司 | Enterprise relationship extraction method and device, and storage medium |
CN108182394B (en) * | 2017-12-22 | 2021-02-02 | 浙江大华技术股份有限公司 | Convolutional neural network training method, face recognition method and face recognition device |
CN108182394A (*) | 2017-12-22 | 2018-06-19 | 浙江大华技术股份有限公司 | Convolutional neural network training method, face recognition method and device |
CN108734290A (*) | 2018-05-16 | 2018-11-02 | 湖北工业大学 | Convolutional neural network construction method based on attention mechanism and application thereof |
CN108734290B (*) | 2018-05-16 | 2021-05-18 | 湖北工业大学 | Convolutional neural network construction method based on attention mechanism and application |
CN109284378A (*) | 2018-09-14 | 2019-01-29 | 北京邮电大学 | Knowledge-graph-oriented relation classification method |
CN109493931A (*) | 2018-10-25 | 2019-03-19 | 平安科技(深圳)有限公司 | Patient file encoding method, server and computer-readable storage medium |
CN109493931B (*) | 2018-10-25 | 2024-06-04 | 平安科技(深圳)有限公司 | Medical record file encoding method, server and computer readable storage medium |
CN109582958A (*) | 2018-11-20 | 2019-04-05 | 厦门大学深圳研究院 | Disaster story line construction method and device |
CN109558605B (*) | 2018-12-17 | 2022-06-10 | 北京百度网讯科技有限公司 | Method and device for translating sentences |
CN109558605A (*) | 2018-12-17 | 2019-04-02 | 北京百度网讯科技有限公司 | Method and apparatus for translating sentences |
CN111753081A (*) | 2019-03-28 | 2020-10-09 | 百度(美国)有限责任公司 | Text classification system and method based on deep SKIP-GRAM network |
CN111753081B (*) | 2019-03-28 | 2023-06-09 | 百度(美国)有限责任公司 | System and method for text classification based on deep SKIP-GRAM network |
CN111291556B (*) | 2019-12-17 | 2021-10-26 | 东华大学 | Chinese entity relation extraction method based on character-word feature fusion with entity sense items |
CN111291556A (*) | 2019-12-17 | 2020-06-16 | 东华大学 | Chinese entity relation extraction method based on character-word feature fusion with entity sense items |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107194422A (en) | Convolutional neural network relation classification method combining forward and reverse examples | |
CN107180247A (en) | Relation classifier based on selective-attention convolutional neural networks and method thereof | |
CN108009285B (en) | Forest ecology human-machine interaction method based on natural language processing | |
CN106383816B (en) | Recognition method for place names in Chinese minority areas based on deep learning | |
CN108073711A (en) | Knowledge-graph-based relation extraction method and system | |
CN105631479B (en) | Deep convolutional network image annotation method and device based on imbalanced learning | |
CN107153642A (en) | Analysis method for recognizing the sentiment orientation of text comments based on neural networks | |
CN107526799A (en) | Knowledge graph construction method based on deep learning | |
CN112487143A (en) | Multi-label text classification method based on public-opinion big-data analysis | |
CN110298037A (en) | Text matching recognition method based on convolutional neural networks with an enhanced attention mechanism | |
CN108133038A (en) | Entity-level sentiment classification system and method based on dynamic memory networks | |
CN109635109A (en) | Sentence classification method based on LSTM combining part of speech and multiple attention mechanisms | |
CN109740148A (en) | Text sentiment analysis method combining BiLSTM with an attention mechanism | |
CN107330446A (en) | Optimization method for deep convolutional neural networks oriented to image classification | |
CN107153713A (en) | Overlapping community detection method and system based on inter-node similarity in social networks | |
CN107463609A (en) | Video question answering method using a hierarchical spatio-temporal attention encoder-decoder network mechanism | |
CN107038159A (en) | Neural machine translation method based on unsupervised domain adaptation | |
CN106570148A (en) | Attribute extraction method based on convolutional neural networks | |
CN109857871A (en) | Customer relationship discovery method based on massive social network context data | |
CN110209789A (en) | Multi-modal dialog system and method guided by user attention | |
CN111932026A (en) | Urban traffic pattern mining method based on data fusion and knowledge graph embedding | |
CN109711883A (en) | Internet advertisement click-through-rate estimation method based on U-Net networks | |
CN108052625A (en) | Fine-grained entity classification method | |
CN107992890A (en) | Multi-view classifier based on local features and design method thereof | |
CN108932278A (en) | Human-machine interaction method and system based on semantic frames |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170922 |