CN107818081A - Sentence similarity assessment method based on deep semantic model and semantic role labeling - Google Patents
- Publication number
- CN107818081A CN107818081A CN201710876254.3A CN201710876254A CN107818081A CN 107818081 A CN107818081 A CN 107818081A CN 201710876254 A CN201710876254 A CN 201710876254A CN 107818081 A CN107818081 A CN 107818081A
- Authority
- CN
- China
- Prior art keywords
- similarity
- semantic
- sentence
- predicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a sentence similarity assessment method based on a deep semantic model and semantic role labeling. Text strings are mapped to feature vectors in a low-dimensional semantic space, and the similarity between two sentences is measured by cosine similarity. The principal semantic roles are retained, while all other semantic roles are handled uniformly as one class. The sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs, from which the similarity values between semantic roles are obtained. The multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations and the similarity of the semantic roles is calculated; the similarity computed by the deep semantic model and the similarity computed from semantic roles are then linearly combined into the final sentence similarity. By incorporating semantic roles, the present invention improves the Pearson correlation coefficient by 2.226% and exceeds the first-ranked result on the official SemEval2017 evaluation site by 0.266%.
Description
Technical field
The present invention relates to natural language processing, and specifically to a sentence similarity assessment method based on a deep semantic model and semantic role labeling.
Background technology
Sentence similarity computing measures the semantic equivalence between two sentences and is a particularly important and fundamental line of research in natural language processing. For example, in example-based machine translation, sentence similarity is used to retrieve similar sentences as candidate translations; in automatic question answering, it matches questions with answers; in information filtering, it helps reject likely junk information; in automatic summarization, it selects sentences by similarity; and in classification or clustering, it is used to decide the category of a sentence or document.
Existing approaches to sentence similarity include the method based on lexical and word-order matching proposed by Lv Xueqiang et al.; the keyword-based method proposed by Qin Bing et al.; the method based on attribute theory proposed by Pan Qianhong et al.; the method using semantic dependency computation proposed by Li Bin et al.; the method based on skeleton dependency trees proposed by Sui Zhifang et al.; the improved edit-distance method proposed by Che Wanxiang et al.; and the HowNet-based sentence similarity method proposed by Cheng Chuanpeng et al.
Current methods for computing sentence similarity fall into three categories: (1) methods based on word features, such as the vector space model, lexical form and word order; (2) methods based on semantics, such as those built on semantic dictionaries; and (3) methods based on syntactic analysis, such as sentence similarity computation based on dependency parsing.
Methods based on word features use only the surface information of a sentence and cannot handle sentences containing synonyms, antonyms and similar vocabulary well. Methods based on semantic dictionaries remedy this deficiency to some extent, but they depend on the completeness of the dictionary and ignore both the interactions between the words of a sentence and its deeper syntactic structure. Methods based on dependency parsing can mine the deeper information of a sentence and obtain its organizational structure and the dependency relations between words, but the dependency-based methods used so far exploit only the effective collocation pairs of a sentence and ignore the influence of the remaining words on sentence similarity.
The content of the invention
In the prior art, sentence similarity computation based on semantic role labeling builds the similarity on a frame centered on the verb and cannot fully exploit the verb and the constituent information it governs. To address these deficiencies, the present invention proposes a sentence similarity computation method based on a deep semantic model and semantic role labeling, which analyzes sentences at the levels of sentence structure and semantics.
In order to solve the above technical problems, the technical solution adopted by the present invention is:
A sentence similarity assessment method based on a deep semantic model and semantic role labeling according to the present invention comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the similarity between two sentences is measured by cosine similarity.
2) Semantic role classification: the semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled uniformly as a single class; A0, A1 and A2 are publicly defined semantic role labels.
3) Predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs; semantic role computation is then carried out for each matching pair, giving the similarity values between semantic roles.
4) Sentence similarity based on semantic roles: according to the similarity values between semantic roles, the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles.
5) Sentence similarity: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
The deep semantic model consists of three parts, a word hashing layer, hidden layers and an output layer; the function of each layer is as follows:

l_1 = W_1 x   (1)
l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1   (2)
y = f(W_N l_{N-1} + b_N)   (3)

where x is the input vector, y is the output vector, l_i (i = 1, …, N−1) are the outputs of the hidden layers, W_i denotes the i-th weight matrix, b_i denotes the i-th bias, and f(·) is the tanh activation function.

The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer. After the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
The predicate matching method is as follows:

Let S_ij be the similarity, computed by the DSSM model, between the i-th predicate in sentence A and the j-th predicate in sentence B; this gives the pairwise similarity matrix N between the predicates of the two sentences:

    ⎡ S_11  …  S_1m ⎤
N = ⎢  …    …   …  ⎥
    ⎣ S_n1  …  S_nm ⎦

where n and m are the numbers of predicates in the two sentences.

The predicate pairing algorithm is as follows:
301) Search all elements of matrix N and find the element with the largest similarity; it gives the first predicate matching pair of sentences A and B.
302) Delete the row and column of that largest element, which guarantees that each predicate is paired with exactly one predicate of the other sentence.
303) Collect the remaining elements into a new matrix N_i and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.

When searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
Sentence similarity computation based on semantic roles: the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles. Specifically:

Let the predicates with the largest similarity in sentences A and B be A_1 and B_1 respectively. The similarity of each predicate matching pair is defined as:

S(A_1, B_1) = α × ΣS(r_i, r_j) / max(n, m) + (1 − α) × S(V_A1, V_B1)   (4)

where n and m are respectively the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence.

The similarity based on semantic roles is defined as:

S(A, B) = ΣS(A_i, B_j) / count(V)   (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(A_i, B_j) is the sum of the semantic role similarities corresponding to each predicate matching pair from formula (4).
In the sentence similarity computation step, the two parts above are linearly combined into the final sentence similarity: the similarity computed by the DSSM model is denoted S_1 and the similarity computed from semantic roles is denoted S_2; the similarity of the sentence pair is then

S(A, B) = β × S_1 + (1 − β) × S_2   (6)

where β is the weight of the DSSM-based sentence similarity in the final similarity.
The present invention has the following advantages:
1. The proposed method computes sentence similarity from a deep semantic model and semantic role labeling, analyzing sentences at the levels of sentence structure and semantics while also exploiting the predicates of the sentence and the constituent information they govern.
2. The method makes full use of sentence structure and predicate information. On top of the baseline experiment, incorporating semantic roles improves the Pearson correlation coefficient by 2.226%, exceeding the first-ranked result on the official SemEval2017 evaluation site by 0.266%. (SemEval, short for Semantic Evaluation, is a series of assessments of computational semantic analysis systems that mainly investigates the nature of meaning in language; Task 1 of SemEval2017 is the assessment of sentence similarity computation.)
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 shows the DSSM model used by the method of the invention.
Detailed description of the embodiments
The present invention is further elaborated below with reference to the accompanying drawings.
As shown in Fig. 1, a sentence similarity assessment method based on a deep semantic model and semantic role labeling according to the present invention comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
2) Semantic role classification: the semantic roles A0, A1 and A2 (publicly defined semantic role labels) are retained, and all other semantic roles are handled uniformly as a single class.
3) Predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs; semantic role computation is then carried out for each matching pair, giving the similarity values between semantic roles.
4) Sentence similarity based on semantic roles: on the basis of step 3), the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles.
5) Sentence similarity: the similarity computed by the DSSM model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
The present invention computes sentence similarity based on semantic role labeling, taking distinct semantic roles as the basic units and considering the similarity of the multiple semantic verbs and roles within a sentence.
In step 1), the deep semantic model (Deep Structured Semantic Model, DSSM), shown in Fig. 2, is a deep-learning technique used mainly for the semantic understanding of text. It maps relatively short text strings (such as sentences) to feature vectors in a low-dimensional semantic space. These vectors can be used for document retrieval by comparing the similarity of documents and queries, and the method outperforms other approaches on retrieval results.

DSSM represents a sentence (or document) in the semantic vector space with a typical deep neural network (DNN) architecture. The DNN takes a bag-of-words vector as input, and DSSM reduces the dimensionality of this vector with a new technique called word hashing. Word hashing appends a '#' to the beginning and the end of each word and then takes every three consecutive characters as one unit, which serves as the input of the network. For example, the word 'cat' becomes '#cat#' after adding '#' at both ends, and with three characters per unit it is decomposed into '#ca', 'cat' and 'at#'. Representing words this way yields 3073 distinct units in total, and these 3073 representations are expressed as a vector that forms the input of the neural network.
The DSSM model consists of three parts, a word hashing layer, hidden layers and an output layer; the function of each layer is as follows:

l_1 = W_1 x   (1)
l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1   (2)
y = f(W_N l_{N-1} + b_N)   (3)

where x denotes the input vector, y the output vector, l_i (i = 1, …, N−1) the outputs of the hidden layers, W_i the i-th weight matrix and b_i the i-th bias; f(·) is the tanh activation function. The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer. After the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity. Besides the similarity between sentences, the model can also compute the similarity between words. In the DSSM model of Fig. 2, Q denotes a question, D the set of candidate sentences, R the cosine similarity between two vectors, and P the probability of choosing a particular sentence from the candidate set. In the sentence similarity computation of the present invention, D contains only one sentence, and the probability P need not be computed.
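A minimal numerical sketch of equations (1)-(3) and the cosine measure; the layer sizes and random weights are purely illustrative, since the patent does not specify network dimensions:

```python
import numpy as np

def dssm_forward(x, weights, biases):
    """Equations (1)-(3): l1 = W1 x, then l_i = tanh(W_i l_{i-1} + b_i),
    with the last such layer producing the semantic feature vector y."""
    l = weights[0] @ x                     # word-hash projection, eq. (1)
    for W, b in zip(weights[1:], biases):  # hidden and output layers, eqs. (2)-(3)
        l = np.tanh(W @ l + b)
    return l

def cosine_sim(a, b):
    """Cosine similarity between two semantic feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dims = [3073, 300, 128]  # hypothetical sizes: word-hash input, hidden, output
weights = [0.01 * rng.standard_normal((dims[i + 1], dims[i]))
           for i in range(len(dims) - 1)]
biases = [0.01 * rng.standard_normal(d) for d in dims[2:]]

yA = dssm_forward(rng.standard_normal(dims[0]), weights, biases)
yB = dssm_forward(rng.standard_normal(dims[0]), weights, biases)
sim = cosine_sim(yA, yB)  # semantic similarity of the two inputs
```

In the method of the invention the same forward pass is applied to both sentences of a pair, and `sim` plays the role of the similarity R.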
In step 2), semantic role classification is performed. Different syntactic theories classify semantic roles differently: the Verb Usage Dictionary compiled by Meng Cong et al. divides noun objects into 14 classes by their case relation with the verb, while Li Linding's Modern Chinese Sentence Patterns distinguishes 21 classes, among others. The kinds of semantic roles are numerous, but since sentence similarity research simulates the judgment process of a human, the semantic roles can be grouped. The present invention retains the original semantic roles A0, A1 and A2 and handles all other semantic roles uniformly as a single class, labeled o_srl.
In step 3), predicate similarity is computed. A sentence generally contains multiple predicates. Computing the pairwise similarity of all predicates of two sentences would not only make the complexity very large for long sentences with many predicates, but would also distort the experimental results to varying degrees. For the multi-predicate problem, the present invention therefore matches the predicates of the sentence pair according to the similarity between predicates.
Let S_ij be the similarity between the i-th predicate in sentence A and the j-th predicate in sentence B (the similarity between predicates is computed by the DSSM model); the pairwise similarity matrix N between the predicates of the two sentences is then:

    ⎡ S_11  …  S_1m ⎤
N = ⎢  …    …   …  ⎥
    ⎣ S_n1  …  S_nm ⎦

where n and m are respectively the numbers of predicates in the two sentences. The predicate pairing algorithm is as follows:
301) Search all elements of matrix N and find the element with the largest similarity; it gives the first predicate matching pair of sentences A and B.
302) Delete the row and column of the largest element found in step 301), which guarantees that each predicate is paired with exactly one predicate of the other sentence.
303) Collect the elements remaining after step 302) into a new matrix N and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.
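The greedy pairing of steps 301)-303) can be sketched as follows; this is illustrative code, and `pair_predicates` is a hypothetical name:

```python
import numpy as np

def pair_predicates(sim_matrix):
    """Steps 301)-303): repeatedly pick the largest remaining similarity,
    record the predicate pair, and delete its row and column."""
    N = np.array(sim_matrix, dtype=float)
    rows = list(range(N.shape[0]))  # predicate indices of sentence A
    cols = list(range(N.shape[1]))  # predicate indices of sentence B
    pairs = []
    while N.size > 0:
        i, j = np.unravel_index(np.argmax(N), N.shape)
        pairs.append((rows[i], cols[j], float(N[i, j])))
        N = np.delete(np.delete(N, i, axis=0), j, axis=1)
        del rows[i], cols[j]
    return pairs

sims = [[0.9, 0.2],
        [0.4, 0.7],
        [0.1, 0.3]]  # 3 predicates in A, 2 in B
print(pair_predicates(sims))  # [(0, 0, 0.9), (1, 1, 0.7)]
```

As stated in the text, this yields p = min(n, m) pairs, each predicate being matched at most once.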
The above method finds p predicate matching pairs, where p = min(n, m), and the semantic role computation is carried out for each of these p pairs.

The present invention refines the algorithm that integrates predicate similarity and semantic role similarity: the part contributing most of the similarity value constrains the part contributing less. Hence, if the predicate similarity of two sentences is comparatively low, the similarity of the semantic roles of those predicates should likewise contribute less to the overall sentence similarity. Therefore, when searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
In step 4), the sentence similarity based on semantic roles is computed. A sentence usually contains multiple predicates, and each predicate usually governs multiple semantic roles; the present invention calls this predicate-and-roles structure a semantic collocation. Computing the similarity of semantic roles converts the task into similarity computations between predicates and between identical semantic roles. For example:

Sentence A:
A man is shaved in front of a lecture hall
Its semantic role analysis result is:
[A0 A man] is [V shaved] [o_srl in front of a lecture hall]

Sentence B:
A man is sitting in the grass
Its semantic role analysis result is:
[A0 A man] is [V sitting] [o_srl in the grass]

[shaved, A0, A man] in sentence A and [sitting, A0, A man] in sentence B form a semantic collocation pair for similarity computation, which gives the similarity between the semantic roles A0. Likewise, the similarity between [shaved, o_srl, in front of a lecture hall] and [sitting, o_srl, in the grass] is computed as the similarity between the other semantic roles.
The similarity of each predicate matching pair (taking the pair of predicates with the largest similarity in sentences A and B as an example) is defined as:

S(A_1, B_1) = α × ΣS(r_i, r_j) / max(n, m) + (1 − α) × S(V_A1, V_B1)   (4)

where n and m are respectively the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence. The similarity between semantic roles is computed with the DSSM model.
The similarity based on semantic roles is defined as:

S(A, B) = ΣS(A_i, B_j) / count(V)   (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(A_i, B_j) is the sum of the semantic role similarities of all predicate matching pairs from formula (4).
In step 5), the sentence similarity result is composed of two parts, the similarity computed by the DSSM model and the similarity computed from semantic roles, which are linearly combined into the final sentence similarity. The DSSM-based similarity is denoted S_1 and the semantic-role-based similarity S_2; the similarity of the sentence pair is then

S(A, B) = β × S_1 + (1 − β) × S_2   (6)

where β is the weight of the DSSM-based sentence similarity in the final similarity and takes the value 0.6.
As shown in Fig. 1, the input sentence pair is processed along two branches: one branch computes the DSSM-based sentence similarity, the other computes the sentence similarity based on semantic roles; finally the two results are linearly combined into the final sentence similarity.
For example:
Sentence A:
[A0 A man] is [V shaved] [o_srl in front of a lecture hall]
Sentence B:
[A0 A man] is [V sitting] [o_srl in the grass]
The predicate matching pair is: shaved, sitting.
The semantic roles corresponding to the predicates are:
A0: A man / A man
o_srl: in front of a lecture hall / in the grass
The similarities of the predicate matching pair and of the semantic roles are computed separately, and the results are linearly combined to obtain the sentence similarity based on semantic role labeling, denoted S_2.
The two sentences of the above example are also processed with sent2vec, an existing tool based on the DSSM model, to compute the sentence similarity denoted S_1.
The results S_1 and S_2 of the two branches are then linearly combined into the final sentence similarity.
The experimental results on the SemEval2017 corpus are given in Table 1.

Table 1: experimental results

The baseline is the result obtained with the DSSM model alone. The semantic role labeling tool selected by the present invention reached an F-score of 88.25% on the test_wsj data set of the CoNLL2005 Shared Task.
On top of the baseline experiment, incorporating semantic roles improves the Pearson correlation coefficient by 2.226%, exceeding the first-ranked result (ruthva) on the official SemEval2017 evaluation site by 0.266%. After partially correcting misidentified semantic roles, the Pearson correlation coefficient reaches 0.85936, an improvement of 2.416% over the baseline. The experimental results show that incorporating semantic role identification into sentence similarity computation can make up for the deficiency of current methods in exploiting semantic information and ultimately improves the computed sentence similarity.
Claims (6)
- 1. A sentence similarity assessment method based on a deep semantic model and semantic role labeling, characterized by comprising the following steps: 1) establishing the deep semantic model: text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the similarity between two sentences is measured by cosine similarity; 2) semantic role classification: the semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled uniformly as a single class, A0, A1 and A2 being publicly defined semantic role labels; 3) predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs, and semantic role computation is carried out for each matching pair, giving the similarity values between semantic roles; 4) sentence similarity based on semantic roles: according to the similarity values between semantic roles, the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated, the computation thus being converted into similarity computations between predicates and between identical semantic roles; 5) sentence similarity: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
- 2. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the deep semantic model consists of three parts, a word hashing layer, hidden layers and an output layer, the function of each layer being: l_1 = W_1 x (1); l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1 (2); y = f(W_N l_{N-1} + b_N) (3); where x is the input vector, y is the output vector, l_i (i = 1, …, N−1) are the outputs of the hidden layers, W_i denotes the i-th weight matrix, b_i denotes the i-th bias, and f(·) is the tanh activation function; the feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer; after the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
- 3. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the predicate matching method is as follows: the similarity S_ij between the i-th predicate in sentence A and the j-th predicate in sentence B is computed by the DSSM model, giving the pairwise similarity matrix N between the predicates of the two sentences: N = [S_11 … S_1m; … ; S_n1 … S_nm], where n and m are respectively the numbers of predicates in the two sentences; the predicate pairing algorithm is: 301) search all elements of matrix N and find the element with the largest similarity, which gives the first predicate matching pair of sentences A and B; 302) delete the row and column of the largest element, guaranteeing that each predicate is paired with exactly one predicate of the other sentence; 303) collect the remaining elements into a new matrix N_i and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.
- 4. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 3, characterized in that, when searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
- 5. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the sentence similarity based on semantic roles is calculated as follows: the multiple semantic roles of each predicate among the multiple predicates of a sentence are semantically collocated and the similarity of the semantic roles is calculated; that is, the task is converted into similarity calculations between predicates and between identical semantic roles. Specifically: let the predicates with maximum similarity in sentences A and B be A1 and B1 respectively; the similarity of each predicate matching pair is defined as:

S(A1, B1) = α × ΣS(ri, rj) / max(n, m) + (1 − α) × S(VA1, VB1)    (4)

where n and m are the numbers of semantic roles of the corresponding predicate in sentence A and sentence B respectively, S(ri, rj) is the similarity between semantic roles ri and rj, S(VA1, VB1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence. The similarity based on semantic roles is then defined as:

S(A, B) = ΣS(Ai, Bi) / count(V)    (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(Ai, Bi) is the sum of the semantic role similarities of the predicate matching pairs from formula (4).
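Formulas (4) and (5) can be checked numerically with a short sketch. The function names and the α = 0.5 default are illustrative assumptions; the patent leaves α to tuning:

```python
def pair_similarity(role_sims, n, m, verb_sim, alpha=0.5):
    """Formula (4): similarity of one predicate matching pair.
    role_sims -- the S(ri, rj) values of the matched semantic roles
    n, m      -- role counts of the matched predicate in sentences A and B
    verb_sim  -- S(VA1, VB1), similarity between the two predicates
    alpha     -- weight of the role similarity (0.5 is an assumed default)
    """
    return alpha * sum(role_sims) / max(n, m) + (1 - alpha) * verb_sim


def role_based_similarity(pair_sims):
    """Formula (5): mean of the per-pair similarities over the
    count(V) predicate matching pairs between sentences A and B."""
    return sum(pair_sims) / len(pair_sims)
```

For two matched roles with similarities 0.8 and 0.6, role counts n = m = 2, and predicate similarity 0.9, formula (4) with α = 0.5 gives 0.5 × 1.4 / 2 + 0.5 × 0.9 = 0.8.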
- 6. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that in the sentence similarity calculation step, the above two parts are linearly combined as the final similarity of the sentence: the similarity calculated by the DSSM model is denoted S1, and the similarity calculated from semantic roles is denoted S2; the sentence similarity is then:

S(A, B) = β × S1 + (1 − β) × S2    (6)

where β is the weight of the DSSM-based sentence similarity in the final sentence similarity.
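Formula (6) is a plain convex combination of the two scores; a one-function sketch (the β = 0.5 default is an assumption, since the claim does not fix β):

```python
def final_similarity(s1, s2, beta=0.5):
    """Formula (6): linear combination of the DSSM-based score S1 and
    the semantic-role-based score S2.  beta is the weight of the DSSM
    score (0.5 is an assumed default, not specified in the claim)."""
    return beta * s1 + (1 - beta) * s2
```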
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710876254.3A CN107818081A (en) | 2017-09-25 | 2017-09-25 | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818081A true CN107818081A (en) | 2018-03-20 |
Family
ID=61607137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710876254.3A Pending CN107818081A (en) | 2017-09-25 | 2017-09-25 | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818081A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103562907A (en) * | 2011-05-10 | 2014-02-05 | 日本电气株式会社 | Device, method and program for assessing synonymous expressions |
Non-Patent Citations (4)
Title |
---|
PO-SEN HUANG ET AL.: "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data", CIKM '13 *
ZHANG DAN ET AL.: "Dependency parsing with hierarchical constituent analysis", Journal of Shenyang Aerospace University *
LI RU ET AL.: "Chinese sentence similarity computing based on frame semantic analysis", Journal of Computer Research and Development *
TIAN KUN ET AL.: "A Chinese sentence similarity algorithm based on semantic role labeling", Journal of Chinese Information Processing *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765240A (en) * | 2019-10-31 | 2020-02-07 | 中国科学技术大学 | Semantic matching evaluation method for multiple related sentence pairs |
CN110765240B (en) * | 2019-10-31 | 2023-06-20 | 中国科学技术大学 | Semantic matching evaluation method for multi-phase sentence pairs |
CN112559713A (en) * | 2020-12-24 | 2021-03-26 | 北京百度网讯科技有限公司 | Text relevance judgment method and device, model, electronic equipment and readable medium |
CN112559713B (en) * | 2020-12-24 | 2023-12-01 | 北京百度网讯科技有限公司 | Text relevance judging method and device, model, electronic equipment and readable medium |
CN113609304A (en) * | 2021-07-20 | 2021-11-05 | 广州大学 | Entity matching method and device |
CN113609304B (en) * | 2021-07-20 | 2023-05-23 | 广州大学 | Entity matching method and device |
CN115062619A (en) * | 2022-08-11 | 2022-09-16 | 中国人民解放军国防科技大学 | Chinese entity linking method, device, equipment and storage medium |
CN115062619B (en) * | 2022-08-11 | 2022-11-22 | 中国人民解放军国防科技大学 | Chinese entity linking method, device, equipment and storage medium |
CN116306663A (en) * | 2022-12-27 | 2023-06-23 | 华润数字科技有限公司 | Semantic role labeling method, device, equipment and medium |
CN116306663B (en) * | 2022-12-27 | 2024-01-02 | 华润数字科技有限公司 | Semantic role labeling method, device, equipment and medium |
CN118035712A (en) * | 2024-04-12 | 2024-05-14 | 数据空间研究院 | NLP-based data collection rule identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106484664B (en) | Similarity calculating method between a kind of short text | |
CN107818081A (en) | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling | |
CN104408173B (en) | A kind of kernel keyword extraction method based on B2B platform | |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining | |
CN110377715A (en) | Reasoning type accurate intelligent answering method based on legal knowledge map | |
CN104361127B (en) | The multilingual quick constructive method of question and answer interface based on domain body and template logic | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
US20070106499A1 (en) | Natural language search system | |
CN109783806B (en) | Text matching method utilizing semantic parsing structure | |
CN105930452A (en) | Smart answering method capable of identifying natural language | |
CN113221567A (en) | Judicial domain named entity and relationship combined extraction method | |
CN111241294A (en) | Graph convolution network relation extraction method based on dependency analysis and key words | |
CN107562919B (en) | Multi-index integrated software component retrieval method and system based on information retrieval | |
CN110390006A (en) | Question and answer corpus generation method, device and computer readable storage medium | |
CN104484411A (en) | Building method for semantic knowledge base based on a dictionary | |
CN113962219A (en) | Semantic matching method and system for knowledge retrieval and question answering of power transformer | |
CN112328800A (en) | System and method for automatically generating programming specification question answers | |
CN112036178A (en) | Distribution network entity related semantic search method | |
CN111651569B (en) | Knowledge base question-answering method and system in electric power field | |
CN114997288A (en) | Design resource association method | |
Buchholz et al. | Applying a natural language dialogue tool for designing databases | |
CN112417170A (en) | Relation linking method for incomplete knowledge graph | |
Liu et al. | The extension of domain ontology based on text clustering | |
CN107818078B (en) | Semantic association and matching method for Chinese natural language dialogue | |
Dereje et al. | Sentence level Amharic word sense disambiguation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180320 |