CN108536665A

CN108536665A - Method and device for determining sentence consistency

Info

Publication number: CN108536665A
Application number: CN201710121244.9A
Authority: CN
Inventors: 王煦祥; 尹庆宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-03-02
Filing date: 2017-03-02
Publication date: 2018-09-14

Abstract

The invention discloses a method for determining sentence consistency, including: receiving a sentence group whose consistency is to be determined, the sentence group including at least two sentences; determining a consistency probability of the at least two sentences by means of a sentence consistency discrimination model, where the training features of the model include a syntactic structure similarity feature obtained by extracting features from sample sentences according to their syntactic structure; and, when the consistency probability is greater than a probability threshold, determining that the at least two sentences are consistent. Embodiments of the invention also provide a corresponding device. Because the multi-dimensional similarity features include a syntactic structure similarity feature, and sentence consistency is determined from similarity features of multiple dimensions that include this feature, the technical solution can improve the accuracy of judging sentence consistency and, in turn, the accuracy of artificial intelligence question answering systems and search engines.

Description

Method and device for determining sentence consistency

Technical field

The present invention relates to the field of computer technology, and in particular to a method and device for determining sentence consistency and a method and device for establishing a sentence consistency discrimination model.

Background art

With the development of artificial intelligence, robots that can answer questions have appeared. A well-known example on the internet is the robot monk "Xian'er" (贤二), which was developed using artificial intelligence methods and can answer questions put to it. A search engine works on essentially the same principle: it recognizes the question entered by the user and then searches for a corresponding answer according to the recognition result.

Both robots and search engines rely on a sentence recognition model. They first recognize the question in an interrogative sentence, then determine the answer to the recognized question, and finally return the answer to the questioner.

Sentence recognition models in the prior art are typically trained on sentence similarity: sentences whose keywords are similar are usually recognized as equivalent. In practice, however, some questions have identical keywords but different meanings. A sentence recognition model that treats them as the same sentence makes a recognition error and outputs a wrong result. For example, if the model regards "Whose father is Yao Ming?" and "Who is Yao Ming's father?" as the same question, one of the returned results will be wrong. How to judge the consistency of questions quickly and accurately is therefore critical to improving the recognition accuracy of artificial intelligence question answering systems and search engines.

Summary of the invention

To improve the accuracy of artificial intelligence question answering systems and search engines, embodiments of the present invention provide a method and device for determining sentence consistency and a method and device for establishing a sentence consistency discrimination model.

A first aspect of the present invention provides a method for determining sentence consistency, including:

receiving a sentence group whose consistency is to be determined, the sentence group including at least two sentences;

determining a consistency probability of the at least two sentences by means of a sentence consistency discrimination model, where the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from sample sentences according to their syntactic structure;

when the consistency probability is greater than a probability threshold, determining that the at least two sentences are consistent.

A second aspect of the present invention provides a method for establishing a sentence consistency discrimination model, including:

obtaining multiple sample sentence groups for training the model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;

for the at least two sample sentences, extracting a similarity feature for each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from the sample sentences according to their syntactic structure;

performing model training according to the similarity features of the dimensions to establish the sentence consistency discrimination model.

A third aspect of the present invention provides a device for determining sentence consistency, including:

a receiving unit, configured to receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences;

a first determination unit, configured to determine, by means of a sentence consistency discrimination model, a consistency probability of the at least two sentences received by the receiving unit, where the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from sample sentences according to their syntactic structure;

a second determination unit, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than a probability threshold.

A fourth aspect of the present invention provides a device for establishing a sentence consistency discrimination model, including:

an acquiring unit, configured to obtain multiple sample sentence groups for training the model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;

a determination unit, configured to extract, for the at least two sample sentences obtained by the acquiring unit, a similarity feature for each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from the sample sentences according to their syntactic structure;

a model training unit, configured to perform model training according to the similarity features of the dimensions determined by the determination unit, to establish the sentence consistency discrimination model.

In the prior art, sentences that are similar in wording but differ in syntax cannot be judged accurately. By contrast, the method for determining sentence consistency provided by embodiments of the present invention includes a syntactic structure similarity feature among the multi-dimensional similarity features and determines sentence consistency from similarity features of multiple dimensions that include this feature. This can improve the accuracy of judging sentence consistency and, in turn, the accuracy of artificial intelligence question answering systems and search engines.

Description of the drawings

Fig. 1 is a schematic diagram of an embodiment of the method for establishing a sentence consistency discrimination model in embodiments of the present invention;

Fig. 2 is a schematic diagram of a similarity calculation process in embodiments of the present invention;

Fig. 3 is a schematic diagram of an example of dependency parsing in embodiments of the present invention;

Fig. 4 is a schematic diagram of another example of dependency parsing in embodiments of the present invention;

Fig. 5 is a schematic diagram of an embodiment of the method for determining sentence consistency in embodiments of the present invention;

Fig. 6 is a schematic diagram of an embodiment of the device for determining sentence consistency in embodiments of the present invention;

Fig. 7 is a schematic diagram of another embodiment of the device for determining sentence consistency in embodiments of the present invention;

Fig. 8 is a schematic diagram of an embodiment of the device for establishing a sentence consistency discrimination model in embodiments of the present invention;

Fig. 9 is a schematic diagram of another embodiment of the device for establishing a sentence consistency discrimination model in embodiments of the present invention;

Fig. 10 is a schematic diagram of yet another embodiment of the device for determining sentence consistency in embodiments of the present invention.

Detailed description

Embodiments of the present application are described below with reference to the accompanying drawings. Clearly, the described embodiments are only some, not all, of the embodiments of the present application. A person of ordinary skill in the art will appreciate that, as artificial intelligence develops and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

Embodiments of the present invention provide a method for determining sentence consistency in which the similarity features considered include a syntactic structure feature. Taking syntactic structure into account when determining sentence consistency can improve the accuracy of the judgment and, in turn, the accuracy of artificial intelligence question answering systems and search engines. Embodiments of the present invention also provide a corresponding device for determining sentence consistency and a method and device for establishing a sentence consistency discrimination model. Each is described in detail below.

To address the problem that the consistency of sentences that are similar in wording but different in syntax cannot be judged accurately, embodiments of the present invention can establish a sentence consistency discrimination model that judges sentence consistency accurately. The method for establishing this model is described below with reference to Fig. 1.

As shown in Fig. 1, an embodiment of the method for establishing a sentence consistency discrimination model provided by embodiments of the present invention includes:

101. Obtain multiple sample sentence groups for training the model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different.

In embodiments of the present invention, a sample sentence group may include two sample questions or more than two. The embodiments are described using the case of two sample questions as an example.

The sample label identifies whether the at least two sample sentences in the same sample sentence group have the same semantics. For example, 0 may indicate that they differ and 1 that they are the same.

For example, if the two sample sentences in a sample sentence group are "Whose father is Yao Ming?" and "Who is Yao Ming's father?", the sample label of the group is 0. If the two sample sentences in a sample sentence group are two differently worded forms of "how to format a USB flash drive", the sample label of the group is 1.
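A labeled sample sentence group as described in step 101 can be represented with a small data structure. The following is a minimal sketch; the class name `SampleSentenceGroup`, its field names, and the English wording of the second example pair are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SampleSentenceGroup:
    """One training example: at least two sentences plus a 0/1 label."""
    sentences: Tuple[str, ...]
    label: int  # 1 = same semantics, 0 = different semantics

    def __post_init__(self):
        # A group must hold at least two sentences, as in step 101.
        if len(self.sentences) < 2:
            raise ValueError("a sample sentence group needs at least two sentences")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 (different) or 1 (same)")


# Illustrative data: the second pair uses assumed English paraphrases.
training_groups = [
    SampleSentenceGroup(
        ("Whose father is Yao Ming?", "Who is Yao Ming's father?"), 0),
    SampleSentenceGroup(
        ("How do I format a USB flash drive?",
         "How can a USB flash drive be formatted?"), 1),
]
```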

102. For the at least two sample sentences, extract a similarity feature for each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from the sample sentences according to their syntactic structure.

In embodiments of the present invention, the multiple dimensions may be semantic, structural, statistical, and so on.

In embodiments of the present invention, the similarity features of the dimensions may include string kernel (String Kernel) features, HowNet features, TF-IDF features, Term Weight features, Rank features, syntactic structure features, generic features, and so on.

The feature categories above and the features in each category are explained in Table 1.

Table 1: Feature table

103. Perform model training according to the similarity features of the dimensions to establish the sentence consistency discrimination model.

For String Kernel features, similarity is calculated with the String Kernel algorithm, which computes the similarity between character strings from structural features of the strings. The algorithm takes a sample sentence group as input and outputs the similarity of the sample sentences in the group. Its flow is shown in Fig. 2.

201. Preprocess the sample sentences in the sample sentence group.

202. Compute the string kernel function on the sample sentences.

The String Kernel step computes a similarity value between the preprocessed sample sentences, mainly by means of a string kernel function. A given character string, that is, a sample sentence, is first split into a set of substrings whose length can be adjusted by a parameter; the similarity between the substring sets is then computed with the kernel function.

203. Normalize (regularize) the similarity scores computed for the sample sentences.

The similarity between the sample sentences in the sample sentence group is obtained by linear combination.

204. Obtain the similarity value of the sample sentences in the sample sentence group.
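Steps 201 to 204 can be sketched as follows. The text does not say which string kernel or preprocessing is used, so this sketch assumes a p-spectrum kernel over character n-grams, lowercasing and whitespace removal as preprocessing, cosine-style normalization, and equal weights in the linear combination. All function names are illustrative.

```python
from collections import Counter
from math import sqrt


def preprocess(sentence):
    # Step 201 (assumed form): lowercase and drop whitespace.
    return "".join(sentence.lower().split())


def ngram_counts(text, n):
    # Step 202: split the string into its substrings of length n.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n):
    # Step 202: inner product of the two substring-count vectors.
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    return sum(ca[s] * cb[s] for s in ca.keys() & cb.keys())


def normalized_kernel(a, b, n):
    # Step 203: scale the kernel value into [0, 1].
    denom = sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
    return spectrum_kernel(a, b, n) / denom if denom else 0.0


def string_kernel_similarity(s1, s2, lengths=(1, 2, 3), weights=None):
    # Step 204: linear combination over several substring lengths.
    a, b = preprocess(s1), preprocess(s2)
    if weights is None:
        weights = [1.0 / len(lengths)] * len(lengths)
    return sum(w * normalized_kernel(a, b, n)
               for w, n in zip(weights, lengths))
```

The substring lengths play the role of the adjustable length parameter mentioned in step 202; for Chinese questions, character n-grams avoid a dependency on word segmentation.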

The above describes the similarity calculation principle for String Kernel features. The similarity calculation for HowNet features is described next.

HowNet (知网) is a detailed Chinese semantic knowledge dictionary. In HowNet, each concept is composed of multiple sememes; the nodes of the hierarchy are sememes, not concepts. Moreover, the sememes in a word's concept are not equal in status: there are complex relationships among them, which are expressed in a dedicated knowledge description language.

The concept description of a content word consists of description formulas of the following three forms:

(1) Independent sememe description: expressed as "basic sememe" or "(specific word)";

(2) Relational sememe description: expressed as "relational sememe = basic sememe", "relational sememe = (specific word)", or "(relational sememe = specific word)", where relational sememes are the sememes of the two classes "EventRole | dynamic role" and "EventFeatures | dynamic attribute";

(3) Symbolic sememe description: expressed as "relation symbol basic sememe" or "relation symbol (specific word)", where the relation symbols include "#, %, $, *, +, &, @, !"; the relations they represent are not detailed here.

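For two words W₁ and W₂, if W₁ has n concepts [S₁₁, S₁₂, ..., S₁ₙ] and W₂ has m concepts [S₂₁, S₂₂, ..., S₂ₘ], the similarity Sim(W₁, W₂) of W₁ and W₂ is the maximum of the similarities between their concepts, as shown in formula (1):

$$\mathrm{Sim}(W_1, W_2) = \max_{i=1,\dots,n;\ j=1,\dots,m} \mathrm{Sim}(S_{1i}, S_{2j}) \qquad \text{formula (1)}$$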

Because all concepts are ultimately expressed in terms of sememes, sememe similarity is the basis of concept similarity. Since all sememes form a tree-shaped sememe hierarchy according to hypernym-hyponym relations, similarity can be calculated simply from semantic distance. The similarity of two sememes based on their semantic distance is shown in formula (2).

$$\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha} \qquad \text{formula (2)}$$

where p₁ and p₂ denote two sememes, d is the path length between p₁ and p₂ in the sememe hierarchy, and α is an adjustable parameter. In HowNet-based word similarity calculation the value of α can be set, for example α = 0.5.
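The word-level and sememe-level definitions can be illustrated with a short sketch. It assumes that the sememe similarity has the form α / (d + α), which matches the description of d and α but is an assumption here, and that a concept-level similarity function is supplied by the caller. The function names are illustrative.

```python
def sememe_similarity(path_length, alpha=0.5):
    # Assumed form of formula (2): similarity decays with the path
    # length d between two sememes in the sememe hierarchy.
    return alpha / (path_length + alpha)


def word_similarity(concepts_1, concepts_2, concept_similarity):
    # Formula (1): the word similarity is the maximum similarity over
    # all pairs of concepts of the two words.
    return max(concept_similarity(s1, s2)
               for s1 in concepts_1
               for s2 in concepts_2)
```

With α = 0.5, identical sememes (d = 0) score 1.0 and the score halves once the path length reaches α.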

The semantic expression of a content-word concept is divided into four parts:

(1) First independent sememe description: the similarity of two concepts for this part is denoted Sim₁(S₁, S₂). It is simply the similarity of two sememes and can be calculated according to formula (2);

(2) Other independent sememe descriptions: all independent sememes in the semantic expression other than the first one. The similarity of two concepts for this part is denoted Sim₂(S₁, S₂) and is calculated as follows:

I. Pair up all the independent sememes of the two expressions arbitrarily and calculate the similarity of every pair of sememes;

II. Put the pair with the largest similarity into one group;

III. Repeat step II on the similarities of the remaining independent sememes until every sememe has been grouped. The similarity of any sememe paired with a null value is defined as a constant δ, for example δ = 0.2;

IV. Finally, take the average.

(3) Relational sememe descriptions: all relational sememes in the semantic expression. The similarity of two concepts for this part is denoted Sim₃(S₁, S₂). Descriptions with the same relational sememe are put into one group, their similarity is calculated, and the results are averaged.

(4) Symbolic sememe descriptions: all symbolic sememes in the semantic expression. The similarity of two concepts for this part is denoted Sim₄(S₁, S₂). Descriptions with the same relation symbol are put into one group, their similarity is calculated, and the results are averaged.
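The grouping procedure in steps I to IV of part (2) is a greedy matching, which can be sketched as follows. The sememe-level similarity function is passed in by the caller, the function name is illustrative, and the return value for two empty lists is an assumption because the text does not cover that case.

```python
def other_sememes_similarity(sememes_1, sememes_2, sememe_sim, delta=0.2):
    """Greedy pairing of independent sememes (steps I to IV)."""
    left, right = list(sememes_1), list(sememes_2)
    scores = []
    while left and right:
        # Steps I and II: score every remaining pair, keep the best one.
        score, i, j = max((sememe_sim(p, q), i, j)
                          for i, p in enumerate(left)
                          for j, q in enumerate(right))
        scores.append(score)
        # Step III: remove the grouped pair and repeat on the rest.
        left.pop(i)
        right.pop(j)
    # Step III: leftover sememes are paired with a null value, scored delta.
    scores.extend([delta] * (len(left) + len(right)))
    if not scores:
        return 1.0  # assumption: nothing to compare on either side
    # Step IV: average over all groups.
    return sum(scores) / len(scores)
```

For example, comparing ["x", "y"] with ["x"] under an exact-match similarity pairs "x" with "x" (1.0) and "y" with null (δ = 0.2), so the average is 0.6.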

In conclusion shown in the similarity calculating method of two words such as formula (3).

Have the semantic similarity between word and word, so that it may since calculate sentence between semantic similarity.We calculate sentence The method of semantic similarity is between son：If 2 sentences A and B, the word that A includes is A₁, A₂..., A_m, the word that B includes is B₁, B₂..., B_n, then word A_i(1im) and B_jSimilarity between (1jn) can use s (A_i,B_j) indicate, thus obtain two sentences The similarity of middle any two word, the semantic similarity s (A, B) between sentence A and sentence B is such as shown in formula (4).

Wherein, a_i=max (s (A_i,B₁),,s(A_i,B₂),…,s(A_i,B_n)), b_j=max (s (B_j,A₁),s(B_j, A₂),,…,s(B_j,A_m))。
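A minimal sketch of the sentence-level computation follows. It assumes that formula (4) takes the common symmetric form s(A, B) = (Σ a_i / m + Σ b_j / n) / 2, which is consistent with the definitions of a_i and b_j but is not confirmed by the text. The word-level similarity function is supplied by the caller and the function name is illustrative.

```python
def sentence_similarity(words_a, words_b, word_sim):
    # a_i: best match in B for each word of A.
    a = [max(word_sim(x, y) for y in words_b) for x in words_a]
    # b_j: best match in A for each word of B.
    b = [max(word_sim(y, x) for x in words_a) for y in words_b]
    # Assumed form of formula (4): average of the two directional means.
    return (sum(a) / len(a) + sum(b) / len(b)) / 2
```

Averaging both directions keeps the measure symmetric, so s(A, B) equals s(B, A) even when the sentences differ in length.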

The above describes the similarity calculation principle for HowNet features. The principle for TF-IDF features is described next.

Calculating similarity with TF-IDF is a text similarity measure based on the vector space model (VSM). In the calculation, TF-IDF is used to assign a weight to each term. Two concepts are involved when TF-IDF is used to compute the weights of the terms in a vector:

A) TF: term frequency. The frequency with which a term occurs in a text. In general, the higher the frequency of a word, the more closely it is considered to relate to the topic of the text.

B) IDF: inverse document frequency. The more texts of a text collection a term occurs in, the poorer its discriminating power. For example, in a collection of 100 texts, if term A occurs in 50 texts and term B occurs in only 5, then term B discriminates better than term A.

The TF-IDF value of each term is calculated from these concepts as shown in formula (5).

$$\text{TF-IDF}(\omega_i) = tf(\omega_i) \times idf(\omega_i) = tf(\omega_i) \times \log\frac{N}{df(\omega_i)} \qquad \text{formula (5)}$$

where TF-IDF(ω_i) is the TF-IDF value of the current term ω_i, equal to the product of its term frequency tf(ω_i) and its inverse document frequency idf(ω_i). Specifically, the TF-IDF value of any term ω_i in text j is computed from tf(ω_i) and log(N/df(ω_i)): tf(ω_i) is the frequency with which ω_i occurs in text j; N is the total number of texts in the collection; df(ω_i) is the number of texts in the collection in which ω_i occurs. Here each question is treated as a text. Applying this analysis to every term in the sample sentence group yields the TF-IDF value of each term. These values are then used to build a vector space model for each sample sentence, and the similarity between two sentences is calculated as the cosine of their vectors.
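Formula (5) and the cosine comparison can be sketched as follows. The sketch assumes that questions are already segmented into terms, that tf is the raw count of a term in a question, and that the natural logarithm is used; the function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """texts: list of token lists; returns one {term: weight} dict per text."""
    n = len(texts)
    # df: number of texts in which each term occurs.
    df = Counter(term for text in texts for term in set(text))
    vectors = []
    for text in texts:
        tf = Counter(text)  # assumed: raw count as term frequency
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Illustrative collection of three segmented questions.
corpus = [
    ["how", "format", "usb"],
    ["how", "format", "usb", "drive"],
    ["who", "is", "father"],
]
vectors = tfidf_vectors(corpus)
similarity = cosine(vectors[0], vectors[1])
```

A term that occurs in every text gets weight log(N/N) = 0, so the collection used for df needs to be larger than the pair being compared for shared terms to contribute.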

The above describes the similarity calculation principle for TF-IDF features. The principle for Term Weight features is described next.

Calculating similarity with Term Weight is also a text similarity measure based on the vector space model (VSM). It differs from the TF-IDF method in how weights are assigned to terms: it uses a search engine and weights each term in a sentence by the overlap rate of search results. The specific method for weighting terms is as follows:

1) Submit the entire sentence to a search engine and record the top 20 results.

2) Remove one word, submit the sentence to the search engine again, and record the top 20 results.

3) Calculate the percentage of the first retrieval's results that the second retrieval's results account for, and subtract it from 1. The resulting value is taken as the importance score of that word in the sentence. The larger a word's score, the more important the word and the larger its weight.

This method yields the weight of every term in a sentence. However, submitting every sentence to a search engine repeatedly is time-consuming, so the SVM-RANK algorithm from machine learning is used to learn automatically how to assign weights to the terms in a sentence.

A sentence is first preprocessed. Preprocessing includes word segmentation, part-of-speech tagging, dependency parsing, and so on, to obtain the part-of-speech features of each word and the syntactic relations between words. The following features are selected for each term in the sentence:

1) NOUN: whether the word is a noun.

2) S&C: whether the word is a subject or an object.

3) TermFreq: term frequency, the frequency with which the word occurs in the entire document collection.

4) DocFreq: document frequency, the number of documents in the entire collection in which the word occurs.

In this way the weight of each term in a sentence is obtained. Then, as in the TF-IDF similarity method, a vector space model is built for each text (question), and the similarity between the two sentences of a pair is calculated by cosine.

The above describes the similarity calculation principle for Term Weight features. The principle for Rank features is described next.

In the PageRank algorithm, a web page that is linked to by many other pages is regarded as a relatively important page. Following this idea, the weight of each word is measured from the dependency tree of a sentence through the dependency relations of each term. For example, in the sentence "I bought an iPhone mobile phone", the dependency in-degree of the word "bought" is 2, that of "mobile phone" is also 2, and that of every other term is 1. It can be seen that "bought" and "mobile phone" are the relatively important words in this sentence, so the weights obtained in this way also reflect, to some extent, the importance of a term in the sentence. Using this method, the words in a sentence are weighted by in-degree. For a word w, the weight is given by formula (6).

$$\mathrm{Weight}(w) = \frac{\mathrm{In}(w)}{\mathrm{Norm}} \qquad \text{formula (6)}$$

where In(w) is the dependency in-degree of word w, and Norm is the sum of the in-degrees of all words in the sentence.

After the weight of each word in a sentence has been calculated, the weights are used to build a vector space model for each question, and the similarity between the two sentences of a pair is then calculated by cosine.
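Formula (6) amounts to normalizing in-degrees so that the weights of a sentence sum to 1. A minimal sketch follows; the in-degree values are supplied by the caller, and the five-word segmentation of the example sentence is an illustrative assumption that matches the in-degrees quoted in the text.

```python
def rank_weights(in_degree):
    """in_degree: {word: dependency in-degree}; returns {word: weight}."""
    # Norm is the sum of the in-degrees of all words in the sentence.
    norm = sum(in_degree.values())
    return {word: degree / norm for word, degree in in_degree.items()}


# In-degrees for "I bought an iPhone mobile phone" (assumed segmentation).
example_in_degree = {
    "I": 1,
    "bought": 2,
    "an": 1,
    "iPhone": 1,
    "mobile phone": 2,
}
example_weights = rank_weights(example_in_degree)
```

The resulting dictionary can serve directly as the sparse vector for the cosine comparison described above.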

For the HowNet, TF-IDF, Term Weight, Rank, and generic features, similarity can be calculated according to the principles above and the established calculation methods for these features. Embodiments of the present invention focus on the similarity calculation for the syntactic structure feature, which is described below with reference to the drawings.

Syntactic structure features are features extracted from the syntactic structure. They include relations such as the core predicate (HED), the subject-verb relation (SBV), the verb-object relation (VOB), attributives, adverbials, and complements.

The syntactic relations involved in syntactic structure are listed in Table 2.

Table 2: Syntactic relation types

Relation type	Tag	Explanation
Subject-verb relation	SBV	subject-verb
Verb-object relation	VOB	direct object, verb-object
Indirect-object relation	IOB	indirect object, indirect-object
Fronted object	FOB	fronting-object
Pivotal construction	DBL	double
Attributive-head relation	ATT	attribute
Adverbial-head structure	ADV	adverbial
Verb-complement structure	CMP	complement
Coordinate structure	COO	coordinate
Preposition-object relation	POB	preposition-object
Left adjunct relation	LAD	left adjunct
Right adjunct relation	RAD	right adjunct
Independent structure	IS	independent structure
Core predicate	HED	head

When the similarity of the sample sentences in a sample sentence group is calculated from the perspective of syntactic structure features, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension may include:

performing dependency parsing on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;

determining a syntax tuple for each sample sentence according to the dependency syntax tree of each of the at least two sample sentences;

comparing the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.

Determining the syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences may include:

determining the core predicate, subject-verb relation, and verb-object relation of each sample sentence according to its dependency syntax tree, and using the core predicate, subject-verb relation, and verb-object relation of each sample sentence as the components of that sentence's syntax tuple.

In embodiments of the present invention, suppose the two sample sentences in a sample sentence group are "Whose son is Xie Tingfeng?" and "Who is Xie Tingfeng's son?", and the sample label indicates that the two sample sentences differ. During training, the similarity features of the dimensions listed above are extracted first. Taking the syntactic structure dimension as an example, dependency parsing of the sample sentence "Whose son is Xie Tingfeng?" gives the dependency syntax tree shown in Fig. 3, and dependency parsing of the sample sentence "Who is Xie Tingfeng's son?" gives the dependency syntax tree shown in Fig. 4.

As shown in Fig. 3, the core predicate HED is "is", the subject-verb relation SBV is "Xie Tingfeng is", and the verb-object relation VOB is "is who". Embodiments of the present invention analyze the three elements HED, SBV, and VOB as an example, so the syntax tuple of this sample sentence can be determined from Fig. 3 as ("Xie Tingfeng is", "is", "is who"). This syntax tuple may also be called a triple.

As shown in Fig. 4, the core predicate HED is "is", the subject-verb relation SBV is "son is", and the verb-object relation VOB is "is who". Analyzing the same three elements HED, SBV, and VOB, the syntax tuple of this sample sentence can be determined from Fig. 4 as ("son is", "is", "is who"). This syntax tuple may also be called a triple.

The sample label of this sample sentence group indicates that the sentences differ, and comparing the examples of Fig. 3 and Fig. 4 shows that the two triples are indeed different. The similarity feature of the syntactic structure dimension can therefore be determined to be 0. This value, combined with the similarity features of the other dimensions, gives a multi-dimensional similarity feature set. Training a model on this feature set with a machine learning algorithm such as logistic regression (LR), support vector machine (SVM), maximum entropy, or Bayes yields the sentence consistency discrimination model.
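The triple comparison can be sketched as follows. The arcs are written out by hand from the description of Fig. 3 and Fig. 4 rather than produced by a parser, and the `(relation, head, dependent)` arc format and the function names are illustrative assumptions.

```python
def syntax_triple(arcs):
    """arcs: list of (relation, head, dependent) from a dependency parse."""
    found = {"SBV": None, "HED": None, "VOB": None}
    for relation, head, dependent in arcs:
        # Keep the first arc of each of the three relations of interest.
        if relation in found and found[relation] is None:
            found[relation] = (head, dependent)
    return (found["SBV"], found["HED"], found["VOB"])


def syntactic_structure_similarity(arcs_a, arcs_b):
    # 1 if the two triples are identical, otherwise 0.
    return 1 if syntax_triple(arcs_a) == syntax_triple(arcs_b) else 0


# Hand-written arcs following the description of Fig. 3 and Fig. 4.
fig3_arcs = [("HED", "ROOT", "is"),
             ("SBV", "is", "Xie Tingfeng"),
             ("VOB", "is", "who")]
fig4_arcs = [("HED", "ROOT", "is"),
             ("SBV", "is", "son"),
             ("VOB", "is", "who")]
feature = syntactic_structure_similarity(fig3_arcs, fig4_arcs)
```

The two sentences share the same keywords, yet their subject differs, which is exactly what the triple comparison captures and keyword-based features miss.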

After the completion of sentence consistency discrimination model training, need to be surveyed by the sample sentence group with sample label Examination, test can just be used after passing through.

Sentence consistency model in the embodiment of the present invention can be used for intelligent robot question answering system and data retrieval system In system.

The training method of sentence consistency discrimination model provided in an embodiment of the present invention, introduces during model training Syntactic structure feature, it is similar for sentence, but the differentiation of the different sentence of syntax can improve differentiation accuracy rate, to improve The accuracy of question answering system.

The embodiment of the present invention also provides a method of determining sentence consistency based on the sentence consistency discrimination model.

Referring to Fig. 5, the method of determining sentence consistency provided in the embodiment of the present invention includes:

301. Receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences.

302. Determine the consistency probability of the at least two sentences by means of a sentence consistency discrimination model, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure.

The training process of the sentence consistency discrimination model in the embodiment of the present invention can be understood with reference to the model training embodiment described above.

303. When the consistency probability is greater than a probability threshold, determine that the at least two sentences are consistent.

For example, if the probability threshold is 80% and the consistency probability of the at least two sentences is 90%, it can be determined that the at least two sentences are consistent.
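Steps 301 to 303 can be sketched as a single function. This is only an illustration: the `model` argument is a hypothetical callable that maps a sentence group to a consistency probability, and the stub model below simply returns the 90% figure from the example above.

```python
from typing import Callable, Sequence, Tuple


def determine_consistency(
    sentence_group: Sequence[str],
    model: Callable[[Sequence[str]], float],
    probability_threshold: float = 0.8,
) -> Tuple[bool, float]:
    """Return (is_consistent, consistency_probability) for a sentence group."""
    # Step 301: the sentence group must contain at least two sentences.
    if len(sentence_group) < 2:
        raise ValueError("a sentence group needs at least two sentences")
    # Step 302: the discrimination model yields the consistency probability.
    probability = model(sentence_group)
    # Step 303: consistent only if the probability exceeds the threshold.
    return probability > probability_threshold, probability


def stub_model(sentence_group):
    """Hypothetical stand-in for a trained model; always returns 0.9."""
    return 0.9


consistent, probability = determine_consistency(
    ["Whose son is Xie Tingfeng", "Who is Xie Tingfeng's father"], stub_model
)
```

The comparison is strict, so a probability exactly equal to the threshold is not treated as consistent, matching the wording "greater than" in step 303.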

Optionally, determining the consistency probability of the at least two sentences by means of the sentence consistency discrimination model may include:

Extracting, for the at least two sentences, a similarity feature of each of multiple dimensions;

Determining the consistency probability of the at least two sentences according to the similarity feature of each dimension.

Optionally, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions may include:

Performing dependency parsing on the at least two sentences to obtain a dependency syntax tree of each of the at least two sentences;

Determining a syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences;

Comparing the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.

In the embodiment of the present invention, the dependency syntax tree can be understood with reference to the examples of Fig. 3 and Fig. 4. For example, suppose there are two sentences whose consistency is to be determined, one being "Whose son is Xie Tingfeng" and the other being "Who is Xie Tingfeng's son". Through the analysis process of Fig. 3 and Fig. 4, the similarity feature of the two sentences in the syntactic structure dimension can be determined to be 0. If one sentence is "Whose son is Xie Tingfeng" and the other is "Who is Xie Tingfeng's father", the sentence consistency discrimination model can determine that the similarity feature of the two sentences in the syntactic structure dimension is 1; if, in combination with the similarity features of the other dimensions, the consistency probability of the two sentences is determined to be greater than the probability threshold, the two sentences are determined to be consistent.
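The comparison of the Fig. 3 and Fig. 4 triples can be sketched as follows. The source only states that differing triples give a feature of 0, so treating the feature as an exact-equality test over the whole triple is an assumption, and the `SyntaxTriple` type and its field names are hypothetical.

```python
from typing import NamedTuple


class SyntaxTriple(NamedTuple):
    """Subject, core predicate and object taken from the SBV, HED and VOB relations."""
    subject: str
    core_predicate: str
    obj: str


def syntactic_structure_similarity(first: SyntaxTriple, second: SyntaxTriple) -> int:
    """Return 1 if the two triples are identical, otherwise 0 (assumed rule)."""
    return 1 if first == second else 0


# Triples as given for the sample sentences of Fig. 3 and Fig. 4.
FIG3_TRIPLE = SyntaxTriple("Xie Tingfeng", "is", "who")
FIG4_TRIPLE = SyntaxTriple("son", "is", "who")

feature = syntactic_structure_similarity(FIG3_TRIPLE, FIG4_TRIPLE)
```

Under this rule the two sample sentences share a core predicate and an object but differ in subject, so the feature is 0.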

Optionally, determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences includes:

Determining the core predicate, the subject-verb relationship and the verb-object relationship of each sentence according to the dependency syntax tree of that sentence, and using the core predicate, the subject-verb relationship and the verb-object relationship of each sentence as the constituent elements of that sentence's syntax tuple.
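One way to read a syntax tuple off a dependency parse is sketched below. It assumes the parser output is available as a list of (relation, head word, dependent word) arcs with HED, SBV and VOB labels; that arc format, the ATT arc and the function name are hypothetical and are not taken from the source. Matching by word string rather than token position is a simplification that would break on sentences with repeated words.

```python
from typing import List, Optional, Tuple

# (relation label, head word, dependent word); an assumed representation.
Arc = Tuple[str, str, str]


def extract_syntax_tuple(arcs: List[Arc]) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (subject, core predicate, object) from the HED, SBV and VOB arcs."""
    # The HED arc marks the core predicate of the sentence.
    core = next((dep for rel, _head, dep in arcs if rel == "HED"), None)
    if core is None:
        raise ValueError("dependency parse has no HED arc")
    # SBV and VOB arcs headed by the core predicate give subject and object.
    subject = next(
        (dep for rel, head, dep in arcs if rel == "SBV" and head == core), None
    )
    obj = next(
        (dep for rel, head, dep in arcs if rel == "VOB" and head == core), None
    )
    return subject, core, obj


# Hypothetical arcs for the sample sentence of Fig. 4.
FIG4_ARCS = [
    ("HED", "ROOT", "is"),
    ("SBV", "is", "son"),
    ("VOB", "is", "who"),
    ("ATT", "son", "Xie Tingfeng"),
]

fig4_tuple = extract_syntax_tuple(FIG4_ARCS)
```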

Optionally, after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method further includes:

Optimizing the sentence consistency discrimination model according to the at least two sentences.

In the embodiment of the present invention, the sentence consistency discrimination model can also be continuously optimized while it is in use, so as to improve its accuracy.
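The source does not say how the model is optimized in use. One possible reading, sketched here purely as an assumption, is an online update: a sentence group judged consistent is treated as a new positive sample and used for a single stochastic gradient step on a logistic regression model. The function names, feature values and learning rate are hypothetical.

```python
import math


def predict(weights, bias, features):
    """Logistic regression probability for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def online_update(weights, bias, features, label, learning_rate=0.1):
    """Apply one stochastic gradient step on the log loss for a single sample."""
    error = predict(weights, bias, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


# Hypothetical starting model and a sentence group just judged consistent.
weights, bias = [0.0, 0.0], 0.0
judged_features = [1.0, 0.5]  # [syntactic_structure, lexical_overlap]

before = predict(weights, bias, judged_features)
weights, bias = online_update(weights, bias, judged_features, label=1)
after = predict(weights, bias, judged_features)
```

Feeding the model its own confident decisions can reinforce mistakes, so a practical system would likely add human confirmation or another check before updating; that safeguard is outside what the source describes.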

Referring to Fig. 6, an embodiment of the device 40 for determining sentence consistency provided in the embodiment of the present invention includes:

A receiving unit 401, configured to receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences;

A first determination unit 402, configured to determine, by means of a sentence consistency discrimination model, the consistency probability of the at least two sentences received by the receiving unit 401, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure;

A second determination unit 403, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit 402 is greater than a probability threshold.

In the embodiment of the present invention, the receiving unit 401 receives a sentence group whose consistency is to be determined, the sentence group including at least two sentences. The first determination unit 402 determines, by means of the sentence consistency discrimination model, the consistency probability of the at least two sentences received by the receiving unit 401, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature obtained by performing feature extraction on sample sentences according to syntactic structure. The second determination unit 403 determines that the at least two sentences are consistent when the consistency probability determined by the first determination unit 402 is greater than the probability threshold. Compared with the prior art, which cannot accurately distinguish sentences that are close in wording but different in syntax, the device for determining sentence consistency provided in the embodiment of the present invention includes a syntactic structure similarity feature among the multi-dimensional similarity features and determines sentence consistency from the similarity features of multiple dimensions including that feature. This can improve the accuracy of judging sentence consistency and in turn the accuracy of artificial intelligence question answering systems and search engines.

Optionally, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided in the embodiment of the present invention,

The first determination unit 402 is configured to:

The multiple dimensions include a syntactic structure dimension, and the first determination unit 402 is configured to:

The first determination unit 402 is configured to:

Optionally, referring to Fig. 7, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided in the embodiment of the present invention, the device 40 further includes:

An optimization unit 404, configured to optimize the sentence consistency discrimination model according to the at least two sentences after the second determination unit 403 determines that the at least two sentences are consistent.

The device for determining sentence consistency provided in the embodiment of the present invention can be understood with reference to the corresponding descriptions of Fig. 1 to Fig. 5, and details are not repeated here.

Referring to Fig. 8, an embodiment of the device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention includes:

An acquiring unit 501, configured to obtain multiple sample sentence groups for training a model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different;

A determination unit 502, configured to extract, for the at least two sample sentences obtained by the acquiring unit 501, a similarity feature of each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on the sample sentences according to syntactic structure;

A model training unit 503, configured to perform model training according to the similarity feature of each dimension determined by the determination unit 502, so as to establish the sentence consistency discrimination model.

In the embodiment of the present invention, the acquiring unit 501 obtains multiple sample sentence groups for training a model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different. The determination unit 502 extracts, for the at least two sample sentences obtained by the acquiring unit 501, a similarity feature of each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature obtained by performing feature extraction on the sample sentences according to syntactic structure. The model training unit 503 performs model training according to the similarity feature of each dimension determined by the determination unit 502 and establishes the sentence consistency discrimination model. Compared with the prior art, the training device for the sentence consistency discrimination model provided in the embodiment of the present invention includes a syntactic structure feature among the multi-dimensional similarity features and uses syntactic structure in determining sentence consistency, which can improve the accuracy of judging sentence consistency and in turn the accuracy of artificial intelligence question answering systems and search engines.

Optionally, on the basis of the embodiment corresponding to Fig. 8 above, in another embodiment of the device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention,

The multiple dimensions include a syntactic structure dimension, and the determination unit 502 is configured to:

Optionally, on the basis of the above embodiment, in another embodiment of the device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention,

The determination unit 502 is configured to:

The device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention can be understood with reference to the corresponding descriptions of Fig. 1 to Fig. 4, and details are not repeated here.

The device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention may be a computing device such as a server. The device 50 is described below in the form of a server.

Fig. 9 is a schematic structural diagram of the device 50 for establishing a sentence consistency discrimination model provided in the embodiment of the present invention. The device 50 includes a processor 510, a memory 550 and a transceiver 530. The memory 550 may include read-only memory and random access memory, and provides operation instructions and data to the processor 510. A part of the memory 550 may also include non-volatile random access memory (NVRAM).

In some embodiments, the memory 550 stores the following elements, executable modules or data structures, or a subset or superset thereof:

In the embodiment of the present invention, by calling the operation instructions stored in the memory 550 (the operation instructions may be stored in the operating system),

Compared with the prior art, the training device for the sentence consistency discrimination model provided in the embodiment of the present invention includes a syntactic structure feature among the multi-dimensional similarity features and uses syntactic structure in determining sentence consistency, which can improve the accuracy of judging sentence consistency and in turn the accuracy of artificial intelligence question answering systems and search engines.

The processor 510 controls the operation of the device 50 for establishing a sentence consistency discrimination model; the processor 510 may also be called a CPU (Central Processing Unit). The memory 550 may include read-only memory and random access memory, and provides instructions and data to the processor 510. A part of the memory 550 may also include non-volatile random access memory (NVRAM). In a specific application, the components of the device 50 are coupled together through a bus system 520, which may include a power bus, a control bus and a status signal bus in addition to a data bus. For clarity of description, however, the various buses are all labelled as the bus system 520 in the figure.

The method disclosed in the above embodiments of the present invention can be applied to the processor 510 or implemented by the processor 510. The processor 510 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 510 or by instructions in the form of software. The processor 510 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory 550, and the processor 510 reads the information in the memory 550 and completes the steps of the above method in combination with its hardware.

Optionally, the multiple dimensions include a syntactic structure dimension, and the processor 510 is configured to:

Optionally, the processor 510 is configured to:

The above description of the device 50 for establishing a sentence consistency discrimination model can be understood with reference to the descriptions of Fig. 1 to Fig. 4, and details are not repeated here.

The device for determining sentence consistency provided in the embodiment of the present invention can be implemented by a terminal device equipped with a search engine, such as a mobile terminal, a personal computer or an intelligent robot. The following takes a mobile phone as an example of the device for determining sentence consistency to describe the process of determining sentence consistency in the embodiment of the present invention.

As shown in Fig. 10, for ease of description, only the parts related to the embodiment of the present invention are shown. For specific technical details that are not disclosed, please refer to the method part of the embodiments of the present invention.

Fig. 10 shows a block diagram of part of the structure of the mobile terminal 800 provided in the embodiment of the present invention. Referring to Fig. 10, the mobile terminal includes components such as a camera 810, a memory 820, an input unit 830, a display unit 840, a sensor 850, an audio circuit 860, a WiFi module 870, a processor 880 and a power supply 890. Those skilled in the art will understand that the mobile terminal structure shown in Fig. 10 does not constitute a limitation on the mobile terminal, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.

Each component of the mobile terminal is described in detail below with reference to Fig. 10:

Camera 810 can be used for shooting；

The memory 820 can be used to store software programs and modules. The processor 880 executes the various functional applications and data processing of the mobile terminal by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created through use of the mobile terminal (such as audio data or a phone book). In addition, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device or another volatile solid-state storage device.

The input unit 830 can be used to receive operation instructions from the user and to generate key signal input related to user settings and function control of the mobile terminal 800. Specifically, the input unit 830 may include a touch panel 831 and other input devices 832. The touch panel 831, also called a touch screen, collects touch operations by the user on or near it (such as operations performed on or near the touch panel 831 with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch panel 831 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates and sends them to the processor 880, and it can also receive and execute commands sent by the processor 880. In addition, the touch panel 831 may be implemented in multiple types such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel 831, the input unit 830 may also include other input devices 832. Specifically, the other input devices 832 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse and a joystick.

The display unit 840 can be used to display an interface. The display unit 840 may include an indicator light 841; optionally, the indicator light 841 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) or the like. Further, the touch panel 831 may cover the indicator light 841. When the touch panel 831 detects a touch operation on or near it, it transmits the operation to the processor 880 to determine the type of the touch event, and the processor 880 then provides a corresponding visual output on the indicator light 841 according to the type of the touch event. Although in Fig. 10 the touch panel 831 and the indicator light 841 are two independent components that implement the input and output functions of the mobile terminal, in some embodiments the touch panel 831 and the indicator light 841 may be integrated to implement the input and output functions of the mobile terminal.

Mobile terminal 800 may also include at least one sensor 850.

The audio circuit 860, the loudspeaker 861 and the microphone 862 can provide an audio interface between the user and the mobile terminal. The audio circuit 860 can transmit the electrical signal converted from the received audio data to the loudspeaker 861, which converts it into a sound signal for output. Conversely, the microphone 862 converts the collected sound signal into an electrical signal, which the audio circuit 860 receives and converts into audio data; after the audio data is output to the processor 880 for processing, it is sent through the camera 810 to, for example, another mobile terminal, or output to the memory 820 for further processing.

WiFi module 870 can be used for communicating.

The processor 880 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal through various interfaces and lines, and it performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 820 and calling the data stored in the memory 820, thereby monitoring the mobile terminal as a whole. Optionally, the processor 880 may include one or more processing units. Preferably, the processor 880 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 880.

The mobile terminal 800 further includes a power supply 890 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 880 through a power management system, so that functions such as charge management, discharge management and power consumption management are implemented through the power management system.

Although not shown, the mobile terminal 800 may also include a radio frequency (RF) circuit, a Bluetooth module and the like, which are not described in detail here.

In the embodiment of the present invention, when determining sentence consistency, the processor 880 included in the mobile terminal also has the following functions:

Optionally, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions may include:

Optionally, determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences may include:

Optionally, after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method may further include:

The mobile phone provided in the embodiment of the present invention can be understood with reference to the related descriptions of Fig. 1 to Fig. 6, and details are not repeated here.

The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)).

Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc or the like.

The method of determining sentence consistency, the method for establishing a sentence consistency discrimination model and the related devices provided in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In conclusion, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method of determining sentence consistency, characterized by comprising:

Determining the consistency probability of the at least two sentences by means of a sentence consistency discrimination model, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure;

2. The method according to claim 1, characterized in that determining the consistency probability of the at least two sentences by means of the sentence consistency discrimination model comprises:

3. The method according to claim 2, characterized in that the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions comprises:

Performing dependency parsing on the at least two sentences to obtain a dependency syntax tree of each of the at least two sentences;

Comparing the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.

4. The method according to claim 3, characterized in that determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences comprises:

Determining the core predicate, the subject-verb relationship and the verb-object relationship of each sentence according to the dependency syntax tree of that sentence, and using the core predicate, the subject-verb relationship and the verb-object relationship of each sentence as the constituent elements of that sentence's syntax tuple.

5. The method according to any one of claims 1-4, characterized in that after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method further comprises:

6. A method for establishing a sentence consistency discrimination model, characterized by comprising:

Obtaining multiple sample sentence groups for training a model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different;

Extracting, for the at least two sample sentences, a similarity feature of each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on the sample sentences according to syntactic structure;

7. The method according to claim 6, characterized in that the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions comprises:

Performing dependency parsing on the at least two sample sentences to obtain a dependency syntax tree of each of the at least two sample sentences;

Determining a syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences;

Comparing the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.

8. The method according to claim 7, characterized in that determining the syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences comprises:

Determining the core predicate, the subject-verb relationship and the verb-object relationship of each sample sentence according to the dependency syntax tree of that sample sentence, and using the core predicate, the subject-verb relationship and the verb-object relationship of each sample sentence as the constituent elements of that sample sentence's syntax tuple.

9. A device for determining sentence consistency, characterized by comprising:

A first determination unit, configured to determine, by means of a sentence consistency discrimination model, the consistency probability of the at least two sentences received by the receiving unit, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure;

A second determination unit, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than a probability threshold.

10. The device according to claim 9, characterized in that

The first determination unit is configured to:

11. The device according to claim 10, characterized in that

The multiple dimensions include a syntactic structure dimension, and the first determination unit is configured to:

12. The device according to claim 11, characterized in that

The first determination unit is configured to:

13. The device according to any one of claims 9-12, characterized in that the device further comprises:

An optimization unit, configured to optimize the sentence consistency discrimination model according to the at least two sentences after the second determination unit determines that the at least two sentences are consistent.

14. A device for establishing a sentence consistency discrimination model, characterized by comprising:

An acquiring unit, configured to obtain multiple sample sentence groups for training a model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different;

A determination unit, configured to extract, for the at least two sample sentences obtained by the acquiring unit, a similarity feature of each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on the sample sentences according to syntactic structure;

A model training unit, configured to perform model training according to the similarity feature of each dimension determined by the determination unit, so as to establish the sentence consistency discrimination model.

15. The device according to claim 14, characterized in that

The multiple dimensions include a syntactic structure dimension, and the determination unit is configured to:

16. The device according to claim 15, characterized in that

The determination unit is configured to: