CN108536665A - Method and device for determining sentence consistency
- Publication number
- CN108536665A CN201710121244.9A
- Authority
- CN
- China
- Prior art keywords
- sentence
- sample
- consistency
- syntactic structure
- dimension
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for determining sentence consistency, comprising: receiving a sentence group whose consistency is to be determined, the sentence group including at least two sentences; determining a consistency probability of the at least two sentences by means of a sentence consistency discrimination model, wherein the training features of the model include a syntactic structure similarity feature obtained by extracting features from sample sentences according to their syntactic structure; and, when the consistency probability is greater than a probability threshold, determining that the at least two sentences are consistent. Embodiments of the invention also provide a corresponding device. Because the multi-dimensional similarity features include a syntactic structure similarity feature, and sentence consistency is determined from similarity features of multiple dimensions that include it, the accuracy of judging sentence consistency can be improved, which in turn improves the accuracy of artificial intelligence question answering systems and search engines.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for determining sentence consistency and a method and device for establishing a sentence consistency discrimination model.
Background
With the development of artificial intelligence, robots that can answer questions have appeared, such as the well-known online examples "virtuous two" and "Still". These robots are developed with artificial intelligence methods and can answer a questioner's questions. A search engine works on essentially the same principle: it recognizes the question entered by the user and then searches for the corresponding answer according to the recognition result.
Both robots and search engines rely on a sentence recognition model: they first identify the question in the query sentence, then determine the answer from the identified question, and finally return the answer to the questioner.
Sentence recognition models in the prior art are usually trained on sentence similarity: as long as the keywords of two sentences are similar, the sentences are usually identified as equivalent. In practice, however, some question sentences share exactly the same keywords yet differ in meaning. If the model identifies them as the same sentence, recognition fails and a wrong result is output. For example, given "Whose father is Yao Ming?" and "Who is Yao Ming's father?", a model that regards the two questions as the same will return a wrong result for one of them. How to judge the consistency of question sentences quickly and accurately is therefore essential for improving the recognition accuracy of artificial intelligence question answering systems and search engines.
Summary of the invention
To improve the accuracy of artificial intelligence question answering systems and search engines, embodiments of the present invention provide a method and device for determining sentence consistency, and a method and device for establishing a sentence consistency discrimination model.
A first aspect of the present invention provides a method for determining sentence consistency, comprising:
receiving a sentence group whose consistency is to be determined, the sentence group including at least two sentences;
determining a consistency probability of the at least two sentences by means of a sentence consistency discrimination model, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from sample sentences according to their syntactic structure;
when the consistency probability is greater than a probability threshold, determining that the at least two sentences are consistent.
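The decision step of the first aspect can be sketched in Python as follows. The function name `sentences_consistent`, the `predict_probability` callable standing in for the trained discrimination model, and the default threshold of 0.5 are illustrative assumptions; the text does not fix a threshold value.

```python
from typing import Callable, Sequence


def sentences_consistent(
    sentences: Sequence[str],
    predict_probability: Callable[[Sequence[str]], float],
    threshold: float = 0.5,  # assumed default, not taken from the text
) -> bool:
    """Return True when the model's consistency probability exceeds the threshold."""
    if len(sentences) < 2:
        raise ValueError("a sentence group needs at least two sentences")
    probability = predict_probability(sentences)
    # Strictly greater than the threshold, following the wording of the method.
    return probability > threshold
```

Passing the model as a callable keeps the sketch independent of how the discrimination model is trained or stored.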
A second aspect of the present invention provides a method for establishing a sentence consistency discrimination model, comprising:
obtaining multiple sample sentence groups for training the model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;
for the at least two sample sentences, extracting a similarity feature for each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from the sample sentences according to their syntactic structure;
performing model training according to the similarity features of the dimensions to establish the sentence consistency discrimination model.
A third aspect of the present invention provides a device for determining sentence consistency, comprising:
a receiving unit, configured to receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences;
a first determination unit, configured to determine, by means of a sentence consistency discrimination model, a consistency probability of the at least two sentences received by the receiving unit, wherein the training features of the sentence consistency discrimination model include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from sample sentences according to their syntactic structure;
a second determination unit, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than a probability threshold.
A fourth aspect of the present invention provides a device for establishing a sentence consistency discrimination model, comprising:
an acquiring unit, configured to obtain multiple sample sentence groups for training the model, wherein each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;
a determination unit, configured to extract, for the at least two sample sentences obtained by the acquiring unit, a similarity feature for each of multiple dimensions, wherein the similarity features of the dimensions include a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by extracting features from the sample sentences according to their syntactic structure;
a model training unit, configured to perform model training according to the similarity features of the dimensions determined by the determination unit, to establish the sentence consistency discrimination model.
Compared with the prior art, which cannot accurately judge sentences that are similar in wording but different in syntax, the method for determining sentence consistency provided by the embodiments of the present invention includes a syntactic structure similarity feature among the multi-dimensional similarity features and determines sentence consistency from similarity features of multiple dimensions that include it. This can improve the accuracy of judging sentence consistency, and in turn the accuracy of artificial intelligence question answering systems and search engines.
Brief description of the drawings
Fig. 1 is a schematic diagram of an embodiment of the method for establishing a sentence consistency discrimination model in the embodiments of the present invention;
Fig. 2 is a schematic diagram of a similarity calculation process in the embodiments of the present invention;
Fig. 3 is a schematic diagram of an example of dependency parsing in the embodiments of the present invention;
Fig. 4 is a schematic diagram of another example of dependency parsing in the embodiments of the present invention;
Fig. 5 is a schematic diagram of an embodiment of the method for determining sentence consistency in the embodiments of the present invention;
Fig. 6 is a schematic diagram of an embodiment of the device for determining sentence consistency in the embodiments of the present invention;
Fig. 7 is a schematic diagram of another embodiment of the device for determining sentence consistency in the embodiments of the present invention;
Fig. 8 is a schematic diagram of an embodiment of the device for establishing a sentence consistency discrimination model in the embodiments of the present invention;
Fig. 9 is a schematic diagram of another embodiment of the device for establishing a sentence consistency discrimination model in the embodiments of the present invention;
Fig. 10 is a schematic diagram of another embodiment of the device for determining sentence consistency in the embodiments of the present invention.
Detailed description
The embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Those of ordinary skill in the art will appreciate that, as artificial intelligence develops and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
The embodiments of the present invention provide a method for determining sentence consistency in which the similarity features considered include a syntactic structure feature. Taking syntactic structure into account when determining sentence consistency can improve the accuracy of the judgment, and in turn the accuracy of artificial intelligence question answering systems and search engines. The embodiments of the present invention also provide a corresponding device for determining sentence consistency, and a method and device for establishing a sentence consistency discrimination model. Each is described in detail below.
To address the problem that the consistency of sentences that are similar in wording but different in syntax cannot be judged accurately, the embodiments of the present invention establish a sentence consistency discrimination model that can judge sentence consistency accurately. The method for establishing this model is introduced below with reference to Fig. 1.
As shown in Fig. 1, an embodiment of the method for establishing a sentence consistency discrimination model provided by the embodiments of the present invention includes:
101. Obtain multiple sample sentence groups for training the model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different.
In the embodiments of the present invention, a sample sentence group may include two sample question sentences or more than two; the description here takes two sample question sentences as an example.
The sample label identifies whether the at least two sample sentences in the same sample sentence group have the same semantics; for example, 0 may denote "different" and 1 may denote "the same".
For example, if the two sample sentences in a sample sentence group are "Whose father is Yao Ming?" and "Who is Yao Ming's father?", the sample label of the group is 0; if the two sample sentences are two phrasings of "how to format a USB flash drive", the sample label of the group is 1.
102. For the at least two sample sentences, extract a similarity feature for each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature obtained by extracting features from the sample sentences according to their syntactic structure.
In the embodiments of the present invention, the multiple dimensions may include semantic, structural, statistical and other dimensions.
The similarity features of the dimensions may include a string kernel (String Kernel) feature, a HowNet feature, a TF-IDF feature, a Term Weight feature, a Rank feature, a syntactic structure feature, generic features and so on.
The feature categories above and the features in each category can be understood with reference to Table 1.
Table 1: Feature table
103. Perform model training according to the similarity features of the dimensions to establish the sentence consistency discrimination model.
For the String Kernel feature, similarity is calculated with a string kernel algorithm, which computes the similarity between strings from features of the string structure. The algorithm takes a sample sentence group as input and outputs the similarity of the sample sentences in the group; its flow is shown in Fig. 2.
201. Preprocess the sample sentences in the sample sentence group.
202. Apply the string kernel function to the sample sentences.
The string kernel computes a similarity value between the preprocessed sample sentences, mainly by means of a string kernel function. A given string, that is, a sample sentence, is first split into a set of substrings whose length can be adjusted by a parameter; the similarity between the substring sets is then computed by the kernel function.
203. Regularize the similarity scores calculated for the sample sentences.
The similarity between the sample sentences in the group is obtained by linear combination.
204. Obtain the similarity value of the sample sentences in the sample sentence group.
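Steps 201 to 204 can be illustrated with a small Python sketch. The text does not name a specific kernel, so the p-spectrum kernel used here (counting shared substrings of length `p`) and the cosine-style normalization are assumptions chosen for illustration; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(text: str, p: int) -> Counter:
    """Split a string into its multiset of substrings of length p."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a: str, b: str, p: int) -> int:
    """Unnormalized kernel: dot product of the two substring count vectors."""
    counts_a, counts_b = spectrum(a, p), spectrum(b, p)
    return sum(n * counts_b[s] for s, n in counts_a.items())


def normalized_kernel(a: str, b: str, p: int = 2) -> float:
    """Scale the kernel into [0, 1] so scores are comparable across sentences."""
    denom = sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0
```

Scores for several values of `p` could then be combined linearly, in the spirit of the linear combination mentioned in step 203; the combination weights would be a further assumption.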
The above describes the similarity calculation principle for the String Kernel feature; the similarity calculation for the HowNet feature is described below.
HowNet is a detailed Chinese semantic knowledge dictionary. In HowNet, each concept is composed of multiple sememes; the nodes of the hierarchy are sememes, not concepts. Moreover, the sememes in the concept of a word are not equal in status: there are complex relations between them, which are expressed in a special knowledge description language.
The concept description of a content word consists of descriptors of the following three forms:
(1) Independent sememe descriptor: expressed as "basic sememe" or "(specific word)";
(2) Relational sememe descriptor: expressed as "relational sememe = basic sememe", "relational sememe = (specific word)" or "(relational sememe = specific word)", where relational sememes are the sememes of the two classes "EventRole | dynamic role" and "EventFeatures | dynamic attribute";
(3) Relation-symbol descriptor: expressed as "relation symbol + basic sememe" or "relation symbol + (specific word)", where the relation symbols include "#, %, $, * ,+, & ,@,、!"; the relation each symbol represents is not described further here.
For two words W1 and W2, if W1 has n concepts [S11, S12, ..., S1n] and W2 has m concepts [S21, S22, ..., S2m], the similarity Sim(W1, W2) of W1 and W2 is the maximum of the similarities between their concepts, as shown in formula (1):
$$Sim(W_1, W_2) = \max_{1 \le i \le n,\ 1 \le j \le m} Sim(S_{1i}, S_{2j}) \tag{1}$$
Because all concepts are ultimately expressed in sememes, sememe similarity is the basis of concept similarity. Since all sememes form a tree-shaped hierarchy according to hypernym-hyponym relations, similarity can be calculated simply from semantic distance. The similarity of two sememes is shown in formula (2):
$$Sim(p_1, p_2) = \frac{\alpha}{d + \alpha} \tag{2}$$
where p_1 and p_2 denote two sememes, d is the path length between p_1 and p_2 in the sememe hierarchy, and α is an adjustable parameter. The value of α can be set in the HowNet-based word similarity calculation, for example α = 0.5.
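The distance-based similarity between HowNet's basic meaning units (sememes) can be sketched as follows. The sketch assumes the commonly used form α / (d + α) for converting a path length d into a similarity, a tiny hand-made hierarchy `PARENT` that is purely hypothetical, and the simplification that each concept of a word is a single sememe, so that the word similarity is the maximum over concept pairs.

```python
from itertools import product

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "human": "animal",
    "beast": "animal",
    "artifact": "entity",
}


def ancestors(node):
    """Return the chain from a node up to the root, starting with the node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_length(p1, p2):
    """Number of edges between two nodes via their lowest common ancestor."""
    up1, up2 = ancestors(p1), ancestors(p2)
    depth_in_up2 = {name: i for i, name in enumerate(up2)}
    for i, name in enumerate(up1):
        if name in depth_in_up2:
            return i + depth_in_up2[name]
    raise ValueError("nodes do not share a root")


def sememe_similarity(p1, p2, alpha=0.5):
    """Assumed form: alpha / (d + alpha), with d the path length."""
    return alpha / (path_length(p1, p2) + alpha)


def word_similarity(concepts1, concepts2, alpha=0.5):
    """Maximum similarity over all concept pairs of the two words."""
    return max(
        sememe_similarity(a, b, alpha) for a, b in product(concepts1, concepts2)
    )
```

A real implementation would read the hierarchy from the HowNet resource and would combine the four descriptor parts of each concept rather than treating a concept as one sememe.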
The semantic expression of a content word's concept is divided into four parts:
(1) The first independent sememe descriptor: Sim1(S1, S2) denotes the similarity of the two concepts in this part, which is simply the similarity of two sememes and can be calculated according to formula (2);
(2) Other independent sememe descriptors: all independent sememes in the semantic expression other than the first. Sim2(S1, S2) denotes the similarity of the two concepts in this part, calculated in the following steps:
I. Pair up all the independent sememes of the two expressions and calculate the similarity of every pair;
II. Put the pair with the largest similarity into one group;
III. Apply step II to the similarities of the remaining independent sememes, and repeat until all sememes have been grouped. The similarity between any sememe and a null value is defined as a constant δ, for example δ = 0.2;
IV. Finally, take the average.
(3) Relational sememe descriptors: all relational sememes in the semantic expression. Sim3(S1, S2) denotes the similarity of the two concepts in this part; descriptors with the same relational sememe are put into one group, their similarity is calculated, and the results are averaged.
(4) Relation-symbol descriptors: all relation-symbol sememes in the semantic expression. Sim4(S1, S2) denotes the similarity of the two concepts in this part; descriptors with the same relation symbol are put into one group, their similarity is calculated, and the results are averaged.
In summary, the similarity of two words is calculated as shown in formula (3).
Once the semantic similarity between words is available, the semantic similarity between sentences can be calculated. The method is as follows: given two sentences A and B, where A contains the words A1, A2, ..., Am and B contains the words B1, B2, ..., Bn, the similarity between words Ai (1 ≤ i ≤ m) and Bj (1 ≤ j ≤ n) is denoted s(Ai, Bj). This gives the similarity of any pair of words from the two sentences, and the semantic similarity s(A, B) between sentence A and sentence B is as shown in formula (4).
Here, ai = max(s(Ai, B1), s(Ai, B2), ..., s(Ai, Bn)) and bj = max(s(Bj, A1), s(Bj, A2), ..., s(Bj, Am)).
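The quantities ai and bj can be turned into a sentence-level score as sketched below. Formula (4) is not reproduced in this text, so the way the best-match scores are combined here (the mean of the ai values and the mean of the bj values, averaged) is an assumption, as are the function name and the `word_sim` callable that stands in for the HowNet word similarity.

```python
from typing import Callable, Sequence


def sentence_similarity(
    words_a: Sequence[str],
    words_b: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Combine best-match word similarities in both directions.

    The symmetric average used here is an assumed combination rule.
    """
    if not words_a or not words_b:
        return 0.0
    # a_i: best match in B for each word of A.
    a = [max(word_sim(x, y) for y in words_b) for x in words_a]
    # b_j: best match in A for each word of B.
    b = [max(word_sim(y, x) for x in words_a) for y in words_b]
    return (sum(a) / len(a) + sum(b) / len(b)) / 2
```

Any word similarity with values in [0, 1] can be plugged in as `word_sim`, which makes the sketch easy to try with an exact-match function before a HowNet-based one is available.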
The above describes the similarity calculation principle for the HowNet feature; the similarity calculation principle for the TF-IDF feature is described below.
Computing similarity with TF-IDF is a vector space model (VSM) method for measuring text similarity. During the calculation, each term is assigned a weight using TF-IDF. Computing term weights with TF-IDF involves two concepts:
A) TF, term frequency: the frequency with which a term occurs in a text. In general, the higher the frequency of a word, the more relevant it is considered to be to the topic of the text.
B) IDF, inverse document frequency: the more texts of a collection a term occurs in, the poorer its discriminating power. For example, in a collection of 100 texts, if term A occurs in 50 texts while term B occurs in only 5, term B discriminates better than term A.
The TF-IDF value of each term is calculated from these concepts, as shown in formula (5).
$$\text{TF-IDF}(\omega_i) = tf(\omega_i) \times idf(\omega_i) = tf(\omega_i) \times \log\frac{N}{df(\omega_i)} \tag{5}$$
Here TF-IDF(ωi) is the TF-IDF value of the current term ωi, equal to the product of its term frequency tf(ωi) and its inverse document frequency idf(ωi). Specifically, the TF-IDF value of any term ωi in text j is computed from tf(ωi) and log(N/df(ωi)), where tf(ωi) is the frequency with which ωi occurs in text j, N is the total number of texts in the collection, and df(ωi) is the number of texts in the collection in which ωi occurs. Here each question sentence is treated as one text. Applying this analysis to every term in a sample sentence group yields the TF-IDF value of each term in the group; these values are then used to build a vector space model for each sample sentence, and the similarity between two sentences is calculated as the cosine.
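Formula (5) and the cosine step can be sketched in Python as follows. The sketch assumes sentences are already segmented into word lists, uses the raw count of a term in a sentence as tf, and uses the natural logarithm; the function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized text."""
    n = len(docs)
    # df: number of texts in which each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * log(n / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0
```

Note that with log(N/df), a term occurring in every text gets weight 0, so the collection needs to contain more than the two sentences being compared for shared terms to contribute to the cosine.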
The above describes the similarity calculation principle for the TF-IDF feature; the similarity calculation principle for the Term Weight feature is described below.
Computing similarity with Term Weight is likewise a text similarity measure based on the vector space model (VSM). It differs from the TF-IDF method in how terms are weighted: it uses a search engine and weights the terms of a sentence by the overlap rate of search results. The specific weighting method is as follows:
1) Submit the entire sentence to a search engine and record the top 20 results.
2) Remove one word, submit the sentence to the search engine again, and record the top 20 results.
3) Calculate the percentage of the first set of results accounted for by the second set and subtract it from 1. The resulting value is regarded as the importance score of the word in the sentence: the larger the score, the more important the word and the greater its weight.
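The three steps above can be sketched as follows. The `search` callable is a hypothetical stand-in for a search engine that returns a ranked list of result identifiers for a list of query words; no real search API is implied, and the handling of an empty result list is an assumption.

```python
from typing import Callable, Dict, List, Sequence


def term_weights(
    words: Sequence[str],
    search: Callable[[List[str]], List[str]],
    top_k: int = 20,
) -> Dict[str, float]:
    """Weight each word by how much the top results change when it is removed."""
    words = list(words)
    full = set(search(words)[:top_k])  # step 1: results for the whole sentence
    weights = {}
    for i, word in enumerate(words):
        reduced_query = words[:i] + words[i + 1:]  # step 2: drop one word
        reduced = set(search(reduced_query)[:top_k])
        # Step 3: share of the original results that survive, subtracted from 1.
        overlap = len(full & reduced) / len(full) if full else 1.0
        weights[word] = 1.0 - overlap
    return weights
```

If a word occurs more than once in the sentence, this sketch keeps only the weight computed for its last occurrence.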
This method yields the weight of each term in a sentence. However, submitting every sentence to a search engine repeatedly is time-consuming, so the SVM-RANK algorithm from machine learning is used to learn to weight the terms of a sentence automatically.
A sentence is first preprocessed; preprocessing includes word segmentation, part-of-speech tagging, dependency parsing and so on, to obtain the part-of-speech features of each word and the syntactic relations between words. The following features are selected for each term in the sentence:
1) NOUN: whether the word is a noun.
2) S&C: whether the word is a subject or an object.
3) TermFreq: term frequency, the frequency with which the word occurs in the entire document collection.
4) DocFreq: document frequency, the number of documents in the entire collection in which the word occurs.
In this way the weight of each term in a sentence is obtained. Then, as in the TF-IDF similarity calculation, a vector space model is built for each text (question sentence) and the similarity between the two sentences is calculated as the cosine.
The above describes the similarity calculation principle for the Term Weight feature; the similarity calculation principle for the Rank feature is described below.
In the PageRank algorithm, a web page that is linked to by many other pages is a relatively important page. Following this idea, the weight of each word is measured from the dependency tree of a sentence through the dependency relations of each term. For example, in the sentence "I bought an iPhone", the dependency in-degree of the word "bought" is 2, the in-degree of "mobile phone" is also 2, and that of every other term is 1. It can be seen that "bought" and "mobile phone" are the relatively important words in this sentence, so the weights obtained this way reflect, to some extent, the importance of a term in its sentence. Using this method, the words of a sentence are weighted by in-degree. For a word w, the weight is shown in formula (6).
$$Weight(w) = \frac{In(w)}{Norm} \tag{6}$$
where In(w) is the dependency in-degree of word w and Norm is the sum of the in-degrees of all words in the sentence.
After the weight of each word in the sentence has been calculated, the weights are used to build a vector space model for the question sentence, and the similarity between the two sentences is then calculated as the cosine.
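Formula (6) can be sketched as follows. How in-degrees are counted depends on the dependency parser and on the arc direction convention, which the text does not spell out; here `in_degrees` simply counts how often a word appears as the target of a `(source, target)` arc, and that convention, like the function names, is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def in_degrees(arcs: Iterable[Tuple[str, str]]) -> Counter:
    """Count, for each word, the number of arcs that point to it."""
    return Counter(target for _source, target in arcs)


def rank_weights(in_degree: Mapping[str, int]) -> Dict[str, float]:
    """Weight(w) = In(w) / Norm, with Norm the sum of all in-degrees."""
    norm = sum(in_degree.values())
    if norm == 0:
        return {}
    return {word: count / norm for word, count in in_degree.items()}
```

The resulting weights sum to 1 within a sentence and can be used directly as the components of the sentence vector before the cosine is taken.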
For the HowNet, TF-IDF, Term Weight, Rank and generic features, similarity can be calculated according to the principles above and the mature calculation methods for these features. The embodiments of the present invention focus on the similarity calculation for the syntactic structure feature, which is explained below with reference to the accompanying drawings.
The syntactic structure feature is extracted from the syntactic structure, which includes relations such as the core predicate (HED), the subject-verb relation (SBV), the verb-object relation (VOB), attributives, adverbials and complements.
The syntactic relations involved can be understood with reference to Table 2.
Table 2: Syntactic relation types
Relation type | Tag | Description |
Subject-verb | SBV | subject-verb |
Verb-object | VOB | direct object, verb-object |
Indirect object | IOB | indirect object, indirect-object |
Fronting object | FOB | fronting-object |
Pivotal (double) | DBL | double |
Attributive | ATT | attribute |
Adverbial | ADV | adverbial |
Complement | CMP | complement |
Coordination | COO | coordinate |
Preposition-object | POB | preposition-object |
Left adjunct | LAD | left adjunct |
Right adjunct | RAD | right adjunct |
Independent structure | IS | independent structure |
Core predicate | HED | head |
When the similarity of the sample sentences in a sample sentence group is calculated from the perspective of the syntactic structure feature, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension may include:
performing dependency parsing on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;
determining a syntax tuple for each sample sentence according to its dependency syntax tree;
comparing the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
Determining the syntax tuple of each sample sentence according to its dependency syntax tree may include:
determining the core predicate, the subject-verb relation and the verb-object relation of each sample sentence according to its dependency syntax tree, and using the core predicate, subject-verb relation and verb-object relation of each sample sentence as the components of its syntax tuple.
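The tuple extraction and comparison can be sketched in Python as follows. The sketch assumes a dependency parser has already produced arcs as `(relation, head, dependent)` triples using tags such as HED, SBV and VOB; that arc format, the function names, and returning 1 when the tuples match are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

Arc = Tuple[str, str, str]  # (relation tag, head word, dependent word)


def syntax_tuple(arcs: Iterable[Arc]) -> Tuple[Optional[object], ...]:
    """Collect the core predicate, subject-verb and verb-object components."""
    found = {}
    for relation, head, dependent in arcs:
        if relation == "HED":
            found.setdefault("HED", dependent)  # the core predicate itself
        elif relation in ("SBV", "VOB"):
            found.setdefault(relation, (head, dependent))  # keep the first match
    return (found.get("HED"), found.get("SBV"), found.get("VOB"))


def syntax_similarity(arcs_a: Iterable[Arc], arcs_b: Iterable[Arc]) -> int:
    """Return 1 when the two syntax tuples are identical, otherwise 0."""
    return 1 if syntax_tuple(arcs_a) == syntax_tuple(arcs_b) else 0
```

Two question sentences that share every keyword but differ in their subject, for example one whose subject is a person's name and one whose subject is "son", produce different SBV components and therefore a feature value of 0.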
In the embodiments of the present invention, suppose the two sample sentences in a sample sentence group are "Whose son is Xie Tingfeng?" and "Who is Xie Tingfeng's son?", and the sample label indicates that the two sample sentences are different. During training, the similarity features of the dimensions listed above are first extracted. Taking the syntactic structure dimension as an example, dependency parsing of the sample sentence "Whose son is Xie Tingfeng?" yields the dependency syntax tree shown in Fig. 3, and dependency parsing of the sample sentence "Who is Xie Tingfeng's son?" yields the dependency syntax tree shown in Fig. 4.
As shown in Fig. 3, the core predicate HED is "is", the subject-verb relation SBV is "Xie Tingfeng is", and the verb-object relation VOB is "is who". The embodiments of the present invention analyze the three elements HED, SBV and VOB as an example, so the syntax tuple of this sample sentence determined from Fig. 3 is ("is", "Xie Tingfeng is", "is who"); this syntax tuple is also called a triple.
As shown in Fig. 4, the core predicate HED is "is", the subject-verb relation SBV is "son is", and the verb-object relation VOB is "is who". Analyzing the same three elements HED, SBV and VOB, the syntax tuple of this sample sentence determined from Fig. 4 is ("is", "son is", "is who"); this syntax tuple is also called a triple.
The sample label of this sample sentence group indicates that the sentences are different, and comparing the examples of Fig. 3 and Fig. 4 shows that the two triples are also different, so the similarity feature of the syntactic structure dimension can be determined to be 0. Combining this feature value of 0 with the similarity features of the other dimensions yields a multi-dimensional similarity feature set. Training a model on this similarity feature set with a machine learning algorithm such as logistic regression (LR), a support vector machine (SVM), maximum entropy, or Bayes yields the sentence consistency discrimination model.
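As a rough illustration of the training step, the sketch below fits a logistic regression model, one of the algorithms named above, to a handful of similarity feature vectors. Everything concrete in it is an assumption made for illustration: the two-element feature layout (syntactic structure feature first, one unnamed other dimension second), the toy values and labels, and the plain stochastic gradient descent loop. It is not the training procedure of the embodiment.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Consistency probability for one similarity feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy similarity feature set: [syntactic structure feature, another dimension].
# Label 1 = sample sentences marked identical, 0 = marked different (made-up data).
X = [[1, 0.9], [1, 0.7], [0, 0.9], [0, 0.2]]
y = [1, 1, 0, 0]

weights, bias = train_logistic_regression(X, y)
print(predict_probability(weights, bias, [1, 0.8]))  # expected to be above 0.5
print(predict_probability(weights, bias, [0, 0.8]))  # expected to be below 0.5
```

In this toy data the label follows the syntactic structure feature, so the fitted model leans on that feature even when the other dimension reports a high similarity.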
After training of the sentence consistency discrimination model is complete, the model must be tested with sample sentence groups that carry sample labels, and it can be put into use only after it passes the test.
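The test step can be pictured as a simple accuracy gate. The sketch below rests on several assumptions, because the text does not say how the test is scored: the accuracy metric, the 0.5 decision cut-off, the `min_accuracy` pass mark, and the stand-in `toy_model` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

LabeledGroup = Tuple[Sequence[float], int]  # (similarity features, sample label)


def accuracy(predict: Callable[[Sequence[float]], float],
             test_groups: List[LabeledGroup],
             cutoff: float = 0.5) -> float:
    """Fraction of labeled test groups whose predicted label matches the sample label."""
    correct = 0
    for features, label in test_groups:
        predicted = 1 if predict(features) > cutoff else 0
        correct += int(predicted == label)
    return correct / len(test_groups)


def passes_test(predict: Callable[[Sequence[float]], float],
                test_groups: List[LabeledGroup],
                min_accuracy: float = 0.9) -> bool:
    """Hypothetical release gate: the model may be used only if accuracy is high enough."""
    return accuracy(predict, test_groups) >= min_accuracy


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a trained model: trusts the syntactic structure feature alone."""
    return 0.95 if features[0] == 1 else 0.05


held_out = [([1, 0.8], 1), ([0, 0.9], 0), ([1, 0.6], 1), ([0, 0.1], 0)]
print(accuracy(toy_model, held_out), passes_test(toy_model, held_out))
```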
The sentence consistency discrimination model in the embodiment of the present invention can be used in intelligent robot question answering systems and data retrieval systems.
The training method of the sentence consistency discrimination model provided by the embodiment of the present invention introduces syntactic structure features during model training. For sentences that are similar in wording but differ in syntax, this improves discrimination accuracy and thereby the accuracy of the question answering system.
The embodiment of the present invention also provides a method of determining sentence consistency based on the sentence consistency discrimination model.
Referring to Fig. 5, the method of determining sentence consistency provided by the embodiment of the present invention includes:
301. Receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences.
302. Determine the consistency probability of the at least two sentences by means of a sentence consistency discrimination model, where the training features of the sentence consistency discrimination model include a syntactic structure similarity feature obtained by performing feature extraction on sample sentences according to their syntactic structure.
The training process of the sentence consistency discrimination model in the embodiment of the present invention can be understood with reference to the model training embodiment described above.
303. When the consistency probability is greater than a probability threshold, determine that the at least two sentences are consistent.
For example, if the probability threshold is 80% and the consistency probability of the at least two sentences is 90%, the at least two sentences can be determined to be consistent.
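The decision in step 303 reduces to a strict comparison, which the short sketch below makes explicit. The function name and the default threshold of 0.8 (taken from the 80% example) are illustrative choices only.

```python
def is_consistent(consistency_probability: float, probability_threshold: float = 0.8) -> bool:
    """Step 303: consistent only when the probability is strictly greater than the threshold."""
    return consistency_probability > probability_threshold


print(is_consistent(0.9))  # True: 90% exceeds the 80% threshold
print(is_consistent(0.8))  # False: equal to the threshold is not greater than it
```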
In the prior art, sentences that are similar in wording but differ in syntax cannot be judged accurately. By comparison, the method of determining sentence consistency provided by the embodiment of the present invention includes a syntactic structure similarity feature among the multi-dimensional similarity features and determines sentence consistency from the similarity features of multiple dimensions, including the syntactic structure similarity feature. This improves the accuracy of sentence consistency judgments and, in turn, the accuracy of artificial intelligence question answering systems and search engines.
Optionally, determining the consistency probability of the at least two sentences by means of the sentence consistency discrimination model may include:
extracting, for the at least two sentences, the similarity feature of each of multiple dimensions;
determining the consistency probability of the at least two sentences according to the similarity feature of each dimension.
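For a pair of sentences, the two sub-steps can be sketched as a table of per-dimension extractors feeding one scoring function. The dimensions used here (word overlap and length ratio) and the hand-picked weights are placeholders chosen for illustration; the dimensions actually used by the embodiment are those enumerated earlier in the description, and the weights would come from training.

```python
import math
from typing import Callable, Dict, List

Extractor = Callable[[str, str], float]


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated words (placeholder dimension)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def length_ratio(a: str, b: str) -> float:
    """Shorter word count divided by longer word count (placeholder dimension)."""
    len_a, len_b = len(a.split()), len(b.split())
    longest = max(len_a, len_b)
    return min(len_a, len_b) / longest if longest else 1.0


EXTRACTORS: Dict[str, Extractor] = {"word_overlap": word_overlap, "length_ratio": length_ratio}


def extract_features(a: str, b: str) -> List[float]:
    """One similarity feature per dimension, in a fixed order."""
    return [extract(a, b) for extract in EXTRACTORS.values()]


def consistency_probability(features: List[float], weights: List[float], bias: float) -> float:
    """Logistic combination of the per-dimension features (weights assumed pre-trained)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


features = extract_features("a b c", "a b d")
probability = consistency_probability(features, weights=[4.0, 1.0], bias=-3.0)
print(features, probability)
```

Keeping the extractors in one table means a syntactic structure extractor could be registered as just another dimension without changing the scoring function.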
Optionally, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions may include:
performing dependency syntactic analysis on the at least two sentences to obtain a dependency syntax tree for each of the at least two sentences;
determining the syntax tuple of each sentence according to the dependency syntax tree of that sentence;
comparing the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
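The three steps above can be sketched in Python as follows. This is only an illustrative sketch: the `Arc` record, the `syntax_tuple` and `syntactic_similarity` names, and the hand-written arcs that stand in for a dependency parser's output are assumptions, not part of the embodiment. The arcs mirror the HED, SBV, and VOB elements that the description reads off Fig. 3 and Fig. 4, and an exact-match comparison of the tuples is assumed.

```python
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]


class Arc(NamedTuple):
    """One dependency arc (hypothetical record type standing in for parser output)."""
    head: Optional[str]  # governing word; None for the root arc
    relation: str        # dependency label such as "HED", "SBV", or "VOB"
    dependent: str       # governed word


def syntax_tuple(arcs: List[Arc]) -> Triple:
    """Collect (HED, SBV, VOB) from a sentence's arcs; "" marks a missing element."""
    hed = sbv = vob = ""
    for arc in arcs:
        if arc.relation == "HED" and not hed:
            hed = arc.dependent
        elif arc.relation == "SBV" and not sbv:
            sbv = f"{arc.dependent} {arc.head}"  # subject followed by its predicate
        elif arc.relation == "VOB" and not vob:
            vob = f"{arc.head} {arc.dependent}"  # predicate followed by its object
    return hed, sbv, vob


def syntactic_similarity(a: Triple, b: Triple) -> int:
    """Assumed comparison rule: 1 when the triples match exactly, otherwise 0."""
    return 1 if a == b else 0


# Hand-written arcs mirroring the elements described for Fig. 3 and Fig. 4.
fig3_arcs = [Arc(None, "HED", "is"), Arc("is", "SBV", "Xie Tingfeng"), Arc("is", "VOB", "who")]
fig4_arcs = [Arc(None, "HED", "is"), Arc("is", "SBV", "son"), Arc("is", "VOB", "who")]

triple_3 = syntax_tuple(fig3_arcs)
triple_4 = syntax_tuple(fig4_arcs)
print(triple_3, triple_4, syntactic_similarity(triple_3, triple_4))
```

The sketch takes the first arc of each kind; a real parser may emit several SBV or VOB arcs per sentence, and how those should be handled is not stated in the text.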
In the embodiment of the present invention, the dependency syntax trees can be understood with reference to the examples of Fig. 3 and Fig. 4. For example, suppose there are two sentences whose consistency is to be determined, one being "Xie Tingfeng is whose son" and the other "Xie Tingfeng's son is who". An analysis such as that of Fig. 3 and Fig. 4 determines that the similarity feature of the two sentences in the syntactic structure dimension is 0. If instead one sentence is "Xie Tingfeng is whose son" and the other is "Xie Tingfeng's father is who", the sentence consistency discrimination model can determine that the similarity feature of the two sentences in the syntactic structure dimension is 1. If, combined with the similarity features of the other dimensions, the consistency probability of the two sentences is determined to be greater than the probability threshold, the two sentences are determined to be consistent.
Optionally, determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences includes:
determining the core predicate, subject-predicate relation, and verb-object relation of each sentence according to its dependency syntax tree, and using the core predicate, subject-predicate relation, and verb-object relation of each sentence as the components of the syntax tuple of that sentence.
Optionally, after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method further includes:
optimizing the sentence consistency discrimination model according to the at least two sentences.
In the embodiment of the present invention, the sentence consistency discrimination model can also be continuously optimized while it is in use, so as to improve its accuracy.
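One way to picture this optimization, purely as an assumption since the text does not specify the mechanism, is an incremental update that treats a sentence group just judged consistent as an additional positive training sample and takes a single gradient step on a logistic regression model. The function names, learning rate, and feature values below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predict_probability(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def optimize_with_consistent_group(
    weights: Sequence[float],
    bias: float,
    features: Sequence[float],
    learning_rate: float = 0.1,
) -> Tuple[List[float], float]:
    """One gradient step on the log loss, using label 1 for the confirmed group."""
    error = predict_probability(weights, bias, features) - 1.0
    new_weights = [w - learning_rate * error * f for w, f in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


weights, bias = [2.0, 1.0], -1.0
group_features = [1.0, 0.9]  # similarity features of a group just judged consistent
before = predict_probability(weights, bias, group_features)
weights, bias = optimize_with_consistent_group(weights, bias, group_features)
after = predict_probability(weights, bias, group_features)
print(before, after)  # the probability for this group rises after the update
```

Feeding the model's own positive decisions back as training data can reinforce its mistakes, so a practical system would probably confirm such samples by some other signal before using them; that safeguard is not described in the text.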
Referring to Fig. 6, one embodiment of the device 40 for determining sentence consistency provided by the embodiment of the present invention includes:
a receiving unit 401, configured to receive a sentence group whose consistency is to be determined, the sentence group including at least two sentences;
a first determination unit 402, configured to determine, by means of a sentence consistency discrimination model, the consistency probability of the at least two sentences received by the receiving unit 401, where the training features of the sentence consistency discrimination model include a syntactic structure similarity feature obtained by performing feature extraction on sample sentences according to their syntactic structure;
a second determination unit 403, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than a probability threshold.
In the embodiment of the present invention, the receiving unit 401 receives a sentence group whose consistency is to be determined, the sentence group including at least two sentences; the first determination unit 402 determines, by means of the sentence consistency discrimination model, the consistency probability of the at least two sentences received by the receiving unit 401, where the training features of the model include a syntactic structure similarity feature obtained by performing feature extraction on sample sentences according to their syntactic structure; and the second determination unit 403 determines that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than the probability threshold. In the prior art, sentences that are similar in wording but differ in syntax cannot be judged accurately. By comparison, the device for determining sentence consistency provided by the embodiment of the present invention includes a syntactic structure similarity feature among the multi-dimensional similarity features and determines sentence consistency from the similarity features of multiple dimensions, including the syntactic structure similarity feature. This improves the accuracy of sentence consistency judgments and, in turn, the accuracy of artificial intelligence question answering systems and search engines.
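The division of device 40 into units can be mirrored in code as three small classes wired together. The sketch below is illustrative only: the class and method names are invented for this example, the default threshold of 0.8 is borrowed from the earlier 80% example, and `stub_model` is a placeholder for a trained sentence consistency discrimination model.

```python
from typing import Callable, List, Sequence


class ReceivingUnit:
    """Unit 401: accepts a sentence group containing at least two sentences."""

    def receive(self, sentence_group: Sequence[str]) -> List[str]:
        if len(sentence_group) < 2:
            raise ValueError("a sentence group needs at least two sentences")
        return list(sentence_group)


class FirstDeterminationUnit:
    """Unit 402: obtains the consistency probability from a discrimination model."""

    def __init__(self, model: Callable[[List[str]], float]) -> None:
        self.model = model

    def consistency_probability(self, sentences: List[str]) -> float:
        return self.model(sentences)


class SecondDeterminationUnit:
    """Unit 403: compares the probability with the probability threshold."""

    def __init__(self, probability_threshold: float) -> None:
        self.probability_threshold = probability_threshold

    def is_consistent(self, probability: float) -> bool:
        return probability > self.probability_threshold


class ConsistencyDevice:
    """Device 40: wires units 401, 402, and 403 together."""

    def __init__(self, model: Callable[[List[str]], float],
                 probability_threshold: float = 0.8) -> None:
        self.receiving_unit = ReceivingUnit()
        self.first_unit = FirstDeterminationUnit(model)
        self.second_unit = SecondDeterminationUnit(probability_threshold)

    def determine(self, sentence_group: Sequence[str]) -> bool:
        sentences = self.receiving_unit.receive(sentence_group)
        probability = self.first_unit.consistency_probability(sentences)
        return self.second_unit.is_consistent(probability)


def stub_model(sentences: List[str]) -> float:
    """Placeholder model: high probability only when all sentences are identical."""
    return 0.9 if len(set(sentences)) == 1 else 0.1


device = ConsistencyDevice(stub_model)
print(device.determine(["who is he", "who is he"]))
print(device.determine(["who is he", "where is he"]))
```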
Optionally, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided by the embodiment of the present invention, the first determination unit 402 is configured to:
extract, for the at least two sentences, the similarity feature of each of multiple dimensions;
determine the consistency probability of the at least two sentences according to the similarity feature of each dimension.
Optionally, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided by the embodiment of the present invention, the multiple dimensions include a syntactic structure dimension, and the first determination unit 402 is configured to:
perform dependency syntactic analysis on the at least two sentences to obtain a dependency syntax tree for each of the at least two sentences;
determine the syntax tuple of each sentence according to the dependency syntax tree of that sentence;
compare the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
Optionally, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided by the embodiment of the present invention, the first determination unit 402 is configured to:
determine the core predicate, subject-predicate relation, and verb-object relation of each sentence according to its dependency syntax tree, and use the core predicate, subject-predicate relation, and verb-object relation of each sentence as the components of the syntax tuple of that sentence.
Optionally, referring to Fig. 7, on the basis of the above embodiment, in another embodiment of the device 40 for determining sentence consistency provided by the embodiment of the present invention, the device 40 further includes:
an optimization unit 404, configured to optimize the sentence consistency discrimination model according to the at least two sentences after the second determination unit 403 determines that the at least two sentences are consistent.
The device for determining sentence consistency provided by the embodiment of the present invention can be understood with reference to the corresponding descriptions of Fig. 1 to Fig. 5 and is not described in detail again here.
Referring to Fig. 8, one embodiment of the device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention includes:
an acquiring unit 501, configured to obtain multiple sample sentence groups for training a model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different;
a determination unit 502, configured to extract, for the at least two sample sentences obtained by the acquiring unit 501, the similarity feature of each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature obtained by performing feature extraction on the sample sentences according to their syntactic structure;
a model training unit 503, configured to perform model training according to the similarity feature of each dimension determined by the determination unit 502, and to establish the sentence consistency discrimination model.
In the embodiment of the present invention, the acquiring unit 501 obtains multiple sample sentence groups for training a model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different; the determination unit 502 extracts, for the at least two sample sentences obtained by the acquiring unit 501, the similarity feature of each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature obtained by performing feature extraction on the sample sentences according to their syntactic structure; and the model training unit 503 performs model training according to the similarity feature of each dimension determined by the determination unit 502 and establishes the sentence consistency discrimination model. Compared with the prior art, the training device for the sentence consistency discrimination model provided by the embodiment of the present invention includes a syntactic structure feature among the multi-dimensional similarity features and determines sentence consistency using syntactic structure, which improves the accuracy of sentence consistency judgments and, in turn, the accuracy of artificial intelligence question answering systems and search engines.
Optionally, on the basis of the embodiment corresponding to Fig. 8, in another embodiment of the device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention, the multiple dimensions include a syntactic structure dimension, and the determination unit 502 is configured to:
perform dependency syntactic analysis on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;
determine the syntax tuple of each sample sentence according to the dependency syntax tree of that sample sentence;
compare the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
Optionally, on the basis of the above embodiment, in another embodiment of the device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention, the determination unit 502 is configured to:
determine the core predicate, subject-predicate relation, and verb-object relation of each sample sentence according to its dependency syntax tree, and use the core predicate, subject-predicate relation, and verb-object relation of each sample sentence as the components of the syntax tuple of that sample sentence.
The device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention can be understood with reference to the corresponding descriptions of Fig. 1 to Fig. 4 and is not described in detail again here.
The device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention may be a computing device such as a server. The device 50 is described below in the form of a server.
Fig. 9 is a schematic structural diagram of the device 50 for establishing a sentence consistency discrimination model provided by the embodiment of the present invention. The device 50 includes a processor 510, a memory 550, and a transceiver 530. The memory 550 may include read-only memory and random access memory, and provides operation instructions and data to the processor 510. A part of the memory 550 may also include non-volatile random access memory (NVRAM).
In some embodiments, the memory 550 stores the following elements: executable modules or data structures, or a subset of them, or a superset of them.
In the embodiment of the present invention, by calling the operation instructions stored in the memory 550 (the operation instructions may be stored in an operating system), the following is performed:
obtaining multiple sample sentence groups for training a model, where each sample sentence group includes at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are identical or different;
extracting, for the at least two sample sentences, the similarity feature of each of multiple dimensions, where the similarity features of the dimensions include a syntactic structure similarity feature obtained by performing feature extraction on the sample sentences according to their syntactic structure;
performing model training according to the similarity feature of each dimension, and establishing the sentence consistency discrimination model.
Compared with the prior art, the training device for the sentence consistency discrimination model provided by the embodiment of the present invention includes a syntactic structure feature among the multi-dimensional similarity features and determines sentence consistency using syntactic structure, which improves the accuracy of sentence consistency judgments and, in turn, the accuracy of artificial intelligence question answering systems and search engines.
The processor 510 controls the operation of the device 50 for establishing a sentence consistency discrimination model; the processor 510 may also be called a central processing unit (CPU). The memory 550 may include read-only memory and random access memory, and provides instructions and data to the processor 510. A part of the memory 550 may also include non-volatile random access memory (NVRAM). In a specific application, the components of the device 50 are coupled together through a bus system 520, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all labeled as the bus system 520 in the figure.
The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 510. The processor 510 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 510 or by instructions in the form of software. The processor 510 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 550; the processor 510 reads the information in the memory 550 and completes the steps of the above methods in combination with its hardware.
Optionally, the multiple dimensions include a syntactic structure dimension, and the processor 510 is configured to:
perform dependency syntactic analysis on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;
determine the syntax tuple of each sample sentence according to the dependency syntax tree of that sample sentence;
compare the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
Optionally, the processor 510 is configured to:
determine the core predicate, subject-predicate relation, and verb-object relation of each sample sentence according to its dependency syntax tree, and use the core predicate, subject-predicate relation, and verb-object relation of each sample sentence as the components of the syntax tuple of that sample sentence.
The above description of the device 50 for establishing a sentence consistency discrimination model can be understood with reference to the descriptions of Fig. 1 to Fig. 4 and is not repeated here.
The device for determining sentence consistency provided by the embodiment of the present invention can be implemented by a terminal device equipped with a search engine, such as a mobile terminal, a personal computer, or an intelligent robot. The following takes a mobile phone as the device for determining sentence consistency to describe the process of determining sentence consistency in the embodiment of the present invention.
As shown in Fig. 10, for ease of description, only the parts related to the embodiment of the present invention are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention.
Fig. 10 shows a block diagram of part of the structure of the mobile terminal 800 provided by the embodiment of the present invention. Referring to Fig. 10, the mobile terminal includes components such as a camera 810, a memory 820, an input unit 830, a display unit 840, a sensor 850, an audio circuit 860, a WiFi module 870, a processor 880, and a power supply 890. Those skilled in the art will understand that the mobile terminal structure shown in Fig. 10 does not limit the mobile terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
Each component of the mobile terminal is described in detail below with reference to Fig. 10:
The camera 810 can be used for shooting;
The memory 820 can be used to store software programs and modules. The processor 880 executes the various functional applications and data processing of the mobile terminal by running the software programs and modules stored in the memory 820. The memory 820 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the mobile terminal (such as audio data or a phone book). In addition, the memory 820 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 830 can be used to receive a user's operation instructions and to generate key signal inputs related to user settings and function control of the mobile terminal 800. Specifically, the input unit 830 may include a touch panel 831 and other input devices 832. The touch panel 831, also called a touch screen, collects touch operations by the user on or near it (for example, operations performed on or near the touch panel 831 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connected mobile terminal according to a preset program. Optionally, the touch panel 831 may include two parts: a touch detection mobile terminal and a touch controller. The touch detection mobile terminal detects the user's touch position and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection mobile terminal, converts it into contact coordinates, sends them to the processor 880, and can receive and execute commands sent by the processor 880. In addition, the touch panel 831 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 831, the input unit 830 may also include other input devices 832. Specifically, the other input devices 832 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and an on/off key), a trackball, a mouse, and a joystick.
The display unit 840 can be used to display an interface. The display unit 840 may include an indicator light 841; optionally, the indicator light 841 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 831 may cover the indicator light 841. After detecting a touch operation on or near it, the touch panel 831 passes the operation to the processor 880 to determine the type of touch event, and the processor 880 then provides the corresponding visual output on the indicator light 841 according to the type of touch event. Although in Fig. 10 the touch panel 831 and the indicator light 841 are shown as two separate components that implement the input and output functions of the mobile terminal, in some embodiments the touch panel 831 and the indicator light 841 may be integrated to implement the input and output functions of the mobile terminal.
The mobile terminal 800 may also include at least one sensor 850.
The audio circuit 860, a loudspeaker 861, and a microphone 862 can provide an audio interface between the user and the mobile terminal. The audio circuit 860 can convert received audio data into an electrical signal and transmit it to the loudspeaker 861, which converts it into a sound signal for output. Conversely, the microphone 862 converts a collected sound signal into an electrical signal, which the audio circuit 860 receives and converts into audio data; after the audio data is output to the processor 880 for processing, it is sent through the camera 810 to, for example, another mobile terminal, or output to the memory 820 for further processing.
The WiFi module 870 can be used for communication.
The processor 880 is the control center of the mobile terminal. It connects all parts of the entire mobile terminal through various interfaces and lines and, by running or executing the software programs and/or modules stored in the memory 820 and calling the data stored in the memory 820, performs the various functions of the mobile terminal and processes data, thereby monitoring the mobile terminal as a whole. Optionally, the processor 880 may include one or more processing units; preferably, the processor 880 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 880.
The mobile terminal 800 further includes a power supply 890 (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the processor 880 through a power management system, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management system.
Although not shown, the mobile terminal 800 may also include a radio frequency (RF) circuit, a Bluetooth module, and the like, which are not described in detail here.
In the embodiment of the present invention, when determining sentence consistency, the processor 880 included in the mobile terminal also has the following functions:
receiving a sentence group whose consistency is to be determined, the sentence group including at least two sentences;
determining the consistency probability of the at least two sentences by means of a sentence consistency discrimination model, where the training features of the sentence consistency discrimination model include a syntactic structure similarity feature obtained by performing feature extraction on sample sentences according to their syntactic structure;
determining that the at least two sentences are consistent when the consistency probability is greater than a probability threshold.
Optionally, determining the consistency probability of the at least two sentences by means of the sentence consistency discrimination model may include:
extracting, for the at least two sentences, the similarity feature of each of multiple dimensions;
determining the consistency probability of the at least two sentences according to the similarity feature of each dimension.
Optionally, the multiple dimensions include a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension among the multiple dimensions may include:
performing dependency syntactic analysis on the at least two sentences to obtain a dependency syntax tree for each of the at least two sentences;
determining the syntax tuple of each sentence according to the dependency syntax tree of that sentence;
comparing the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
Optionally, determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences may include:
determining the core predicate, subject-predicate relation, and verb-object relation of each sentence according to its dependency syntax tree, and using the core predicate, subject-predicate relation, and verb-object relation of each sentence as the components of the syntax tuple of that sentence.
Optionally, after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method may further include:
optimizing the sentence consistency discrimination model according to the at least two sentences.
The mobile phone provided by the embodiment of the present invention can be understood with reference to the related descriptions of Fig. 1 to Fig. 6 and is not described again here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can use for storage, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
A person of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, an optical disc or the like.
The method for determining sentence consistency, the method for establishing a sentence consistency discrimination model, and the related devices provided in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In conclusion, the content of this specification should not be construed as limiting the present invention.
Claims (16)
1. A method for determining sentence consistency, characterized by comprising:
receiving a sentence group whose consistency is to be determined, the sentence group comprising at least two sentences;
determining a consistency probability of the at least two sentences by means of a sentence consistency discrimination model, wherein the training features of the sentence consistency discrimination model comprise a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure;
when the consistency probability is greater than a probability threshold, determining that the at least two sentences are consistent.
2. The method according to claim 1, characterized in that determining the consistency probability of the at least two sentences by means of the sentence consistency discrimination model comprises:
extracting, for the at least two sentences, a similarity feature of each dimension in a plurality of dimensions;
determining the consistency probability of the at least two sentences according to the similarity feature of each dimension.
3. The method according to claim 2, characterized in that the plurality of dimensions comprise a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension in the plurality of dimensions comprises:
performing dependency parsing on the at least two sentences to obtain a dependency syntax tree for each of the at least two sentences;
determining a syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences;
comparing the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
4. The method according to claim 3, characterized in that determining the syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences comprises:
determining the core predicate, the subject-predicate relation and the verb-object relation of each sentence according to the dependency syntax tree of that sentence, and using the core predicate, subject-predicate relation and verb-object relation of each sentence as the constituent elements of that sentence's syntax tuple.
5. The method according to any one of claims 1 to 4, characterized in that, after determining that the at least two sentences are consistent when the consistency probability is greater than the probability threshold, the method further comprises:
optimizing the sentence consistency discrimination model according to the at least two sentences.
6. A method for establishing a sentence consistency discrimination model, characterized by comprising:
obtaining a plurality of sample sentence groups for model training, wherein each sample sentence group comprises at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;
extracting, for the at least two sample sentences, a similarity feature of each dimension in a plurality of dimensions, wherein the similarity features of the dimensions comprise a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on the sample sentences according to syntactic structure;
performing model training according to the similarity feature of each dimension to establish the sentence consistency discrimination model.
7. The method according to claim 6, characterized in that the plurality of dimensions comprise a syntactic structure dimension, and extracting the similarity feature of the syntactic structure dimension in the plurality of dimensions comprises:
performing dependency parsing on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;
determining a syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences;
comparing the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
8. The method according to claim 7, characterized in that determining the syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences comprises:
determining the core predicate, the subject-predicate relation and the verb-object relation of each sample sentence according to the dependency syntax tree of that sample sentence, and using the core predicate, subject-predicate relation and verb-object relation of each sample sentence as the constituent elements of that sample sentence's syntax tuple.
9. A device for determining sentence consistency, characterized by comprising:
a receiving unit, configured to receive a sentence group whose consistency is to be determined, the sentence group comprising at least two sentences;
a first determination unit, configured to determine, by means of a sentence consistency discrimination model, a consistency probability of the at least two sentences received by the receiving unit, wherein the training features of the sentence consistency discrimination model comprise a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on sample sentences according to syntactic structure;
a second determination unit, configured to determine that the at least two sentences are consistent when the consistency probability determined by the first determination unit is greater than a probability threshold.
10. The device according to claim 9, characterized in that
the first determination unit is configured to:
extract, for the at least two sentences, a similarity feature of each dimension in a plurality of dimensions;
determine the consistency probability of the at least two sentences according to the similarity feature of each dimension.
11. The device according to claim 10, characterized in that
the plurality of dimensions comprise a syntactic structure dimension, and the first determination unit is configured to:
perform dependency parsing on the at least two sentences to obtain a dependency syntax tree for each of the at least two sentences;
determine a syntax tuple of each sentence according to the dependency syntax tree of each of the at least two sentences;
compare the syntax tuples of the sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
12. The device according to claim 11, characterized in that
the first determination unit is configured to:
determine the core predicate, the subject-predicate relation and the verb-object relation of each sentence according to the dependency syntax tree of that sentence, and use the core predicate, subject-predicate relation and verb-object relation of each sentence as the constituent elements of that sentence's syntax tuple.
13. The device according to any one of claims 9 to 12, characterized in that the device further comprises:
an optimization unit, configured to optimize the sentence consistency discrimination model according to the at least two sentences after the second determination unit determines that the at least two sentences are consistent.
14. A device for establishing a sentence consistency discrimination model, characterized by comprising:
an acquiring unit, configured to obtain a plurality of sample sentence groups for model training, wherein each sample sentence group comprises at least two sample sentences, the at least two sample sentences carry a sample label, and the sample label indicates whether the semantics of the at least two sample sentences are the same or different;
a determination unit, configured to extract, for the at least two sample sentences obtained by the acquiring unit, a similarity feature of each dimension in a plurality of dimensions, wherein the similarity features of the dimensions comprise a syntactic structure similarity feature, and the syntactic structure similarity feature is obtained by performing feature extraction on the sample sentences according to syntactic structure;
a model training unit, configured to perform model training according to the similarity feature of each dimension determined by the determination unit, to establish the sentence consistency discrimination model.
15. The device according to claim 14, characterized in that
the plurality of dimensions comprise a syntactic structure dimension, and the determination unit is configured to:
perform dependency parsing on the at least two sample sentences to obtain a dependency syntax tree for each of the at least two sample sentences;
determine a syntax tuple of each sample sentence according to the dependency syntax tree of each of the at least two sample sentences;
compare the syntax tuples of the sample sentences to determine the syntactic structure similarity feature of the syntactic structure dimension.
16. The device according to claim 15, characterized in that
the determination unit is configured to:
determine the core predicate, the subject-predicate relation and the verb-object relation of each sample sentence according to the dependency syntax tree of that sample sentence, and use the core predicate, subject-predicate relation and verb-object relation of each sample sentence as the constituent elements of that sample sentence's syntax tuple.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710121244.9A CN108536665A (en) | 2017-03-02 | 2017-03-02 | A kind of method and device of determining sentence consistency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710121244.9A CN108536665A (en) | 2017-03-02 | 2017-03-02 | A kind of method and device of determining sentence consistency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108536665A (en) | 2018-09-14 |
Family
ID=63489255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710121244.9A Pending CN108536665A (en) | 2017-03-02 | 2017-03-02 | A kind of method and device of determining sentence consistency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108536665A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800292A (en) * | 2018-12-17 | 2019-05-24 | 北京百度网讯科技有限公司 | The determination method, device and equipment of question and answer matching degree |
CN112489740A (en) * | 2020-12-17 | 2021-03-12 | 北京惠及智医科技有限公司 | Medical record detection method, training method of related model, related equipment and device |
CN112632234A (en) * | 2019-10-09 | 2021-04-09 | 科沃斯商用机器人有限公司 | Human-computer interaction method and device, intelligent robot and storage medium |
CN112650836A (en) * | 2020-12-28 | 2021-04-13 | 成都网安科技发展有限公司 | Text analysis method and device based on syntax structure element semantics and computing terminal |
CN113128201A (en) * | 2019-12-31 | 2021-07-16 | 阿里巴巴集团控股有限公司 | Sentence similarity determining method, answer searching method, device, equipment, system and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110257963A1 (en) * | 2006-10-10 | 2011-10-20 | Konstantin Zuev | Method and system for semantic searching |
CN103646112A (en) * | 2013-12-26 | 2014-03-19 | 中国科学院自动化研究所 | Dependency parsing field self-adaption method based on web search |
CN104391969A (en) * | 2014-12-04 | 2015-03-04 | 百度在线网络技术(北京)有限公司 | User query statement syntactic structure determining method and device |
CN105975458A (en) * | 2016-05-03 | 2016-09-28 | 安阳师范学院 | Fine-granularity dependence relationship-based method for calculating Chinese long sentence similarity |
Non-Patent Citations (1)
Title |
---|
李佳媛: "汉语句子相似度计算技术及其应用", 《中国优秀硕士学位论文全文数据库(基础科学辑)》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11907277B2 (en) | Method, apparatus, and computer program product for classification and tagging of textual data | |
US11334635B2 (en) | Domain specific natural language understanding of customer intent in self-help | |
CN108536852B (en) | Question-answer interaction method and device, computer equipment and computer readable storage medium | |
US9767166B2 (en) | System and method for predicting user behaviors based on phrase connections | |
US20180341871A1 (en) | Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains | |
US9146987B2 (en) | Clustering based question set generation for training and testing of a question and answer system | |
CN108536665A (en) | A kind of method and device of determining sentence consistency | |
US20150161230A1 (en) | Generating an Answer from Multiple Pipelines Using Clustering | |
US20150066711A1 (en) | Methods, apparatuses and computer-readable mediums for organizing data relating to a product | |
US20130282704A1 (en) | Search system with query refinement | |
US20230177360A1 (en) | Surfacing unique facts for entities | |
US10984056B2 (en) | Systems and methods for evaluating search query terms for improving search results | |
CN106874441A (en) | Intelligent answer method and apparatus | |
WO2015143239A1 (en) | Providing search recommendation | |
CN110390094B (en) | Method, electronic device and computer program product for classifying documents | |
CN109829045A (en) | A kind of answering method and device | |
EP3867790A1 (en) | Enhanced intent matching using keyword-based word mover's distance | |
CN115248839A (en) | Knowledge system-based long text retrieval method and device | |
CN114116997A (en) | Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium | |
TW202022635A (en) | System and method for adaptively adjusting related search words | |
CN110245357B (en) | Main entity identification method and device | |
Polat | Experiments on company name disambiguation with supervised classification techniques | |
Lu et al. | Improving web search relevance with semantic features | |
CN115795023B (en) | Document recommendation method, device, equipment and storage medium | |
CN117591511A (en) | Method and device for constructing search database, and search method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180914 |