CN107818081A - Sentence similarity assessment method based on deep semantic model and semantic role labeling - Google Patents
- Publication number
- CN107818081A CN107818081A CN201710876254.3A CN201710876254A CN107818081A CN 107818081 A CN107818081 A CN 107818081A CN 201710876254 A CN201710876254 A CN 201710876254A CN 107818081 A CN107818081 A CN 107818081A
- Authority
- CN
- China
- Prior art keywords
- similarity
- semantic
- sentence
- predicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a sentence similarity assessment method based on a deep semantic model and semantic role labeling. Text strings are mapped to feature vectors in a low-dimensional semantic space, and the similarity between two sentences is measured by cosine similarity. The principal semantic roles are retained, while all other semantic roles are handled uniformly as one class. The sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs, from which the similarity values between semantic roles are obtained. The multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations and the similarity of the semantic roles is calculated; the similarity computed by the deep semantic model and the similarity computed from semantic roles are then linearly combined into the final sentence similarity. By incorporating semantic roles, the present invention improves the Pearson correlation coefficient by 2.226% and exceeds the first-ranked result on the official SemEval2017 evaluation site by 0.266%.
Description
Technical field
The present invention relates to natural language processing, and specifically to a sentence similarity assessment method based on a deep semantic model and semantic role labeling.
Background technology
Sentence similarity computing measures the semantic equivalence between two sentences and is a particularly important and fundamental line of research in natural language processing. For example, in example-based machine translation, sentence similarity is used to retrieve similar sentences as candidate translations; in automatic question answering, it matches questions with answers; in information filtering, it helps reject likely junk information; in automatic summarization, it selects sentences by similarity; and in classification or clustering, it is used to decide the category of a sentence or document.
Existing approaches to sentence similarity include the method based on lexical and word-order matching proposed by Lv Xueqiang et al.; the keyword-based method proposed by Qin Bing et al.; the method based on attribute theory proposed by Pan Qianhong et al.; the method using semantic dependency computation proposed by Li Bin et al.; the method based on skeleton dependency trees proposed by Sui Zhifang et al.; the improved edit-distance method proposed by Che Wanxiang et al.; and the HowNet-based sentence similarity method proposed by Cheng Chuanpeng et al.
Current methods for computing sentence similarity fall into three categories: (1) methods based on word features, such as the vector space model, lexical form and word order; (2) methods based on semantics, such as those built on semantic dictionaries; and (3) methods based on syntactic analysis, such as sentence similarity computation based on dependency parsing.
Methods based on word features use only the surface information of a sentence and cannot handle sentences containing synonyms, antonyms and similar vocabulary well. Methods based on semantic dictionaries remedy this deficiency to some extent, but they depend on the completeness of the dictionary and ignore both the interactions between the words of a sentence and its deeper syntactic structure. Methods based on dependency parsing can mine the deeper information of a sentence and obtain its organizational structure and the dependency relations between words, but the dependency-based methods used so far exploit only the effective collocation pairs of a sentence and ignore the influence of the remaining words on sentence similarity.
The content of the invention
In the prior art, sentence similarity computation based on semantic role labeling builds the similarity on a frame centered on the verb and cannot fully exploit the verb and the constituent information it governs. To address these deficiencies, the present invention proposes a sentence similarity computation method based on a deep semantic model and semantic role labeling, which analyzes sentences at the levels of sentence structure and semantics.
In order to solve the above technical problems, the technical solution adopted by the present invention is:
A sentence similarity assessment method based on a deep semantic model and semantic role labeling according to the present invention comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the similarity between two sentences is measured by cosine similarity.
2) Semantic role classification: the semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled uniformly as a single class; A0, A1 and A2 are publicly defined semantic role labels.
3) Predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs; semantic role computation is then carried out for each matching pair, giving the similarity values between semantic roles.
4) Sentence similarity based on semantic roles: according to the similarity values between semantic roles, the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles.
5) Sentence similarity: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
The deep semantic model consists of three parts, a word hashing layer, hidden layers and an output layer; the function of each layer is as follows:

l_1 = W_1 x   (1)
l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1   (2)
y = f(W_N l_{N-1} + b_N)   (3)

where x is the input vector, y is the output vector, l_i (i = 1, …, N−1) are the outputs of the hidden layers, W_i denotes the i-th weight matrix, b_i denotes the i-th bias, and f(·) is the tanh activation function.

The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer. After the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
The predicate matching method is as follows:

Let S_ij be the similarity, computed by the DSSM model, between the i-th predicate in sentence A and the j-th predicate in sentence B; this gives the pairwise similarity matrix N between the predicates of the two sentences:

    ⎡ S_11  …  S_1m ⎤
N = ⎢  …    …   …  ⎥
    ⎣ S_n1  …  S_nm ⎦

where n and m are the numbers of predicates in the two sentences.

The predicate pairing algorithm is as follows:
301) Search all elements of matrix N and find the element with the largest similarity; it gives the first predicate matching pair of sentences A and B.
302) Delete the row and column of that largest element, which guarantees that each predicate is paired with exactly one predicate of the other sentence.
303) Collect the remaining elements into a new matrix N_i and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.

When searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
Sentence similarity computation based on semantic roles: the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles. Specifically:

Let the predicates with the largest similarity in sentences A and B be A_1 and B_1 respectively. The similarity of each predicate matching pair is defined as:

S(A_1, B_1) = α × ΣS(r_i, r_j) / max(n, m) + (1 − α) × S(V_A1, V_B1)   (4)

where n and m are respectively the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence.

The similarity based on semantic roles is defined as:

S(A, B) = ΣS(A_i, B_j) / count(V)   (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(A_i, B_j) is the sum of the semantic role similarities corresponding to each predicate matching pair from formula (4).
In the sentence similarity computation step, the two parts above are linearly combined into the final sentence similarity: the similarity computed by the DSSM model is denoted S_1 and the similarity computed from semantic roles is denoted S_2; the similarity of the sentence pair is then

S(A, B) = β × S_1 + (1 − β) × S_2   (6)

where β is the weight of the DSSM-based sentence similarity in the final similarity.
The present invention has the following advantages:
1. The proposed method computes sentence similarity from a deep semantic model and semantic role labeling, analyzing sentences at the levels of sentence structure and semantics while also exploiting the predicates of the sentence and the constituent information they govern.
2. The method makes full use of sentence structure and predicate information. On top of the baseline experiment, incorporating semantic roles improves the Pearson correlation coefficient by 2.226%, exceeding the first-ranked result on the official SemEval2017 evaluation site by 0.266%. (SemEval, short for Semantic Evaluation, is a series of assessments of computational semantic analysis systems that mainly investigates the nature of meaning in language; Task 1 of SemEval2017 is the assessment of sentence similarity computation.)
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 shows the DSSM model used by the method of the invention.
Detailed description of the embodiments
The present invention is further elaborated below with reference to the accompanying drawings.
As shown in Fig. 1, a sentence similarity assessment method based on a deep semantic model and semantic role labeling according to the present invention comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
2) Semantic role classification: the semantic roles A0, A1 and A2 (publicly defined semantic role labels) are retained, and all other semantic roles are handled uniformly as a single class.
3) Predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs; semantic role computation is then carried out for each matching pair, giving the similarity values between semantic roles.
4) Sentence similarity based on semantic roles: on the basis of step 3), the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated; the computation is thus converted into similarity computations between predicates and between identical semantic roles.
5) Sentence similarity: the similarity computed by the DSSM model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
The present invention computes sentence similarity based on semantic role labeling, taking distinct semantic roles as the basic units and considering the similarity of the multiple semantic verbs and roles within a sentence.
In step 1), the deep semantic model (Deep Structured Semantic Model, DSSM), shown in Fig. 2, is a deep-learning technique used mainly for the semantic understanding of text. It maps relatively short text strings (such as sentences) to feature vectors in a low-dimensional semantic space. These vectors can be used for document retrieval by comparing the similarity of documents and queries, and the method outperforms other approaches on retrieval results.

DSSM represents a sentence (or document) in the semantic vector space with a typical deep neural network (DNN) architecture. The DNN takes a bag-of-words vector as input, and DSSM reduces the dimensionality of this vector with a new technique called word hashing. Word hashing appends a '#' to the beginning and the end of each word and then takes every three consecutive characters as one unit, which serves as the input of the network. For example, the word 'cat' becomes '#cat#' after adding '#' at both ends, and with three characters per unit it is decomposed into '#ca', 'cat' and 'at#'. Representing words this way yields 3073 distinct units in total, and these 3073 representations are expressed as a vector that forms the input of the neural network.
The DSSM model consists of three parts, a word hashing layer, hidden layers and an output layer; the function of each layer is as follows:

l_1 = W_1 x   (1)
l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1   (2)
y = f(W_N l_{N-1} + b_N)   (3)

where x denotes the input vector, y the output vector, l_i (i = 1, …, N−1) the outputs of the hidden layers, W_i the i-th weight matrix and b_i the i-th bias; f(·) is the tanh activation function. The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer. After the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity. Besides the similarity between sentences, the model can also compute the similarity between words. In the DSSM model of Fig. 2, Q denotes a question, D the set of candidate sentences, R the cosine similarity between two vectors, and P the probability of choosing a particular sentence from the candidate set. In the sentence similarity computation of the present invention, D contains only one sentence, and the probability P need not be computed.
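A minimal numerical sketch of equations (1)-(3) and the cosine measure; the layer sizes and random weights are purely illustrative, since the patent does not specify network dimensions:

```python
import numpy as np

def dssm_forward(x, weights, biases):
    """Equations (1)-(3): l1 = W1 x, then l_i = tanh(W_i l_{i-1} + b_i),
    with the last such layer producing the semantic feature vector y."""
    l = weights[0] @ x                     # word-hash projection, eq. (1)
    for W, b in zip(weights[1:], biases):  # hidden and output layers, eqs. (2)-(3)
        l = np.tanh(W @ l + b)
    return l

def cosine_sim(a, b):
    """Cosine similarity between two semantic feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dims = [3073, 300, 128]  # hypothetical sizes: word-hash input, hidden, output
weights = [0.01 * rng.standard_normal((dims[i + 1], dims[i]))
           for i in range(len(dims) - 1)]
biases = [0.01 * rng.standard_normal(d) for d in dims[2:]]

yA = dssm_forward(rng.standard_normal(dims[0]), weights, biases)
yB = dssm_forward(rng.standard_normal(dims[0]), weights, biases)
sim = cosine_sim(yA, yB)  # semantic similarity of the two inputs
```

In the method of the invention the same forward pass is applied to both sentences of a pair, and `sim` plays the role of the similarity R.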
In step 2), semantic role classification is performed. Different syntactic theories classify semantic roles differently: the Verb Usage Dictionary compiled by Meng Cong et al. divides noun objects into 14 classes by their case relation with the verb, while Li Linding's Modern Chinese Sentence Patterns distinguishes 21 classes, among others. The kinds of semantic roles are numerous, but since sentence similarity research simulates the judgment process of a human, the semantic roles can be grouped. The present invention retains the original semantic roles A0, A1 and A2 and handles all other semantic roles uniformly as a single class, labeled o_srl.
In step 3), predicate similarity is computed. A sentence generally contains multiple predicates. Computing the pairwise similarity of all predicates of two sentences would not only make the complexity very large for long sentences with many predicates, but would also distort the experimental results to varying degrees. For the multi-predicate problem, the present invention therefore matches the predicates of the sentence pair according to the similarity between predicates.
Let S_ij be the similarity between the i-th predicate in sentence A and the j-th predicate in sentence B (the similarity between predicates is computed by the DSSM model); the pairwise similarity matrix N between the predicates of the two sentences is then:

    ⎡ S_11  …  S_1m ⎤
N = ⎢  …    …   …  ⎥
    ⎣ S_n1  …  S_nm ⎦

where n and m are respectively the numbers of predicates in the two sentences. The predicate pairing algorithm is as follows:
301) Search all elements of matrix N and find the element with the largest similarity; it gives the first predicate matching pair of sentences A and B.
302) Delete the row and column of the largest element found in step 301), which guarantees that each predicate is paired with exactly one predicate of the other sentence.
303) Collect the elements remaining after step 302) into a new matrix N and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.
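The greedy pairing of steps 301)-303) can be sketched as follows; this is illustrative code, and `pair_predicates` is a hypothetical name:

```python
import numpy as np

def pair_predicates(sim_matrix):
    """Steps 301)-303): repeatedly pick the largest remaining similarity,
    record the predicate pair, and delete its row and column."""
    N = np.array(sim_matrix, dtype=float)
    rows = list(range(N.shape[0]))  # predicate indices of sentence A
    cols = list(range(N.shape[1]))  # predicate indices of sentence B
    pairs = []
    while N.size > 0:
        i, j = np.unravel_index(np.argmax(N), N.shape)
        pairs.append((rows[i], cols[j], float(N[i, j])))
        N = np.delete(np.delete(N, i, axis=0), j, axis=1)
        del rows[i], cols[j]
    return pairs

sims = [[0.9, 0.2],
        [0.4, 0.7],
        [0.1, 0.3]]  # 3 predicates in A, 2 in B
print(pair_predicates(sims))  # [(0, 0, 0.9), (1, 1, 0.7)]
```

As stated in the text, this yields p = min(n, m) pairs, each predicate being matched at most once.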
The above method finds p predicate matching pairs, where p = min(n, m), and the semantic role computation is carried out for each of these p pairs.

The present invention refines the algorithm that integrates predicate similarity and semantic role similarity: the part contributing most of the similarity value constrains the part contributing less. Hence, if the predicate similarity of two sentences is comparatively low, the similarity of the semantic roles of those predicates should likewise contribute less to the overall sentence similarity. Therefore, when searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
In step 4), the sentence similarity based on semantic roles is computed. A sentence usually contains multiple predicates, and each predicate usually governs multiple semantic roles; the present invention calls this predicate-and-roles structure a semantic collocation. Computing the similarity of semantic roles converts the task into similarity computations between predicates and between identical semantic roles. For example:

Sentence A:
A man is shaved in front of a lecture hall
Its semantic role analysis result is:
[A0 A man] is [V shaved] [o_srl in front of a lecture hall]

Sentence B:
A man is sitting in the grass
Its semantic role analysis result is:
[A0 A man] is [V sitting] [o_srl in the grass]

[shaved, A0, A man] in sentence A and [sitting, A0, A man] in sentence B form a semantic collocation pair for similarity computation, which gives the similarity between the semantic roles A0. Likewise, the similarity between [shaved, o_srl, in front of a lecture hall] and [sitting, o_srl, in the grass] is computed as the similarity between the other semantic roles.
The similarity of each predicate matching pair (taking the pair of predicates with the largest similarity in sentences A and B as an example) is defined as:

S(A_1, B_1) = α × ΣS(r_i, r_j) / max(n, m) + (1 − α) × S(V_A1, V_B1)   (4)

where n and m are respectively the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence. The similarity between semantic roles is computed with the DSSM model.
The similarity based on semantic roles is defined as:

S(A, B) = ΣS(A_i, B_j) / count(V)   (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(A_i, B_j) is the sum of the semantic role similarities of all predicate matching pairs from formula (4).
In step 5), the sentence similarity result is composed of two parts, the similarity computed by the DSSM model and the similarity computed from semantic roles, which are linearly combined into the final sentence similarity. The DSSM-based similarity is denoted S_1 and the semantic-role-based similarity S_2; the similarity of the sentence pair is then

S(A, B) = β × S_1 + (1 − β) × S_2   (6)

where β is the weight of the DSSM-based sentence similarity in the final similarity and takes the value 0.6.
As shown in Fig. 1, the input sentence pair is processed along two branches: one branch computes the DSSM-based sentence similarity, the other computes the sentence similarity based on semantic roles; finally the two results are linearly combined into the final sentence similarity.
For example:
Sentence A:
[A0 A man] is [V shaved] [o_srl in front of a lecture hall]
Sentence B:
[A0 A man] is [V sitting] [o_srl in the grass]
The predicate matching pair is: shaved, sitting.
The semantic roles corresponding to the predicates are:
A0: A man / A man
o_srl: in front of a lecture hall / in the grass
The similarities of the predicate matching pair and of the semantic roles are computed separately, and the results are linearly combined to obtain the sentence similarity based on semantic role labeling, denoted S_2.
The two sentences of the above example are also processed with sent2vec, an existing tool based on the DSSM model, to compute the sentence similarity denoted S_1.
The results S_1 and S_2 of the two branches are then linearly combined into the final sentence similarity.
The experimental results on the SemEval2017 corpus are given in Table 1.

Table 1: experimental results

The baseline is the result obtained with the DSSM model alone. The semantic role labeling tool selected by the present invention reached an F-score of 88.25% on the test_wsj data set of the CoNLL2005 Shared Task.
On top of the baseline experiment, incorporating semantic roles improves the Pearson correlation coefficient by 2.226%, exceeding the first-ranked result (ruthva) on the official SemEval2017 evaluation site by 0.266%. After partially correcting misidentified semantic roles, the Pearson correlation coefficient reaches 0.85936, an improvement of 2.416% over the baseline. The experimental results show that incorporating semantic role identification into sentence similarity computation can make up for the deficiency of current methods in exploiting semantic information and ultimately improves the computed sentence similarity.
Claims (6)
- 1. A sentence similarity assessment method based on a deep semantic model and semantic role labeling, characterized by comprising the following steps: 1) establishing the deep semantic model: text strings are mapped to feature vectors in a low-dimensional semantic space; after the semantic feature vector of each sentence is obtained, the similarity between two sentences is measured by cosine similarity; 2) semantic role classification: the semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled uniformly as a single class, A0, A1 and A2 being publicly defined semantic role labels; 3) predicate similarity computation: on the basis of the role classification, for sentences with multiple predicates, the sentence pair is matched predicate by predicate according to the similarity between predicates, yielding predicate matching pairs, and semantic role computation is carried out for each matching pair, giving the similarity values between semantic roles; 4) sentence similarity based on semantic roles: according to the similarity values between semantic roles, the multiple semantic roles of each predicate among the multiple predicates of a sentence are combined into semantic collocations, and the similarity of the semantic roles is calculated, the computation thus being converted into similarity computations between predicates and between identical semantic roles; 5) sentence similarity: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined into the final sentence similarity.
- 2. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the deep semantic model consists of three parts, a word hashing layer, hidden layers and an output layer, the function of each layer being: l_1 = W_1 x (1); l_i = f(W_i l_{i-1} + b_i), i = 2, …, N−1 (2); y = f(W_N l_{N-1} + b_N) (3); where x is the input vector, y is the output vector, l_i (i = 1, …, N−1) are the outputs of the hidden layers, W_i denotes the i-th weight matrix, b_i denotes the i-th bias, and f(·) is the tanh activation function; the feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer; after the semantic feature vector of each sentence is obtained, the semantic similarity between two sentences is measured by cosine similarity.
- 3. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the predicate matching method is as follows: the similarity S_ij between the i-th predicate in sentence A and the j-th predicate in sentence B is computed by the DSSM model, giving the pairwise similarity matrix N between the predicates of the two sentences: N = [S_11 … S_1m; … ; S_n1 … S_nm], where n and m are respectively the numbers of predicates in the two sentences; the predicate pairing algorithm is: 301) search all elements of matrix N and find the element with the largest similarity, which gives the first predicate matching pair of sentences A and B; 302) delete the row and column of the largest element, guaranteeing that each predicate is paired with exactly one predicate of the other sentence; 303) collect the remaining elements into a new matrix N_i and test whether it is empty; if so, predicate pairing ends, otherwise continue from step 301) until every predicate has found its unique partner.
- 4. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 3, characterized in that, when searching for predicate matching pairs on the test corpus of the SemEval2017 evaluation, only the top 4 predicates by similarity are taken from matrix N; if a row or column of the matrix has fewer than 4 entries, matching pairs are searched according to the actual dimensions.
- 5. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the sentence similarity based on semantic roles is calculated as follows: the multiple semantic roles of each predicate among the multiple predicates of a sentence are semantically collocated and the similarity of the semantic roles is calculated; that is, the task is converted into similarity calculations between predicates and between identical semantic roles. Specifically: let the predicates with maximum similarity in sentences A and B be A1 and B1 respectively; the similarity of each predicate matching pair is defined as:

S(A1, B1) = α × ΣS(ri, rj) / max(n, m) + (1 − α) × S(VA1, VB1)    (4)

where n and m are the numbers of semantic roles of the corresponding predicate in sentence A and sentence B respectively, S(ri, rj) is the similarity between semantic roles ri and rj, S(VA1, VB1) is the similarity between the two predicates, and α is the proportion of the semantic role similarity within the whole sentence. The similarity based on semantic roles is then defined as:

S(A, B) = ΣS(Ai, Bi) / count(V)    (5)

where count(V) is the number of predicate matching pairs between sentences A and B, and ΣS(Ai, Bi) is the sum of the semantic role similarities of the predicate matching pairs from formula (4).
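Formulas (4) and (5) can be checked numerically with a short sketch. The function names and the α = 0.5 default are illustrative assumptions; the patent leaves α to tuning:

```python
def pair_similarity(role_sims, n, m, verb_sim, alpha=0.5):
    """Formula (4): similarity of one predicate matching pair.
    role_sims -- the S(ri, rj) values of the matched semantic roles
    n, m      -- role counts of the matched predicate in sentences A and B
    verb_sim  -- S(VA1, VB1), similarity between the two predicates
    alpha     -- weight of the role similarity (0.5 is an assumed default)
    """
    return alpha * sum(role_sims) / max(n, m) + (1 - alpha) * verb_sim


def role_based_similarity(pair_sims):
    """Formula (5): mean of the per-pair similarities over the
    count(V) predicate matching pairs between sentences A and B."""
    return sum(pair_sims) / len(pair_sims)
```

For two matched roles with similarities 0.8 and 0.6, role counts n = m = 2, and predicate similarity 0.9, formula (4) with α = 0.5 gives 0.5 × 1.4 / 2 + 0.5 × 0.9 = 0.8.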
- 6. The sentence similarity assessment method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that in the sentence similarity calculation step, the above two parts are linearly combined as the final similarity of the sentence: the similarity calculated by the DSSM model is denoted S1, and the similarity calculated from semantic roles is denoted S2; the sentence similarity is then:

S(A, B) = β × S1 + (1 − β) × S2    (6)

where β is the weight of the DSSM-based sentence similarity in the final sentence similarity.
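Formula (6) is a plain convex combination of the two scores; a one-function sketch (the β = 0.5 default is an assumption, since the claim does not fix β):

```python
def final_similarity(s1, s2, beta=0.5):
    """Formula (6): linear combination of the DSSM-based score S1 and
    the semantic-role-based score S2.  beta is the weight of the DSSM
    score (0.5 is an assumed default, not specified in the claim)."""
    return beta * s1 + (1 - beta) * s2
```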
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710876254.3A CN107818081A (en) | 2017-09-25 | 2017-09-25 | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818081A true CN107818081A (en) | 2018-03-20 |
Family
ID=61607137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710876254.3A Pending CN107818081A (en) | 2017-09-25 | 2017-09-25 | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818081A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103562907A (en) * | 2011-05-10 | 2014-02-05 | 日本电气株式会社 | Device, method and program for assessing synonymous expressions |
Non-Patent Citations (4)
Title |
---|
PO-SEN HUANG ET AL.: "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data", CIKM '13 *
ZHANG DAN ET AL.: "Dependency parsing with hierarchical constituent analysis", Journal of Shenyang Aerospace University *
LI RU ET AL.: "Chinese sentence similarity computing based on frame semantic analysis", Journal of Computer Research and Development *
TIAN KUN ET AL.: "A Chinese sentence similarity algorithm based on semantic role labeling", Journal of Chinese Information Processing *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765240A (en) * | 2019-10-31 | 2020-02-07 | 中国科学技术大学 | Semantic matching evaluation method for multiple related sentence pairs |
CN110765240B (en) * | 2019-10-31 | 2023-06-20 | 中国科学技术大学 | Semantic matching evaluation method for multi-phase sentence pairs |
CN112559713A (en) * | 2020-12-24 | 2021-03-26 | 北京百度网讯科技有限公司 | Text relevance judgment method and device, model, electronic equipment and readable medium |
CN112559713B (en) * | 2020-12-24 | 2023-12-01 | 北京百度网讯科技有限公司 | Text relevance judging method and device, model, electronic equipment and readable medium |
CN113609304A (en) * | 2021-07-20 | 2021-11-05 | 广州大学 | Entity matching method and device |
CN113609304B (en) * | 2021-07-20 | 2023-05-23 | 广州大学 | Entity matching method and device |
CN115062619A (en) * | 2022-08-11 | 2022-09-16 | 中国人民解放军国防科技大学 | Chinese entity linking method, device, equipment and storage medium |
CN115062619B (en) * | 2022-08-11 | 2022-11-22 | 中国人民解放军国防科技大学 | Chinese entity linking method, device, equipment and storage medium |
CN116306663A (en) * | 2022-12-27 | 2023-06-23 | 华润数字科技有限公司 | Semantic role labeling method, device, equipment and medium |
CN116306663B (en) * | 2022-12-27 | 2024-01-02 | 华润数字科技有限公司 | Semantic role labeling method, device, equipment and medium |
CN118035712A (en) * | 2024-04-12 | 2024-05-14 | 数据空间研究院 | NLP-based data collection rule identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106484664B (en) | Similarity calculating method between a kind of short text | |
CN107818081A (en) | Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling | |
CN104408173B (en) | A kind of kernel keyword extraction method based on B2B platform | |
CN106649260B (en) | Product characteristic structure tree construction method based on comment text mining | |
CN110377715A (en) | Reasoning type accurate intelligent answering method based on legal knowledge map | |
CN104361127B (en) | The multilingual quick constructive method of question and answer interface based on domain body and template logic | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
US20070106499A1 (en) | Natural language search system | |
CN109783806B (en) | Text matching method utilizing semantic parsing structure | |
CN105930452A (en) | Smart answering method capable of identifying natural language | |
CN113221567A (en) | Judicial domain named entity and relationship combined extraction method | |
CN111241294A (en) | Graph convolution network relation extraction method based on dependency analysis and key words | |
CN107562919B (en) | Multi-index integrated software component retrieval method and system based on information retrieval | |
CN110390006A (en) | Question and answer corpus generation method, device and computer readable storage medium | |
CN104484411A (en) | Building method for semantic knowledge base based on a dictionary | |
CN113962219A (en) | Semantic matching method and system for knowledge retrieval and question answering of power transformer | |
CN112328800A (en) | System and method for automatically generating programming specification question answers | |
CN112036178A (en) | Distribution network entity related semantic search method | |
CN111651569B (en) | Knowledge base question-answering method and system in electric power field | |
CN114997288A (en) | Design resource association method | |
Buchholz et al. | Applying a natural language dialogue tool for designing databases | |
CN112417170A (en) | Relation linking method for incomplete knowledge graph | |
Liu et al. | The extension of domain ontology based on text clustering | |
CN107818078B (en) | Semantic association and matching method for Chinese natural language dialogue | |
Dereje et al. | Sentence level Amharic word sense disambiguation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180320 |