CN107818081A - Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling - Google Patents

Info

Publication number
CN107818081A
Authority
CN
China
Prior art keywords
similarity
semantic
sentence
predicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710876254.3A
Other languages
Chinese (zh)
Inventor
周俏丽
杨凤玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Aerospace University
Original Assignee
Shenyang Aerospace University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Aerospace University filed Critical Shenyang Aerospace University
Priority to CN201710876254.3A priority Critical patent/CN107818081A/en
Publication of CN107818081A publication Critical patent/CN107818081A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a sentence similarity evaluation method based on a deep semantic model and semantic role labeling. A text string is mapped to a feature vector in a low-dimensional semantic space, and the similarity between two sentences is measured with cosine similarity. The existing semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled uniformly as a single role. The predicates of the sentence pair are paired according to the similarity between predicates, yielding predicate pairs, from which the similarity values between semantic roles are obtained. The semantic roles of each predicate in a sentence are combined into semantic collocations, and the semantic-role similarity is calculated. The similarity computed by the deep semantic model and the similarity computed from semantic roles are then linearly combined into the final sentence similarity. By incorporating semantic roles, the method raises the Pearson correlation coefficient by 2.226% and exceeds the top-ranked result on the SemEval2017 evaluation website by 0.266%.

Description

Sentence similarity evaluation method based on a deep semantic model and semantic role labeling
Technical field
The invention relates to natural language processing technology, and specifically to a sentence similarity evaluation method based on a deep semantic model and semantic role labeling.
Background art
Sentence similarity computing measures the semantic equivalence between two sentences; it is an important and fundamental line of research in natural language processing. For example, in example-based machine translation, sentence similarity is used to retrieve similar sentences as translation candidates; in automatic question answering, to match questions with answers; in information filtering, to reject likely spam; in automatic summarization, to extract sentences by similarity; and in classification or clustering, to decide the category of a sentence or document.
Existing methods for sentence similarity include: the method based on word-form and word-order matching proposed by Lv Xueqiang et al.; the keyword-based method proposed by Qin Bing et al.; the attribute-theory-based method proposed by Pan Qianhong et al.; the semantic-dependency-based method proposed by Li Bin et al.; the skeleton-dependency-tree-based method proposed by Sui Zhifang et al.; the improved edit-distance method proposed by Che Wanxiang et al.; and the HowNet-based sentence similarity method proposed by Cheng Chuanpeng et al.
Current methods for calculating sentence similarity fall into three kinds: (1) methods based on word features, such as the vector space model, word form and word order; (2) methods based on semantics, such as those based on a semantic dictionary; (3) methods based on syntactic analysis, such as sentence similarity calculation based on dependency parsing.
Methods based on word features use only the surface information of a sentence and cannot handle sentences containing synonyms, antonyms and similar vocabulary well. Methods based on a semantic dictionary remedy this to some extent, but they depend on the completeness of the dictionary and ignore both the interactions between the words of a sentence and the deeper syntactic structure of the sentence. Methods based on dependency parsing can extract deeper information, namely the organization of the sentence and the dependencies between its words, but the dependency-based methods in current use exploit only the effective collocation pairs of a sentence and ignore the influence of the other words on sentence similarity.
Summary of the invention
In the prior art, sentence similarity calculation based on semantic role labeling is built on the similarity of frames centered on verbs and cannot make full use of the verb and the constituents it governs. To address this deficiency, the invention proposes a sentence similarity calculation method based on a deep semantic model and semantic role labeling, which analyzes sentences at the levels of sentence structure and semantics.
To solve the above technical problem, the invention adopts the following technical solution:
The sentence similarity evaluation method of the invention, based on a deep semantic model and semantic role labeling, comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; once the semantic feature vector of each sentence has been obtained, the similarity between two sentences is measured with cosine similarity;
2) Semantic role classification: the existing semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled together as a single semantic role; A0, A1 and A2 are publicly known semantic role labels;
3) Predicate similarity calculation: on the basis of the semantic role classification, for sentences with multiple predicates, the predicates of the sentence pair are paired according to the magnitude of the similarity between predicates, yielding predicate pairs; the semantic roles are then evaluated for each predicate pair, giving the similarity values between semantic roles;
4) Semantic-role-based sentence similarity calculation: according to the similarity values between semantic roles, the semantic roles of each predicate in a sentence are combined into semantic collocations and the semantic-role similarity is calculated; that is, the problem is converted into calculating the similarity between predicates and between identical semantic roles;
5) Sentence similarity calculation: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined as the final sentence similarity.
The deep semantic model comprises three parts: a word hashing layer, hidden layers and an output layer. The function of each layer is as follows:
$$ l_1 = W_1 x \tag{1} $$
$$ l_i = f(W_i l_{i-1} + b_i), \quad i = 2, \ldots, N-1 \tag{2} $$
$$ y = f(W_N l_{N-1} + b_N) \tag{3} $$
where x is the input vector, y is the output vector, l_i (i = 1, ..., N-1) are the outputs of the hidden layers, W_i is the i-th weight matrix, b_i is the i-th bias, and f(·) is the tanh activation function;
The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer;
Once the semantic feature vector of each sentence has been obtained, the semantic similarity between two sentences is measured with cosine similarity.
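Equations (1) to (3) and the cosine comparison can be illustrated with a minimal pure-Python sketch. The function names, the list-of-rows matrix layout and the toy weights in the usage note are assumptions made for illustration; the patent does not give layer sizes or trained parameters.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dssm_forward(x, weights, biases):
    """Apply equations (1)-(3).

    weights = [W_1, ..., W_N]; biases = [b_2, ..., b_N], because the
    first layer in equation (1) is purely linear and has no bias.
    """
    layer = matvec(weights[0], x)                  # (1) l_1 = W_1 x
    for W, b in zip(weights[1:], biases):          # (2) hidden layers, (3) output
        pre = matvec(W, layer)
        layer = [math.tanh(p + bi) for p, bi in zip(pre, b)]
    return layer                                   # semantic feature vector y

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

With hypothetical weights, two sentences would be scored as `cosine(dssm_forward(x_a, Ws, bs), dssm_forward(x_b, Ws, bs))`, where `x_a` and `x_b` are their word-hashed input vectors.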
The predicate pairing method is as follows:
Let S_ij be the similarity, calculated with the DSSM model, between the i-th predicate of sentence A and the j-th predicate of sentence B; this gives the pairwise similarity matrix N between the predicates of the two sentences:
$$ N = \begin{bmatrix} S_{11} & \cdots & S_{1m} \\ \vdots & \ddots & \vdots \\ S_{n1} & \cdots & S_{nm} \end{bmatrix} $$
where n and m are the numbers of predicates in sentence A and sentence B respectively;
The predicate pairing algorithm is as follows:
301) Scan all elements of matrix N row by row and find the element with the greatest similarity; it gives the first predicate pair of sentence A and sentence B;
302) Delete the row and column of that element, which ensures that each predicate is paired with exactly one other predicate;
303) Assemble the remaining elements into a new matrix N_i and check whether it is empty; if so, predicate pairing ends; otherwise return to step 301) until every predicate has found its unique partner.
When predicate pairs are searched on the test corpus provided on the SemEval2017 evaluation website, only the 4 most similar predicate pairs are taken from matrix N; if the matrix has fewer than 4 rows or columns, the pairs are searched according to its actual size.
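Steps 301) to 303), together with the cap of 4 pairs, amount to a greedy selection over the similarity matrix. The following is a minimal sketch under assumptions: the function name, the nested-list matrix and the tie-breaking rule (first maximum in row-major order) are illustrative choices that the patent does not specify.

```python
def pair_predicates(sim, max_pairs=4):
    """Greedily pair predicates from a similarity matrix.

    sim[i][j] is the similarity between predicate i of sentence A and
    predicate j of sentence B. Returns (i, j, similarity) triples in the
    order in which the pairs were selected.
    """
    free_rows = set(range(len(sim)))
    free_cols = set(range(len(sim[0]))) if sim else set()
    pairs = []
    while free_rows and free_cols and len(pairs) < max_pairs:
        # Step 301: find the largest element among the remaining rows/columns.
        i, j = max(
            ((r, c) for r in sorted(free_rows) for c in sorted(free_cols)),
            key=lambda rc: sim[rc[0]][rc[1]],
        )
        pairs.append((i, j, sim[i][j]))
        # Step 302: delete that row and column so each predicate pairs once.
        free_rows.remove(i)
        free_cols.remove(j)
        # Step 303: the loop condition checks whether anything remains.
    return pairs
```

For a 2 x 3 matrix `[[0.2, 0.9, 0.1], [0.8, 0.7, 0.3]]` the sketch selects `(0, 1)` first and then `(1, 0)`, so the number of pairs equals the smaller of the two predicate counts.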
The semantic-role-based sentence similarity is calculated as follows: the semantic roles of each predicate in a sentence are combined into semantic collocations and the semantic-role similarity is calculated; that is, the problem is converted into calculating the similarity between predicates and between identical semantic roles. Specifically:
Let A_1 and B_1 be the most similar predicates in sentences A and B; the similarity of each predicate pair is defined as:
$$ S(A_1, B_1) = \alpha \times \frac{\sum S(r_i, r_j)}{\max(n, m)} + (1 - \alpha) \times S(V_{A1}, V_{B1}) \tag{4} $$
where n and m are the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the weight of the semantic-role similarity in the whole sentence;
The similarity based on semantic roles is defined as:
$$ S(A, B) = \frac{\sum S(A_i, B_i)}{\mathrm{count}(V)} \tag{5} $$
where count(V) is the number of predicate pairs between sentences A and B, and ∑S(A_i, B_i) is the sum of the similarities given by formula (4) over all predicate pairs.
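Formulas (4) and (5) reduce to two small functions. This is a sketch under assumptions: the function and parameter names are illustrative, the patent does not state a value for α (the 0.5 in the usage note is hypothetical), and returning a zero role term when neither predicate has roles is an added guard against division by zero.

```python
def pair_similarity(role_sims, n_roles_a, n_roles_b, predicate_sim, alpha):
    """Formula (4): similarity of one predicate pair.

    role_sims holds S(r_i, r_j) for the matched semantic roles;
    n_roles_a and n_roles_b are the role counts n and m.
    """
    denom = max(n_roles_a, n_roles_b)
    role_term = sum(role_sims) / denom if denom else 0.0  # guard: no roles
    return alpha * role_term + (1 - alpha) * predicate_sim

def role_based_similarity(pair_scores):
    """Formula (5): mean of the formula-(4) scores over all predicate pairs."""
    return sum(pair_scores) / len(pair_scores) if pair_scores else 0.0
```

For example, with two matched roles scoring 1.0 and 0.5, role counts 2 and 3, a predicate similarity of 0.4 and a hypothetical α of 0.5, `pair_similarity([1.0, 0.5], 2, 3, 0.4, 0.5)` gives 0.5 × (1.5 / 3) + 0.5 × 0.4 = 0.45. Dividing by max(n, m) rather than by the number of matched roles penalizes predicates whose role sets differ in size.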
In the sentence similarity calculation step, the two parts above are linearly combined as the final sentence similarity:
Denote the similarity calculated with the DSSM model as S_1 and the similarity calculated from semantic roles as S_2; the sentence similarity is then:
$$ S(A, B) = \beta \times S_1 + (1 - \beta) \times S_2 \tag{6} $$
where β is the weight of the DSSM-based sentence similarity in the final sentence similarity.
The invention has the following beneficial effects and advantages:
1. The invention proposes a sentence similarity calculation method based on a deep semantic model and semantic role labeling that analyzes sentences at the levels of sentence structure and semantics, and also exploits the predicates in a sentence and the constituents they govern.
2. The method makes full use of sentence structure information and predicate information. On top of the baseline experiment, incorporating semantic roles raises the Pearson correlation coefficient by 2.226% and exceeds the top-ranked result on the SemEval2017 evaluation website by 0.266%. (SemEval, short for Semantic Evaluation, is a series of evaluations of computational semantic analysis systems that mainly explores the nature of meaning in language. Task 1 of SemEval2017 evaluates sentence similarity calculation.)
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 shows the DSSM model used in the method of the invention.
Detailed description
The invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, the sentence similarity evaluation method of the invention, based on a deep semantic model and semantic role labeling, comprises the following steps:
1) Establishing the deep semantic model: relatively short text strings are mapped to feature vectors in a low-dimensional semantic space; once the semantic feature vector of each sentence has been obtained, the semantic similarity between two sentences is measured with cosine similarity;
2) Semantic role classification: the existing semantic roles A0, A1 and A2 (publicly known semantic role labels) are retained, and all other semantic roles are handled together as a single semantic role;
3) Predicate similarity calculation: on the basis of the semantic role classification, for sentences with multiple predicates, the predicates of the sentence pair are paired according to the magnitude of the similarity between predicates, yielding predicate pairs; the semantic roles are then evaluated for each predicate pair, giving the similarity values between semantic roles;
4) Semantic-role-based sentence similarity calculation: on the basis of step 3), the semantic roles of each predicate in a sentence are combined into semantic collocations and the semantic-role similarity is calculated; that is, the problem is converted into calculating the similarity between predicates and between identical semantic roles;
5) Sentence similarity calculation: the similarity computed by the DSSM model and the similarity computed from semantic roles are linearly combined as the final sentence similarity.
The invention proposes sentence similarity calculation based on semantic role labeling: it takes the different semantic roles as basic units and considers the similarity of the multiple predicates and semantic roles in a sentence.
In step 1), the deep semantic model (Deep Structured Semantic Model, DSSM), shown in Fig. 2, is a deep-learning technique used mainly for semantic understanding of text; it maps relatively short text strings (such as sentences) to feature vectors in a low-dimensional semantic space. By comparing the similarity between documents and queries, these vectors can be used for document retrieval, where the method outperforms other approaches.
DSSM represents a sentence (document) in a semantic vector space using a typical deep neural network (DNN) architecture. A DNN takes a bag-of-words vector as input, and DSSM uses a new technique, word hashing, to reduce the dimension of that vector. Word hashing adds a "#" to the beginning and end of each word and then takes every three consecutive characters as a unit for input to the network. For example, the word "cat" becomes "#cat#", which yields the units "#ca", "cat" and "at#". Representing words this way gives 3073 distinct units in total, and these 3073 units are encoded as a vector that serves as the input to the neural network.
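The "#cat#" example can be reproduced with a short sketch of the word hashing step. The function names are illustrative, and lower-casing and whitespace tokenization are assumptions; the patent does not describe its tokenization.

```python
from collections import Counter

def letter_trigrams(word):
    """Add '#' boundary marks and slide a three-character window over the word."""
    marked = "#" + word + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def trigram_counts(sentence):
    """Bag of letter trigrams for a sentence (assumed whitespace-tokenized)."""
    counts = Counter()
    for word in sentence.lower().split():
        counts.update(letter_trigrams(word))
    return counts
```

`letter_trigrams("cat")` returns `["#ca", "cat", "at#"]`, matching the example in the text. Indexing each distinct trigram into a fixed-length count vector would give the network input.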
The DSSM model mainly comprises three parts: the word hashing layer, the hidden layers and the output layer. The function of each layer is as follows:
$$ l_1 = W_1 x \tag{1} $$
$$ l_i = f(W_i l_{i-1} + b_i), \quad i = 2, \ldots, N-1 \tag{2} $$
$$ y = f(W_N l_{N-1} + b_N) \tag{3} $$
Here x is the input vector, y is the output vector, l_i (i = 1, ..., N-1) are the outputs of the hidden layers, W_i is the i-th weight matrix, b_i is the i-th bias, and f(·) is the tanh activation function. The feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer. Once the semantic feature vector of each sentence has been obtained, the semantic similarity between two sentences is measured with cosine similarity. Besides the similarity between sentences, the model can also calculate the similarity between words. In the DSSM model of Fig. 2, Q denotes a query sentence, D denotes the set of candidate sentences, R denotes the cosine similarity between two vectors, and P denotes the probability of choosing a given sentence from the candidate set. In the sentence similarity calculation of the invention, D contains only one sentence and the probability P does not need to be calculated.
In step 2), semantic role classification: different grammatical theories classify semantic roles differently. The "Dictionary of Verb Usage" compiled by Meng Cong et al. divides noun objects into 14 classes according to their case relation to the verb, and Li Linding's "Modern Chinese Sentence Patterns" distinguishes 21 classes. Semantic roles are thus numerous, but because sentence similarity research simulates the human judgment process, the roles can be grouped. The invention retains the original semantic roles A0, A1 and A2 and handles all other semantic roles together as a single role, labeled o_srl.
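The grouping rule is a one-line mapping. In this sketch the function name is illustrative, and the input labels used in the usage note (`AM-LOC`, `AM-TMP`) are assumed PropBank-style argument labels; the patent does not list the labels its tool emits beyond A0, A1 and A2.

```python
KEPT_ROLES = {"A0", "A1", "A2"}

def normalize_role(label):
    """Keep A0, A1 and A2; collapse every other argument label to 'o_srl'."""
    return label if label in KEPT_ROLES else "o_srl"
```

Applied to a labeled sentence, `[normalize_role(r) for r in ["A0", "AM-LOC", "A1", "AM-TMP"]]` yields `["A0", "o_srl", "A1", "o_srl"]`. The function is meant for argument labels only; the predicate marker V is handled separately.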
In step 3), predicate similarity calculation: a sentence usually contains several predicates. Calculating the similarity between every pair of predicates in the two sentences would make the computation very expensive for long sentences with many predicates and would also affect the experimental results to varying degrees. To handle multiple predicates, the invention therefore pairs the predicates of the sentence pair according to the magnitude of the similarity between predicates.
Let S_ij be the similarity between the i-th predicate of sentence A and the j-th predicate of sentence B; this gives the pairwise similarity matrix N between the predicates of the two sentences (the similarity between predicates is calculated with the DSSM model):
$$ N = \begin{bmatrix} S_{11} & \cdots & S_{1m} \\ \vdots & \ddots & \vdots \\ S_{n1} & \cdots & S_{nm} \end{bmatrix} $$
where n and m are the numbers of predicates in sentence A and sentence B respectively. The predicate pairing algorithm is as follows:
301) Scan all elements of matrix N row by row and find the element with the greatest similarity; it gives the first predicate pair of sentence A and sentence B;
302) Delete the row and column of the element found in step 301), which ensures that each predicate is paired with exactly one other predicate;
303) Assemble the elements remaining after step 302) into a new matrix N and check whether it is empty; if so, predicate pairing ends; otherwise return to step 301) until every predicate has found its unique partner.
The above method finds p predicate pairs (p = min(n, m)); the semantic roles are then evaluated for each of these p pairs.
The invention improves the algorithm that integrates predicate similarity with semantic-role similarity: the similarity of the principal component (the predicate) constrains the similarity of the secondary components (the semantic roles). If the similarity between the predicates of two sentences is low, the similarity of the corresponding semantic roles also lowers the overall sentence similarity. Therefore, when predicate pairs are searched on the test corpus provided on the SemEval2017 evaluation website, only the 4 most similar predicate pairs are taken from matrix N; if the matrix has fewer than 4 rows or columns, the pairs are searched according to its actual size.
In step 4), semantic-role-based sentence similarity calculation: a sentence usually contains several predicates, and each predicate usually has several semantic roles; the invention calls this structure of a predicate and a semantic role a semantic collocation. Calculating the semantic-role similarity is converted into calculating the similarity between predicates and between identical semantic roles, for example:
Sentence A:
A man is shaved in front of a lecture hall
The semantic role analysis result is:
[A0 A man]is[V shaved][o_srl in front of a lecture hall]
Sentence B:
A man is sitting in the grass
The semantic role analysis result is:
[A0 A man]is[V sitting][o_srl in the grass]
[shaved, A0, A man] in sentence A and [sitting, A0, A man] in sentence B form a semantic collocation pair whose similarity is calculated as the similarity between the A0 semantic roles. Likewise, the similarity calculated for [shaved, o_srl, in front of a lecture hall] and [sitting, o_srl, in the grass] is the similarity between the other semantic roles.
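The collocation triples in this example can be extracted from the bracketed annotation with a small parser. This is a sketch under assumptions: the function names are illustrative, the bracket format is taken from the two example lines only, and the input is assumed to contain exactly one predicate marked V.

```python
import re

def collocations(annotated):
    """Turn '[A0 A man]is[V shaved][o_srl ...]' into (predicate, role, text) triples."""
    # Each bracketed span is '[LABEL text]'; words outside brackets are ignored.
    spans = re.findall(r"\[(\S+)\s+([^\]]+)\]", annotated)
    predicate = next(text for label, text in spans if label == "V")
    return [(predicate, label, text) for label, text in spans if label != "V"]

def aligned_roles(a, b):
    """Pair the arguments of two predicates that carry the same role label."""
    roles_b = {label: text for _, label, text in b}
    return [(label, text, roles_b[label]) for _, label, text in a if label in roles_b]
```

For the two example sentences, `aligned_roles` returns the A0 pair ("A man", "A man") and the o_srl pair ("in front of a lecture hall", "in the grass"); each pair would then be scored with the DSSM model.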
The similarity of each predicate pair is defined as follows (taking the most similar predicates A_1 and B_1 of sentences A and B as an example):
$$ S(A_1, B_1) = \alpha \times \frac{\sum S(r_i, r_j)}{\max(n, m)} + (1 - \alpha) \times S(V_{A1}, V_{B1}) \tag{4} $$
In the formula above, n and m are the numbers of semantic roles of the corresponding predicates in sentence A and sentence B, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the weight of the semantic-role similarity in the whole sentence. The similarity between semantic roles is calculated with the DSSM model.
The similarity based on semantic roles is defined as:
$$ S(A, B) = \frac{\sum S(A_i, B_i)}{\mathrm{count}(V)} \tag{5} $$
In the formula above, count(V) is the number of predicate pairs between sentences A and B, and ∑S(A_i, B_i) is the sum of the similarities given by formula (4) over all predicate pairs.
In step 5), the result of the sentence similarity calculation consists of two parts: the similarity calculated with the DSSM model and the similarity calculated from semantic roles. The two parts are linearly combined as the final sentence similarity.
Denote the similarity calculated with the DSSM model as S_1 and the similarity calculated from semantic roles as S_2; the sentence similarity is then:
$$ S(A, B) = \beta \times S_1 + (1 - \beta) \times S_2 \tag{6} $$
where β is the weight of the DSSM-based sentence similarity in the final sentence similarity; its value is 0.6.
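Formula (6) with β = 0.6 can be written as a single function. The function name and the range check on β are illustrative additions, and the input scores in the usage note are hypothetical values, not results from the patent.

```python
def sentence_similarity(s1_dssm, s2_roles, beta=0.6):
    """Formula (6): weighted combination of the two similarity scores."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * s1_dssm + (1 - beta) * s2_roles
```

With a hypothetical DSSM score of 0.8 and a role-based score of 0.5, `sentence_similarity(0.8, 0.5)` gives 0.6 × 0.8 + 0.4 × 0.5 = 0.68. Setting `beta=1.0` recovers the DSSM score alone.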
As shown in Fig. 1, a sentence pair is input and its similarity is calculated in two parts: one part calculates sentence similarity with DSSM, and the other calculates sentence similarity from semantic roles. The results of the two parts are finally linearly combined as the final sentence similarity.
For example:
Sentence A:
[A0 A man]is[V shaved][o_srl in front of a lecture hall]
Sentence B:
[A0 A man]is[V sitting][o_srl in the grass]
The predicate pair is: shaved, sitting
The semantic roles of the two predicates are:
A0: A man / A man
o_srl: in front of a lecture hall / in the grass
The similarity of the predicate pair and of the semantic roles is calculated separately and the results are linearly combined, giving the sentence similarity S_2 based on semantic role labeling.
The sentence similarity of the two example sentences is also calculated with sent2vec, an existing tool based on the DSSM model, and denoted S_1.
The two results S_1 and S_2 are then linearly combined as the final sentence similarity.
The experimental results on the SemEval2017 corpus are:
Table 1. Experimental results
The baseline is the result obtained with the DSSM model alone. The semantic role labeling tool chosen for the invention reaches an F-score of 88.25% on the test_wsj data set of the CoNLL2005 Shared Task.
On top of the baseline experiment, incorporating semantic roles raises the Pearson correlation coefficient by 2.226% and exceeds the top-ranked result (ruthva) on the SemEval2017 evaluation website by 0.266%. After some of the semantic role labeling errors were corrected, the Pearson correlation coefficient reached 0.85936, which is 2.416% higher than the baseline. The experimental results show that integrating semantic role labeling into sentence similarity calculation makes up for the shortcomings of current methods in their use of semantic information and ultimately improves the sentence similarity results.
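The Pearson correlation coefficient used as the evaluation metric compares system scores with human gold scores. A minimal sketch of the standard formula follows; the function name is illustrative and the score lists in the usage note are made-up values, not data from the experiments.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

A call such as `pearson(system_scores, gold_scores)` returns a value in [-1, 1]; the metric is invariant to linear rescaling of either list, so system similarities in [0, 1] can be compared directly with gold scores on a different scale.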

Claims (6)

  1. A sentence similarity evaluation method based on a deep semantic model and semantic role labeling, characterized by comprising the following steps:
    1) Establishing the deep semantic model: text strings are mapped to feature vectors in a low-dimensional semantic space; once the semantic feature vector of each sentence has been obtained, the similarity between two sentences is measured with cosine similarity;
    2) Semantic role classification: the existing semantic roles A0, A1 and A2 are retained, and all other semantic roles are handled together as a single semantic role; A0, A1 and A2 are publicly known semantic role labels;
    3) Predicate similarity calculation: on the basis of the semantic role classification, for sentences with multiple predicates, the predicates of the sentence pair are paired according to the magnitude of the similarity between predicates, yielding predicate pairs; the semantic roles are then evaluated for each predicate pair, giving the similarity values between semantic roles;
    4) Semantic-role-based sentence similarity calculation: according to the similarity values between semantic roles, the semantic roles of each predicate in a sentence are combined into semantic collocations and the semantic-role similarity is calculated; that is, the problem is converted into calculating the similarity between predicates and between identical semantic roles;
    5) Sentence similarity calculation: the similarity computed by the deep semantic model and the similarity computed from semantic roles are linearly combined as the final sentence similarity.
  2. The sentence similarity evaluation method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the deep semantic model comprises three parts, a word hashing layer, hidden layers and an output layer, and the function of each layer is as follows:
    $$ l_1 = W_1 x \tag{1} $$
    $$ l_i = f(W_i l_{i-1} + b_i), \quad i = 2, \ldots, N-1 \tag{2} $$
    $$ y = f(W_N l_{N-1} + b_N) \tag{3} $$
    where x is the input vector, y is the output vector, l_i (i = 1, ..., N-1) are the outputs of the hidden layers, W_i is the i-th weight matrix, b_i is the i-th bias, and f(·) is the tanh activation function;
    the feature vector generated by the word hashing layer is projected through the hidden layers, and the semantic feature vector is formed at the output layer;
    once the semantic feature vector of each sentence has been obtained, the semantic similarity between two sentences is measured with cosine similarity.
  3. The sentence similarity evaluation method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that the predicate pairing method is as follows:
    Let S_ij be the similarity, calculated with the DSSM model, between the i-th predicate of sentence A and the j-th predicate of sentence B; this gives the pairwise similarity matrix N between the predicates of the two sentences:
    $$ N = \begin{bmatrix} S_{11} & \cdots & S_{1m} \\ \vdots & \ddots & \vdots \\ S_{n1} & \cdots & S_{nm} \end{bmatrix} $$
    where n and m are the numbers of predicates in sentence A and sentence B respectively;
    the predicate pairing algorithm is as follows:
    301) Scan all elements of matrix N row by row and find the element with the greatest similarity; it gives the first predicate pair of sentence A and sentence B;
    302) Delete the row and column of that element, which ensures that each predicate is paired with exactly one other predicate;
    303) Assemble the remaining elements into a new matrix N_i and check whether it is empty; if so, predicate pairing ends; otherwise return to step 301) until every predicate has found its unique partner.
  4. The sentence similarity evaluation method based on a deep semantic model and semantic role labeling according to claim 3, characterized in that, when predicate pairs are searched on the test corpus provided on the SemEval2017 evaluation website, only the 4 most similar predicate pairs are taken from matrix N; if the matrix has fewer than 4 rows or columns, the pairs are searched according to its actual size.
  5. 5. the sentence similarity appraisal procedure according to claim 1 based on deep semantic model and semantic character labeling, It is characterized in that:Sentence similarity based on semantic role calculates:By in multiple predicates of a sentence each predicate it is more Individual semantic role carries out Semanteme collocation, calculates the similarity of semantic role, that is, is converted between predicate and identical semantic role Between Similarity Measure, be specially:
    If A, the maximum predicate of similarity is respectively A in B sentences1、B1, it is defined as the similarity of each predicate matching pair:
    <mrow> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>A</mi> <mn>1</mn> </msub> <mo>,</mo> <msub> <mi>B</mi> <mn>1</mn> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>&amp;alpha;</mi> <mo>&amp;times;</mo> <mfrac> <mrow> <mi>&amp;Sigma;</mi> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>r</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>r</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <mi>m</mi> <mi>a</mi> <mi>x</mi> <mrow> <mo>(</mo> <mi>n</mi> <mo>,</mo> <mi>m</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>+</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <mi>&amp;alpha;</mi> <mo>)</mo> </mrow> <mo>&amp;times;</mo> <mi>S</mi> <mrow> <mo>(</mo> <msub> <mi>V</mi> <mrow> <mi>A</mi> <mn>1</mn> </mrow> </msub> <mo>,</mo> <msub> <mi>V</mi> <mrow> <mi>B</mi> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow>
    where n and m are the numbers of semantic roles of the corresponding predicate in sentence A and sentence B respectively, S(r_i, r_j) is the similarity between semantic roles r_i and r_j, S(V_A1, V_B1) is the similarity between the two predicates, and α is the weight of the semantic role similarity in the whole sentence;
    The similarity based on semantic roles is defined as:
    $$S(A, B) = \frac{\sum S(A_i, B_i)}{\mathrm{count}(V)} \tag{5}$$
    where count(V) is the number of predicate matching pairs between sentences A and B, and ∑ S(A_i, B_i) is the sum of the similarities of all predicate matching pairs, each computed with formula (4).
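Formulas (4) and (5) can be worked through with a minimal sketch. It assumes the role similarities S(r_i, r_j) and the predicate similarity S(V_A1, V_B1) have already been computed elsewhere; the function names and the handling of empty inputs are illustrative assumptions, not part of the claim.

```python
from typing import Sequence


def predicate_pair_similarity(
    role_sims: Sequence[float],
    n: int,
    m: int,
    predicate_sim: float,
    alpha: float,
) -> float:
    """Formula (4): weighted mix of role similarity and predicate similarity.

    role_sims holds S(r_i, r_j) for the matched identical semantic roles;
    n and m are the role counts of the predicate in sentences A and B.
    """
    denom = max(n, m)
    # Assumption: with no semantic roles at all, the role term contributes 0.
    role_term = sum(role_sims) / denom if denom > 0 else 0.0
    return alpha * role_term + (1 - alpha) * predicate_sim


def semantic_role_similarity(pair_sims: Sequence[float]) -> float:
    """Formula (5): average of formula (4) over all predicate matching pairs."""
    # Assumption: sentences without any predicate matching pair score 0.
    if not pair_sims:
        return 0.0
    return sum(pair_sims) / len(pair_sims)


# Two matched roles, predicate with 2 roles in A and 3 roles in B.
s_pair = predicate_pair_similarity([0.8, 0.6], n=2, m=3, predicate_sim=0.9, alpha=0.5)
print(round(s_pair, 4))
print(semantic_role_similarity([s_pair, 0.5]))
```

Dividing by max(n, m) rather than by the number of matched roles penalises predicates whose role sets differ in size, which is why an unmatched third role lowers the score in the example.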
  6. The sentence similarity evaluation method based on a deep semantic model and semantic role labeling according to claim 1, characterized in that, in the sentence similarity calculation step, the above two parts are linearly combined to give the final similarity of the sentences:
    The similarity calculated with the DSSM model is denoted S1 and the similarity calculated from semantic roles is denoted S2; the similarity of the sentences is then:
    $$S(A, B) = \beta \times S_1 + (1 - \beta) \times S_2 \tag{6}$$
    where β is the weight of the DSSM-based sentence similarity in the final sentence similarity.
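The linear combination in formula (6) amounts to a one-line interpolation. The sketch below assumes both partial scores are already available as floats; the function name `combine_similarity` and the range check on β are illustrative additions rather than anything the claim prescribes.

```python
def combine_similarity(s1: float, s2: float, beta: float) -> float:
    """Formula (6): final similarity from the DSSM score s1 and the
    semantic-role score s2, with beta as the weight of the DSSM part."""
    # Assumption: beta is a weight and therefore restricted to [0, 1].
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * s1 + (1 - beta) * s2


# beta = 1 keeps only the DSSM score, beta = 0 only the semantic-role score.
print(combine_similarity(0.8, 0.6, 1.0))  # 0.8
print(combine_similarity(0.8, 0.6, 0.0))  # 0.6
```
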

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710876254.3A CN107818081A (en) 2017-09-25 2017-09-25 Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling

Publications (1)

Publication Number Publication Date
CN107818081A (en) 2018-03-20

Family

ID=61607137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710876254.3A Pending CN107818081A (en) 2017-09-25 2017-09-25 Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling

Country Status (1)

Country Link
CN (1) CN107818081A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103562907A (en) * 2011-05-10 2014-02-05 NEC Corporation Device, method and program for assessing synonymous expressions

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PO-SEN HUANG ET AL.: "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data", 《CIKM’13》 *
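ZHANG DAN ET AL.: "Dependency Parsing with Hierarchical Constituent Analysis", 《Journal of Shenyang Aerospace University》 *
LI RU ET AL.: "Chinese Sentence Similarity Computation Based on Frame Semantic Analysis", 《Journal of Computer Research and Development》 *
TIAN KUN ET AL.: "Chinese Sentence Similarity Algorithm Based on Semantic Role Labeling", 《Journal of Chinese Information Processing》 *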

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765240A (en) * 2019-10-31 2020-02-07 University of Science and Technology of China Semantic matching evaluation method for multiple related sentence pairs
CN110765240B (en) * 2019-10-31 2023-06-20 University of Science and Technology of China Semantic matching evaluation method for multi-phase sentence pairs
CN112559713A (en) * 2020-12-24 2021-03-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Text relevance judgment method and device, model, electronic equipment and readable medium
CN112559713B (en) * 2020-12-24 2023-12-01 Beijing Baidu Netcom Science and Technology Co., Ltd. Text relevance judging method and device, model, electronic equipment and readable medium
CN113609304A (en) * 2021-07-20 2021-11-05 Guangzhou University Entity matching method and device
CN113609304B (en) * 2021-07-20 2023-05-23 Guangzhou University Entity matching method and device
CN115062619A (en) * 2022-08-11 2022-09-16 National University of Defense Technology Chinese entity linking method, device, equipment and storage medium
CN115062619B (en) * 2022-08-11 2022-11-22 National University of Defense Technology Chinese entity linking method, device, equipment and storage medium
CN116306663A (en) * 2022-12-27 2023-06-23 China Resources Digital Technology Co., Ltd. Semantic role labeling method, device, equipment and medium
CN116306663B (en) * 2022-12-27 2024-01-02 China Resources Digital Technology Co., Ltd. Semantic role labeling method, device, equipment and medium
CN118035712A (en) * 2024-04-12 2024-05-14 Data Space Research Institute NLP-based data collection rule identification method

Similar Documents

Publication Publication Date Title
CN106484664B (en) Similarity calculating method between a kind of short text
CN107818081A (en) Sentence similarity appraisal procedure based on deep semantic model and semantic character labeling
CN104408173B (en) A kind of kernel keyword extraction method based on B2B platform
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN110377715A (en) Reasoning type accurate intelligent answering method based on legal knowledge map
CN104361127B (en) The multilingual quick constructive method of question and answer interface based on domain body and template logic
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
US20070106499A1 (en) Natural language search system
CN109783806B (en) Text matching method utilizing semantic parsing structure
CN105930452A (en) Smart answering method capable of identifying natural language
CN113221567A (en) Judicial domain named entity and relationship combined extraction method
CN111241294A (en) Graph convolution network relation extraction method based on dependency analysis and key words
CN107562919B (en) Multi-index integrated software component retrieval method and system based on information retrieval
CN110390006A (en) Question and answer corpus generation method, device and computer readable storage medium
CN104484411A (en) Building method for semantic knowledge base based on a dictionary
CN113962219A (en) Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN112328800A (en) System and method for automatically generating programming specification question answers
CN112036178A (en) Distribution network entity related semantic search method
CN111651569B (en) Knowledge base question-answering method and system in electric power field
CN114997288A (en) Design resource association method
Buchholz et al. Applying a natural language dialogue tool for designing databases
CN112417170A (en) Relation linking method for incomplete knowledge graph
Liu et al. The extension of domain ontology based on text clustering
CN107818078B (en) Semantic association and matching method for Chinese natural language dialogue
Dereje et al. Sentence level Amharic word sense disambiguation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180320
