CN106844331A - Sentence similarity calculation method and system - Google Patents

Sentence similarity calculation method and system

Info

Publication number
CN106844331A
CN106844331A
Authority
CN
China
Prior art keywords
text
shallow syntax tree
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611143723.2A
Other languages
Chinese (zh)
Inventor
杨萌
李培峰
朱巧明
周国栋
朱晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201611143723.2A
Publication of CN106844331A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to a sentence similarity calculation method and system that use structural features to express the similarity of sentences. Starting from the shallow syntax tree, the invention obtains, through suitable modifications, structural features suited to sentence similarity calculation, and combines these structural features with flat features to calculate sentence similarity.

Description

Sentence similarity calculation method and system
Technical field
The present invention relates to the field of natural language processing, and in particular to a sentence similarity calculation method and system.
Background art
Similarity calculation is a fundamental task of natural language processing. Current sentence similarity calculation methods fall into four classes: word-overlap methods, corpus-based statistical methods, linguistics-based methods, and hybrid methods.
Word-overlap methods measure the similarity of two sentences through the vocabulary the sentences share. Jacob et al. [4] proposed the Jaccard similarity operator, which computes the similarity of two sentences as the ratio of the intersection of their word sets to the union of their word sets. Metzler et al. [5] improved the results by using inverse document frequency (IDF) to weight the words that occur in both sentences. Banerjee et al. [6] designed a phrase-based sentence similarity method based on phrase length and the Zipfian distribution of phrase usage frequencies.
Corpus-based methods take the set of words occurring in a sentence pair as the feature set and use the cosine of corpus-derived vectors as the similarity. Landauer et al. [7] form sentence semantic vectors from the TF-IDF values of keywords computed over a large natural-language database and compute sentence semantic similarity as the cosine of the vectors. Lund et al. [8] compute sentence or short-document similarity from a high-dimensional vector space obtained from statistics of lexical co-occurrence.
Linguistics-based methods determine the similarity of sentences from the semantic relations between words and their grammatical roles. Kashyap et al. [9] measure similarity between sentences based on word semantic similarity, taking into account the different discriminative power of words in a sentence-vector similarity calculation. Malik et al. [10] take the maximum of the summed similarities between the words composing a sentence pair and normalize the resulting value by sentence length to obtain the sentence similarity value.
Hybrid methods combine the approaches above; Chukfong et al. [11-14] realize sentence similarity calculation based on several of the methods above.
There is relatively little work on sentence similarity evaluation based on structured representations; Aliaksei [15] proposed a calculation method based on simple structural representations.
Existing sentence similarity patents:
A semantics-based similarity calculation method and device: this invention provides a semantics-based similarity calculation method and device. The method includes: obtaining the sentences S1 and S2 to be compared; segmenting S1 and S2 into words; mapping each segmented word that has a semantic mapping to a normalized expression; and computing the similarity Sim(S1, S2) between S1 and S2 after the processing of the preceding steps. By mapping the words with semantic mappings to normalized expressions and incorporating them into the similarity calculation, the invention captures the similarity between sentences at the semantic rather than the purely literal level, improving the accuracy of inter-sentence similarity calculation.
Sentence similarity calculation method and device: this invention provides a high-accuracy sentence similarity calculation method and device. The method includes: for a first sentence and a second sentence, determining the repeated words, the first isolated words, and the second isolated words, where a repeated word belongs to both sentences, a first isolated word belongs only to the first sentence, and a second isolated word belongs only to the second sentence; computing a total contribution value G_total of isolated-word similarity from all first and second isolated words, where G_total >= 0 and G_total grows with the degree of similarity between the first and second isolated words; and computing SIM(A, B) by formula, where SIM(A, B) denotes the similarity of the first and second sentences, computed from vectors corresponding to the two sentences.
A sentence similarity calculation method and system: this invention provides a sentence similarity calculation method and system. A pre-built corpus is trained with the word2vec algorithm to obtain vectors for all words in the corpus; the two sentences whose similarity is to be calculated are segmented into words, the vector of each segmented word of the first and second sentence is looked up in the corpus, and the similarity between each word of the first sentence and each word of the second sentence is computed in turn; the word pairs whose similarity exceeds a preset threshold are collected, and the contribution of each word pair to whole-sentence similarity is computed according to the offset between the words' positions in their sentences; the contributions of the word pairs of the two sentences are summed to obtain the similarity between the sentences.
Most existing sentence similarity calculation methods use a large number of flat features to represent the degree of similarity of a sentence pair. The problem with representing sentence-pair similarity using only a flat feature vector is its weak representational power.
Some recent similarity calculation methods rely on word collocations and on knowledge obtained from big data (Wikipedia, etc.), and do not consider structured information such as sentence syntax. Given two sentences S1 and S2, these methods typically proceed as follows: first, each word in S1 is paired with the word in S2 most similar to it; second, the similarities of all paired words are summed and normalized by the length of S1, yielding the similarity of S1 and S2.
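A minimal sketch of this pairing scheme (the trivial identical-word similarity function is an illustrative assumption):

```python
def naive_similarity(s1, s2, word_sim=lambda a, b: float(a == b)):
    """Greedy word-pairing similarity: pair each word of s1 with its most
    similar word in s2, sum the pair similarities, normalize by len(s1).
    Ignores word order and syntactic structure entirely."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    return sum(max(word_sim(a, b) for b in w2) for a in w1) / len(w1)

# Agent and patient are swapped, yet the score is a perfect 1.0.
print(naive_similarity("Tigers hit lions", "Lions hit tigers"))
```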
Now consider the sentence pair S1: "Tigers hit lions" and S2: "Lions hit tigers". With the method described above, every word in S1 finds a highly similar word in S2 to pair with (in this example, an identical word), so the similarity calculation concludes that the two sentences have the same meaning. As shown in Fig. 1, however, analyzing the dependency trees of S1 and S2 reveals that their agents and patients are reversed. Although the words occurring in the two sentences are identical, analyzing their dependency trees shows that their meanings differ.
Structured information such as syntactic structure is very important in natural language processing applications, but how to exploit structured information is a common problem across tasks. When structured features are represented with a flat feature vector, part of the effective information may be lost in converting the structured features into flat features.
In view of these defects, the inventors, through active research and innovation, propose a new structured representation method on the basis of calculation methods using simple structural representations; it is applied to sentence similarity calculation so as to capture sentence syntax, semantics, and dependency relations.
Explanation of terms:
Pearson correlation coefficient (Pearson Correlation Coefficient): measures the (linear) correlation between two variables X and Y; its value lies between -1 and 1. In the natural sciences, the coefficient is widely used to measure the degree of correlation between two variables.
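A minimal sketch of the coefficient as it is used here, correlating hypothetical system scores with hypothetical human judgments:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system_scores = [4.2, 0.8, 3.1, 2.0]   # hypothetical system output
human_scores = [4.5, 1.0, 3.4, 1.8]    # hypothetical gold judgments
print(pearson(system_scores, human_scores))   # close to 1.0
```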
Support vector regression model (Support Vector Regression, SVR): after mapping the data to a higher-dimensional space, a linear decision function is constructed in that space to realize regression, its basis being the epsilon-insensitive loss function and kernel functions. If the fitted model is viewed as a curve in a high-dimensional space, the epsilon-insensitive loss yields an "epsilon tube" containing the curve and the training points; among all sample points, only the part lying on the tube wall determines the position of the tube, and these training samples are called support vectors. To accommodate non-linearity in the training set, traditional fitting methods typically add higher-order terms to a linear equation; this is effective, but the added adjustable parameters increase the risk of over-fitting. Support vector regression resolves this contradiction with kernel functions: replacing the linear term of the linear equation with a kernel function "non-linearizes" the originally linear algorithm, enabling non-linear regression, while simultaneously achieving the goal of "raising the dimension" and keeping the over-fitting risk of the added adjustable parameters under control.
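A minimal sketch of epsilon-insensitive regression using scikit-learn's SVR (an assumed tooling choice; the patent names no implementation), with toy feature vectors and scores:

```python
from sklearn.svm import SVR

# Toy flat-feature vectors for three sentence pairs and their human scores.
X_train = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1], [0.5, 0.4, 0.6]]
y_train = [4.8, 0.5, 2.7]

# epsilon sets the half-width of the insensitive tube: training points
# inside the tube incur no loss; points on its wall are support vectors.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
print(model.predict([[0.8, 0.7, 0.9]]))   # predicted similarity score
```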
Kernel methods (Kernel Methods): implicitly define a mapping from a low-dimensional space to a high-dimensional space under which two classes of points that are linearly inseparable in the low-dimensional space become linearly separable; used in support vector machines.
Tree kernel methods (Tree Kernel Methods): compare similarity by directly computing the number of identical subtrees of two structured objects (i.e., syntax trees).
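A deliberately simplified sketch that counts only complete identical subtrees; practical tree kernels (e.g., the subset-tree kernel) also count partial productions and apply a decay factor:

```python
from itertools import product

def subtrees(tree):
    """Yield every node-rooted subtree of a tree encoded as nested tuples
    (label, child, child, ...); a leaf is a 1-tuple (label,)."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def common_subtrees(t1, t2):
    """Naive tree-kernel value: the number of identical subtree pairs."""
    return sum(a == b for a, b in product(subtrees(t1), subtrees(t2)))

t1 = ("ROOT", ("NN", ("tigers",)), ("VB", ("hit",)), ("NN", ("lions",)))
t2 = ("ROOT", ("NN", ("lions",)), ("VB", ("hit",)), ("NN", ("tigers",)))
print(common_subtrees(t1, t2))  # shared leaf and POS subtrees, not ROOT
```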
Named entity recognition (Named Entity Recognition, NER): also called "proper name recognition", refers to identifying entities with specific meaning in text, mainly person names, place names, organization names, proper nouns, etc.
WordNet: an English lexical database based on cognitive linguistics, co-designed by psychologists, linguists, and computer engineers at Princeton University. Rather than merely listing words alphabetically, it organizes words by meaning into a "network of words"; it is a broad-coverage English lexical semantic net. Nouns, verbs, adjectives, and adverbs are each organized into networks of synonyms; each synonym set represents a basic semantic concept, and the sets are connected to one another by various relations.
Tree: a tree is a data structure consisting of a set of n (n >= 1) finite nodes with a hierarchical relationship. It is called a "tree" because it looks like an upside-down tree, that is, with the root upward and the leaves downward. It has the following properties: each node has zero or more child nodes; the node without a parent is called the root node; every non-root node has exactly one parent; and apart from the root node, the remaining nodes can be partitioned into multiple disjoint subtrees.
N-gram model: the N-gram is a language model commonly used in large-vocabulary continuous recognition; using the collocation information between adjacent words in context, the model can compute the sentence with maximal probability.
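A minimal sketch of the idea behind an n-gram model: a maximum-likelihood bigram estimate over a toy corpus (corpus and probability are illustrative only):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

corpus = "the cat sat on the mat".split()
bigram_counts = Counter(ngrams(corpus, 2))
unigram_counts = Counter(ngrams(corpus, 1))

# Maximum-likelihood estimate of P(sat | cat) from adjacent-word counts.
print(bigram_counts[("cat", "sat")] / unigram_counts[("cat",)])
```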
Bibliography
[1] Culotta A, Sorensen J. Dependency tree kernels for relation extraction[C]//Meeting of the Association for Computational Linguistics, 21-26 July 2004, Barcelona, Spain: 423-429.
[2] Bunescu R C, Mooney R J. A shortest path dependency kernel for relation extraction[C]//Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005: 724-731.
[3] Zhang M, Zhang J, Su J, et al. A composite kernel to extract relations between entities with both flat and structured features[C]//International Conference on Computational Linguistics and Meeting of the Association for Computational Linguistics, 17-21 July 2006, Sydney, Australia.
[4] Jacob B, Benjamin C. Calculating the Jaccard similarity coefficient with MapReduce for entity pairs in Wikipedia[OL]. http://www.infosci.cornell.edu/weblab/papers/Bank2008.pdf, 2008.
[5] Metzler D, Bernstein Y, Croft W B, et al. Similarity measures for tracking information flow[C]//ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October-November 2005: 517-524.
[6] Banerjee S, Pedersen T. Extended gloss overlap as a measure of semantic relatedness[C]//Proceedings of the International Joint Conference on Artificial Intelligence, 2003: 805-810.
[7] Landauer T K, Foltz P W, Laham D. Introduction to latent semantic analysis[J]. Discourse Processes, 1998, 25(2/3): 259-284.
[8] Lund K, Burgess C. Producing high-dimensional semantic spaces from lexical co-occurrence[J]. Behavior Research Methods, Instruments & Computers, 1996, 28(2): 203-208.
[9] Kashyap A, Han L, Yus R, et al. Robust semantic text similarity using LSA, machine learning, and linguistic resources[J]. Language Resources & Evaluation, 2016, 50(1): 125-161.
[10] Malik R, Subramaniam L V, Kaushik S. Automatically selecting answer templates to respond to customer emails[C]//Proceedings of the International Joint Conference on Artificial Intelligence, Hyderabad, India, January 2007: 1659-1664.
[11] Jaffe E, Jin L, King D, et al. AZMAT: sentence similarity using associative matrices[C]//International Workshop on Semantic Evaluation, 2015.
[12] Haque R, Naskar S K, Way A, et al. Sentence similarity-based source context modelling in PBSMT[C]//International Conference on Asian Language Processing, IALP 2010, Harbin, Heilongjiang, China, 28-30 December 2010: 257-260.
[13] Li R, Li S, Zhang Z. The semantic computing model of sentence similarity based on Chinese FrameNet[C]//2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. IEEE Computer Society, 2009: 255-258.
[14] Liu Y, Liu Q. Chinese sentence similarity based on multi-feature combination[C]//WRI Global Congress on Intelligent Systems, GCIS 2009: 14-19.
[15] Severyn A, Nicosia M, Moschitti A. Learning semantic textual similarity with structural representations[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, 2013: 714-718.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a sentence similarity calculation method and system which, on the basis of the shallow syntax tree, proposes a new syntax tree structure whose structured features better express information such as the syntax and semantics of a sentence, and applies this syntax tree to sentence similarity calculation to obtain good performance.
The sentence similarity calculation method of the present invention is characterized by comprising the steps of:
S10: for all sentences in the sentence-pair training text and the sentence-pair test text, invoke part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition tools to perform part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition respectively, obtaining the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text, and the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text,
where the sentence-pair training text and the sentence-pair test text are texts in which every line contains two sentences whose similarity is to be calculated;
S20: obtain the shallow syntax tree training text from the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text,
and obtain the shallow syntax tree test text from the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
S30: obtain multiple flat features for each line's sentence pair in the sentence-pair training text to produce the flat feature training text, and combine the flat feature training text and the shallow syntax tree training text with the sentence pairs' human-scored training text to obtain the shallow syntax tree feature training text,
and obtain multiple flat features for each line's sentence pair in the sentence-pair test text to produce the flat feature test text, combining the flat feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text;
S40: train an SVR model on the shallow syntax tree feature training text to obtain the trained model, and obtain the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
Further, the detailed procedure of step S10 is as follows:
S101: apply a part-of-speech tagging tool (e.g., Stanford POS tagger) to all sentences in the sentence-pair training text to obtain the part of speech of each word in each sentence, producing the corresponding part-of-speech-tagged training text;
the same processing applied to the sentence-pair test text yields the part-of-speech-tagged test text;
S102: apply a syntactic analysis tool (e.g., Stanford parser) to all sentences in the sentence-pair training text to obtain the phrase each word belongs to, producing the phrase training text;
the same processing applied to the sentence-pair test text yields the phrase test text;
S103: apply a named entity recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the named entity label of each word, producing the named entity training text;
the same processing applied to the sentence-pair test text yields the named entity test text;
S104: apply a WordNet recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the WordNet supersense (WNSS) of each word, a word without a WordNet supersense being represented by a space, producing the WordNet training text;
the same processing applied to the sentence-pair test text yields the WordNet test text.
Further, the detailed procedure of step S20 is as follows:
S201: according to the part-of-speech-tagged training text, construct a shallow syntax tree for each sentence in the sentence-pair training text, obtaining the basic shallow syntax tree training text; obtain the basic shallow syntax tree test text from the sentence-pair test text and the part-of-speech-tagged test text;
here, a shallow syntax tree is a tree of depth 3, constructed as follows: the words of a sentence become the bottom leaf nodes; the part of speech of each leaf node becomes that leaf node's parent; finally, the parent of all part-of-speech nodes is set to the root node, as in the sketch below;
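A minimal sketch of the depth-3 construction; the nested tree encoding and the tagger output format are assumptions for illustration, and the sentence is Example 1 from the embodiment below:

```python
def basic_shallow_tree(tagged):
    """Depth-3 shallow syntax tree: ROOT -> POS parents -> word leaves.
    Trees are (label, [children]); a leaf is a bare word string.
    `tagged` is a list of (word, pos) pairs from a POS tagger."""
    return ("ROOT", [(pos, [word]) for word, pos in tagged])

tagged = [("a", "DT"), ("woman", "NN"), ("and", "CC"), ("man", "NN"),
          ("are", "VBP"), ("dancing", "VBG"), ("in", "IN"),
          ("the", "DT"), ("rain", "NN")]
print(basic_shallow_tree(tagged))
```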
S202: according to the phrase training text, construct a deeper shallow syntax tree for each sentence in the basic shallow syntax tree training text, obtaining the phrase shallow syntax tree training text; obtain the phrase shallow syntax tree test text from the phrase test text and the basic shallow syntax tree test text;
here, the deeper shallow syntax tree is a tree of depth 4, constructed as follows: from the sentence's phrase chunking result, determine which words belong to the same phrase; connect the part-of-speech parents of the leaf nodes of words belonging to the same phrase to a common chunker node; sever the links between the root node and the part-of-speech nodes and connect the chunker nodes to the corresponding part-of-speech nodes; finally, set the parent of all chunker nodes to the root node, as sketched below;
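A minimal sketch of the depth-4 construction, extending the encoding above; the (word, pos, chunk_id, chunk_label) input format is an assumption standing in for a chunker's output:

```python
from itertools import groupby

def phrase_shallow_tree(rows):
    """Depth-4 tree: ROOT -> chunker nodes -> POS nodes -> word leaves.
    `rows` is a list of (word, pos, chunk_id, chunk_label); consecutive
    rows with the same chunk_id belong to the same phrase."""
    chunks = []
    for (_, label), grp in groupby(rows, key=lambda r: (r[2], r[3])):
        chunks.append((label, [(pos, [word]) for word, pos, _, _ in grp]))
    return ("ROOT", chunks)

rows = [("a", "DT", 0, "NP"), ("woman", "NN", 0, "NP"),
        ("and", "CC", 0, "NP"), ("man", "NN", 0, "NP"),
        ("are", "VBP", 1, "VP"), ("dancing", "VBG", 1, "VP"),
        ("in", "IN", 2, "PP"), ("the", "DT", 3, "NP"), ("rain", "NN", 3, "NP")]
print(phrase_shallow_tree(rows))
```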
S203: obtain the semantic shallow syntax tree training text from the phrase shallow syntax tree training text, the named entity training text, and the WordNet training text; obtain the semantic shallow syntax tree test text from the phrase shallow syntax tree test text, the named entity test text, and the WordNet test text;
the semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text as follows: if a word of the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or WordNet training text, the syntactic label of the chunker node containing that word is changed to the NER or WNSS information; if several words within one phrase node satisfy this condition, the NER and WNSS information of the last such word of the phrase is used (a sketch of this relabeling follows);
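A minimal sketch of the relabeling, reusing the tree encoding above; the word-to-label mapping is a hypothetical stand-in for the NER/WNSS texts:

```python
def add_semantics(tree, sem_info):
    """Relabel chunker nodes with NER/WNSS information: if any word under
    a chunker node appears in `sem_info` (word -> NER or WNSS label), the
    chunker label is replaced by the label of the last such word."""
    chunks = []
    for chunk_label, pos_nodes in tree[1]:
        for _, words in pos_nodes:
            for w in words:
                if w in sem_info:
                    chunk_label = sem_info[w]   # last match wins
        chunks.append((chunk_label, pos_nodes))
    return (tree[0], chunks)

# e.g. add_semantics(tree, {"woman": "E:PER_DESC", "man": "E:PER_DESC",
#                           "rain": "weather"})
```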
S204: delete the nodes related to definite articles and conjunctions from the semantic shallow syntax tree training text, obtaining the pruned shallow syntax tree training text; delete the definite-article- and conjunction-related nodes from the semantic shallow syntax tree test text, obtaining the pruned shallow syntax tree test text;
the present invention removes definite articles and conjunctions, together with their parent (part-of-speech) nodes, from the structured representation, reducing the influence of insignificant information on the calculation result: if the word represented by a leaf node of a shallow syntax tree is an article or a conjunction, that leaf node is deleted together with its parent (part-of-speech) node and grandparent (chunker) node, as in the sketch below;
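A minimal sketch of the pruning step under one conservative reading: the patent deletes the leaf, its part-of-speech parent, and its chunker grandparent; here the chunker node is dropped only when all of its children are removed, so that other words sharing the phrase survive:

```python
PRUNED_POS = {"DT", "CC"}  # definite articles and conjunctions

def prune(tree):
    """Remove article/conjunction leaves together with their POS parents,
    and drop any chunker node left without children."""
    root, chunks = tree
    kept_chunks = []
    for chunk_label, pos_nodes in chunks:
        kept = [(pos, words) for pos, words in pos_nodes
                if pos not in PRUNED_POS]
        if kept:  # chunker node survives only if some child survives
            kept_chunks.append((chunk_label, kept))
    return (root, kept_chunks)

# e.g. prune(phrase_shallow_tree(rows)) drops "a", "and", "the"
# and their part-of-speech parents.
```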
S205: based on the pruned shallow syntax tree training text, associate the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree training text; based on the pruned shallow syntax tree test text, associate the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree test text;
the corresponding shallow syntax trees of a sentence pair are associated as follows: if a word is identical in the two sentences (identical leaf nodes), take its parent (part-of-speech) node and grandparent node, which are non-terminal nodes, and mark REL on them, for example as sketched below.
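A minimal sketch of the association step; the "REL-" prefix is an assumed label convention (the patent only says the nodes are marked with REL):

```python
def leaf_words(tree):
    """All leaf words of a ROOT -> chunker -> POS -> word tree."""
    return {w for _, pos_nodes in tree[1] for _, ws in pos_nodes for w in ws}

def mark_rel(tree, shared):
    """Prefix 'REL-' to the POS parent and chunker grandparent of every
    leaf word that occurs in both sentences."""
    chunks = []
    for chunk_label, pos_nodes in tree[1]:
        hit, nodes = False, []
        for pos, words in pos_nodes:
            if any(w in shared for w in words):
                pos, hit = "REL-" + pos, True
            nodes.append((pos, words))
        chunks.append(("REL-" + chunk_label if hit else chunk_label, nodes))
    return (tree[0], chunks)

# shared = leaf_words(tree1) & leaf_words(tree2); apply mark_rel to both.
```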
Further, the detailed procedure of step S30 is as follows:
S301: obtain the flat feature training text from the sentence-pair training text, and the flat feature test text from the sentence-pair test text;
here, the flat feature training text and the flat feature test text contain, respectively, the flat similarity features of every line's sentence pair in the sentence-pair training text and in the sentence-pair test text;
the present invention provides 11 flat features (a minimal sketch of two of them follows the list):
1. Longest common substring: the length of the longest contiguous common character sequence;
2. Longest common subsequence: unlike the longest common substring, contiguity is not required, so similarity can be computed despite word insertions or deletions;
3. Greedy string tiling: allows out-of-order text when computing common similar substrings, taking the maximum length matched by each substring;
4-7. Sentence edit distances based on character n-grams: for a sentence s, its character n-grams are the segments obtained by cutting the string into pieces of length N, i.e., all substrings of s of length N; given two strings, their n-grams are extracted and the n-gram distance between them is defined from the number of substrings they share; N = 1, 2, 3, 4, giving four features;
8-11. Sentence edit distances based on word n-grams: analogous to features 4-7, with the word instead of the character as the segmentation unit;
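A sketch of feature 1 and of the character n-gram overlap underlying features 4-7; the union normalization is one plausible choice, since the patent does not fix a formula:

```python
def longest_common_substring(a, b):
    """Feature 1: length of the longest contiguous common character run."""
    best, prev = 0, [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ch == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def char_ngram_overlap(a, b, n):
    """Features 4-7 (n = 1..4): shared character n-grams of two strings,
    normalized by the n-gram union. Splitting on spaces first and using
    token tuples gives the word-n-gram variants, features 8-11."""
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(ga & gb) / len(ga | gb) if ga or gb else 0.0

s1, s2 = "tigers hit lions", "lions hit tigers"
print(longest_common_substring(s1, s2), char_ngram_overlap(s1, s2, 3))
```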
S302: obtain the shallow syntax tree feature training text from the flat feature training text and the shallow syntax tree training text; obtain the shallow syntax tree feature test text from the flat feature test text and the shallow syntax tree test text;
here, each line of the shallow syntax tree feature training text and the shallow syntax tree feature test text contains the corresponding sentence pair's eleven flat features and pair of shallow syntax tree features.
Further, the detailed procedure of step S40 is as follows:
S401: use SVR as the similarity calculation method, training on the shallow syntax tree feature training text in the SVR model to obtain the trained model;
S402: taking the trained model and the shallow syntax tree feature test text as input, obtain the similarity calculation result text using the SVR tool;
here, each line's value in the similarity calculation result text is the similarity calculation result of the corresponding line's sentence pair in the test text; a composite-kernel sketch follows.
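The patent does not specify how the tree features enter the SVR; one common route for combining flat features with tree kernels (as in composite-kernel work such as [3]) is a precomputed Gram matrix. A minimal sketch with scikit-learn and toy numbers (all values are stand-ins):

```python
import numpy as np
from sklearn.svm import SVR

# Composite-kernel sketch: a linear kernel over the 11 flat features plus
# a tree-kernel Gram matrix between the joint shallow syntax trees. The
# tree-kernel values below are placeholders; in a real system they would
# come from a tree-kernel toolkit.
rng = np.random.default_rng(0)
flat = rng.random((4, 11))              # flat features of 4 training pairs
K_flat = flat @ flat.T                  # linear kernel over flat features
K_tree = np.eye(4)                      # placeholder tree-kernel values
y = np.array([4.5, 1.0, 3.2, 0.4])      # human similarity scores

model = SVR(kernel="precomputed", C=1.0, epsilon=0.1)
model.fit(K_flat + K_tree, y)
# At test time the Gram matrix is n_test x n_train, computed the same way.
print(model.predict(K_flat + K_tree))
```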
The sentence similarity calculation system of the present invention comprises:
- a preprocessing module, which invokes part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition tools on all sentences of the sentence-pair training text and the sentence-pair test text to perform part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition respectively, obtaining the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text, and the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
- a structured feature module based on the shallow syntax tree, which obtains the shallow syntax tree training text from the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text,
and obtains the shallow syntax tree test text from the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
- a feature set module of the shallow-syntax-tree-based similarity calculation, which obtains multiple flat features for each line's sentence pair in the sentence-pair training text to produce the flat feature training text, and combines the flat feature training text and the shallow syntax tree training text with the sentence pairs' human-scored training text to obtain the shallow syntax tree feature training text;
it likewise obtains multiple flat features for each line's sentence pair in the sentence-pair test text to produce the flat feature test text, and combines the flat feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text;
- a similarity calculation module based on the shallow syntax tree, which trains an SVR model on the shallow syntax tree feature training text to obtain the trained model and obtains the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
Further, the preprocessing module specifically comprises:
- a part-of-speech tagging unit, which applies a part-of-speech tagging tool (e.g., Stanford POS tagger) to all sentences of the sentence-pair training text to obtain the part of speech of each word, producing the corresponding part-of-speech-tagged training text;
the same processing applied to the sentence-pair test text yields the part-of-speech-tagged test text;
- a phrase tagging unit, which applies a syntactic analysis tool (e.g., Stanford parser) to all sentences of the sentence-pair training text to obtain the phrase each word belongs to, producing the phrase training text; the same processing applied to the sentence-pair test text yields the phrase test text;
- a named entity recognition unit, which applies a named entity recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the named entity label of each word, producing the named entity training text; the same processing applied to the sentence-pair test text yields the named entity test text;
- a WordNet recognition unit, which applies a WordNet recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the WordNet supersense (WNSS) of each word, a word without a supersense being represented by a space, producing the WordNet training text; the same processing applied to the sentence-pair test text yields the WordNet test text.
Further, the structured feature module based on the shallow syntax tree specifically comprises:
- a basic shallow syntax tree unit, which constructs a shallow syntax tree for each sentence of the sentence-pair training text according to the part-of-speech-tagged training text, obtaining the basic shallow syntax tree training text, and obtains the basic shallow syntax tree test text from the sentence-pair test text and the part-of-speech-tagged test text;
here, a shallow syntax tree is a tree of depth 3, constructed as follows: the words of a sentence become the bottom leaf nodes; the part of speech of each leaf node becomes that leaf's parent node; finally, the parent of all part-of-speech nodes is set to the root node;
- a phrase-level shallow syntax tree unit, which according to the phrase training text constructs a deeper shallow syntax tree for each sentence in the basic shallow syntax tree training text, obtaining the phrase shallow syntax tree training text, and obtains the phrase shallow syntax tree test text from the phrase test text and the basic shallow syntax tree test text;
here, the deeper shallow syntax tree is a tree of depth 4, constructed as follows: from the sentence's phrase chunking result, determine which words belong to the same phrase; connect the part-of-speech parents of the leaves of words belonging to the same phrase to a common chunker node; sever the links between the root node and the part-of-speech nodes and connect the chunker nodes to the corresponding part-of-speech nodes; finally, set the parent of all chunker nodes to the root node;
- a semantic shallow syntax tree unit, which obtains the semantic shallow syntax tree training text from the phrase shallow syntax tree training text, the named entity training text, and the WordNet training text, and the semantic shallow syntax tree test text from the phrase shallow syntax tree test text, the named entity test text, and the WordNet test text;
the semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text as follows: if a word of the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or WordNet training text, the syntactic label of the chunker node containing that word is changed to the NER or WNSS information; if several words within one phrase node satisfy this condition, the NER and WNSS information of the last such word of the phrase is used;
- a pruning unit for deleting insignificant information, which deletes the nodes related to definite articles and conjunctions from the semantic shallow syntax tree training text, obtaining the pruned shallow syntax tree training text, and deletes them from the semantic shallow syntax tree test text, obtaining the pruned shallow syntax tree test text;
the present invention removes definite articles and conjunctions, together with their parent (part-of-speech) nodes, from the structured representation to reduce the influence of insignificant information on the calculation result: if the word represented by a leaf node of a shallow syntax tree is an article or a conjunction, that leaf is deleted together with its parent (part-of-speech) node and grandparent (chunker) node;
- a joint sentence-pair representation unit, which based on the pruned shallow syntax tree training text associates the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree training text, and based on the pruned shallow syntax tree test text associates the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree test text;
the corresponding shallow syntax trees of a sentence pair are associated as follows: if a word is identical in the two sentences, its parent and grandparent nodes (non-terminal nodes) in both trees are marked with REL.
Further, the feature set module of the shallow-syntax-tree-based similarity calculation specifically comprises:
- a flat feature set unit, which obtains the flat feature training text from the sentence-pair training text and the flat feature test text from the sentence-pair test text;
here, the flat feature training text and the flat feature test text contain, respectively, the flat similarity features of every line's sentence pair in the sentence-pair training text and in the sentence-pair test text;
the present invention provides 11 flat features:
1. Longest common substring: the length of the longest contiguous common character sequence;
2. Longest common subsequence: unlike the longest common substring, contiguity is not required, so similarity can be computed despite word insertions or deletions;
3. Greedy string tiling: allows out-of-order text when computing common similar substrings, taking the maximum length matched by each substring;
4-7. Sentence edit distances based on character n-grams: for a sentence s, its character n-grams are all substrings of s of length N obtained by cutting the string into pieces of length N; given two strings, their n-grams are extracted and the n-gram distance between them is defined from the number of substrings they share; N = 1, 2, 3, 4, giving four features;
8-11. Sentence edit distances based on word n-grams: analogous to features 4-7, with the word instead of the character as the segmentation unit;
- a shallow syntax tree feature set unit, which obtains the shallow syntax tree feature training text from the flat feature training text and the shallow syntax tree training text, and the shallow syntax tree feature test text from the flat feature test text and the shallow syntax tree test text;
here, each line of the shallow syntax tree feature training text and the shallow syntax tree feature test text contains the corresponding sentence pair's eleven flat features and pair of shallow syntax tree features.
Further, the similarity calculation module based on the shallow syntax tree specifically comprises:
- a training unit, which uses SVR as the similarity calculation method and trains on the shallow syntax tree feature training text in the SVR model to obtain the trained model;
- a test unit, which takes the trained model and the shallow syntax tree feature test text as input and obtains the similarity calculation result text using the SVR tool;
here, each line's value in the similarity calculation result text is the similarity calculation result of the corresponding line's sentence pair in the test text.
Through the above scheme, the present invention has at least the following advantages:
1. The structured-representation-based sentence similarity calculation method and system proposed by the present invention overcome the weak representational power of using only flat feature vectors to represent sentence-pair similarity;
2. Unlike feature-vector-based methods, the present invention compares similarity by directly computing, with a tree kernel function, the number of identical subtrees of two structured features (e.g., dependency trees). Tree-kernel-based methods need not construct a high-dimensional feature vector space: they operate on structure trees and classify by directly computing the similarity between two discrete objects (e.g., syntactic structure trees), so that in theory they can explore an implicit high-dimensional feature space and thereby effectively exploit structured information such as syntax trees;
3. On the basis of the shallow syntax tree, the present invention proposes a new syntax tree structure whose structured features better express information such as the syntax and semantics of a sentence, and applies this syntax tree to sentence similarity calculation, obtaining good performance.
The above is only an overview of the technical solution of the present invention. In order to understand the technical means of the invention more clearly so that it can be implemented according to the contents of the specification, preferred embodiments of the present invention are described in detail below together with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of the dependency trees of the sentences "Tigers hit lions" and "Lions hit tigers" in the background art;
Fig. 2 is the structure chart of the sentence similarity calculation system;
Fig. 3 is the structure chart of the preprocessing module;
Fig. 4 is the structure chart of the structured feature module based on the shallow syntax tree;
Fig. 5 is the structure chart of the feature set module of the shallow-syntax-tree-based similarity calculation;
Fig. 6 is the structure chart of the similarity calculation module based on the shallow syntax tree;
Fig. 7 is the flow chart of the sentence similarity calculation method;
Fig. 8 is the flow chart of the preprocessing module;
Fig. 9 is the flow chart of the structured feature module based on the shallow syntax tree;
Fig. 10 is the flow chart of the feature set module of the shallow-syntax-tree-based similarity calculation;
Fig. 11 is the flow chart of the similarity calculation module based on the shallow syntax tree;
Fig. 12 is an example of the basic shallow syntax tree;
Fig. 13 is an example of the phrase shallow syntax tree;
Fig. 14 is an example of the semantic shallow syntax tree;
Fig. 15 is an example of the pruned shallow syntax tree;
Fig. 16 is an example of the joint shallow syntax tree.
Specific embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples serve to illustrate the present invention, not to limit its scope.
The goal of the sentence similarity calculation of the present invention is to learn a scoring system: given a sentence pair, the system returns a similarity score in the range 0 to 5, where 0 means the two sentences are completely unrelated in meaning and 5 means they have identical meaning. System performance is assessed by the Pearson correlation coefficient between the system-computed scores and the human judgment scores.
For simplicity of description, the flow and implementation of the invention are illustrated below with the aid of the figures.
The sentence similarity calculation system based on structured features, as shown in Fig. 2, comprises:
- a preprocessing module 10, which invokes part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition tools on all sentences of the sentence-pair training text and the sentence-pair test text to perform part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition respectively, obtaining the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text, and the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
- a structured feature module 20 based on the shallow syntax tree, which obtains the shallow syntax tree training text from the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text,
and obtains the shallow syntax tree test text from the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
- a feature set module 30 of the shallow-syntax-tree-based similarity calculation, which obtains multiple flat features for each line's sentence pair in the sentence-pair training text to produce the flat feature training text, and combines the flat feature training text and the shallow syntax tree training text with the sentence pairs' human-scored training text to obtain the shallow syntax tree feature training text;
it likewise obtains multiple flat features for each line's sentence pair in the sentence-pair test text to produce the flat feature test text, and combines the flat feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text;
- a similarity calculation module 40 based on the shallow syntax tree, which trains an SVR model on the shallow syntax tree feature training text to obtain the trained model and obtains the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
As shown in Fig. 3, the preprocessing module 10 specifically comprises:
- a part-of-speech tagging unit 101, which applies a part-of-speech tagging tool to all sentences of the sentence-pair training text to obtain the part of speech of each word, producing the corresponding part-of-speech-tagged training text;
the same processing applied to the sentence-pair test text yields the part-of-speech-tagged test text;
- a phrase tagging unit 102, which applies a syntactic analysis tool to all sentences of the sentence-pair training text to obtain the phrase each word belongs to, producing the phrase training text; the same processing applied to the sentence-pair test text yields the phrase test text;
- a named entity recognition unit 103, which applies a named entity recognition tool to the sentence-pair training text to obtain the named entity label of each word, producing the named entity training text; the same processing applied to the sentence-pair test text yields the named entity test text;
- a WordNet recognition unit 104, which applies a WordNet recognition tool to the sentence-pair training text to obtain the WordNet supersense of each word, a word without a supersense being represented by a space, producing the WordNet training text; the same processing applied to the sentence-pair test text yields the WordNet test text.
As shown in Fig. 4, the structured feature module 20 based on the shallow syntax tree specifically comprises:
- a basic shallow syntax tree unit 201, which constructs a shallow syntax tree for each sentence of the sentence-pair training text according to the part-of-speech-tagged training text, obtaining the basic shallow syntax tree training text, and obtains the basic shallow syntax tree test text from the sentence-pair test text and the part-of-speech-tagged test text;
here, the shallow syntax tree is constructed as follows: the words of a sentence become the bottom leaf nodes; the part of speech of each leaf node becomes that leaf's parent node; finally, the parent of all part-of-speech nodes is set to the root node;
- a phrase-level shallow syntax tree unit 202, which according to the phrase training text constructs a deeper shallow syntax tree for each sentence in the basic shallow syntax tree training text, obtaining the phrase shallow syntax tree training text, and obtains the phrase shallow syntax tree test text from the phrase test text and the basic shallow syntax tree test text;
here, the deeper shallow syntax tree is constructed as follows: from the sentence's phrase chunking result, determine which words belong to the same phrase; connect the part-of-speech parents of the leaves of words belonging to the same phrase to a common chunker node; sever the links between the root node and the part-of-speech nodes and connect the chunker nodes to the corresponding part-of-speech nodes; finally, set the parent of all chunker nodes to the root node;
- a semantic shallow syntax tree unit 203, which obtains the semantic shallow syntax tree training text from the phrase shallow syntax tree training text, the named entity training text, and the WordNet training text, and the semantic shallow syntax tree test text from the phrase shallow syntax tree test text, the named entity test text, and the WordNet test text;
the semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text as follows: if a word of the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or WordNet training text, the syntactic label of the chunker node containing that word is changed to the NER or WNSS information; if several words within one phrase node satisfy this condition, the NER and WNSS information of the last such word of the phrase is used;
- a pruning unit 204 for deleting insignificant information, which deletes the nodes related to definite articles and conjunctions from the semantic shallow syntax tree training text, obtaining the pruned shallow syntax tree training text, and deletes them from the semantic shallow syntax tree test text, obtaining the pruned shallow syntax tree test text;
- a joint sentence-pair representation unit 205, which based on the pruned shallow syntax tree training text associates the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree training text, and based on the pruned shallow syntax tree test text associates the related parts of the shallow syntax trees of each sentence pair to obtain the shallow syntax tree test text;
the corresponding shallow syntax trees of a sentence pair are associated as follows: if a word is identical in the two sentences, its parent and grandparent nodes (non-terminal nodes) in both trees are marked with REL.
As shown in Fig. 5, the feature set module 30 of the shallow-syntax-tree-based similarity calculation specifically comprises:
- a flat feature set unit 301, which obtains the flat feature training text from the sentence-pair training text and the flat feature test text from the sentence-pair test text;
here, the flat feature training text and the flat feature test text contain, respectively, the flat similarity features of every line's sentence pair in the sentence-pair training text and in the sentence-pair test text;
- a shallow syntax tree feature set unit 302, which obtains the shallow syntax tree feature training text from the flat feature training text and the shallow syntax tree training text, and the shallow syntax tree feature test text from the flat feature test text and the shallow syntax tree test text.
As shown in Fig. 6, the similarity calculation module 40 based on the shallow syntax tree specifically comprises:
- a training unit 401, which uses SVR as the similarity calculation method and trains on the shallow syntax tree feature training text in the SVR model to obtain the trained model;
- a test unit 402, which takes the trained model and the shallow syntax tree feature test text as input and obtains the similarity calculation result text using the SVR tool;
here, each line's value in the similarity calculation result text is the similarity calculation result of the corresponding line's sentence pair in the test text.
The sentence similarity calculation method based on structured features, as shown in Fig. 7, comprises:
S10: for all sentences in the sentence-pair training text and the sentence-pair test text, invoke part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition tools to perform part-of-speech tagging, syntactic analysis, named entity recognition, and WordNet recognition respectively, obtaining the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text, and the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text;
the sentence-pair training text and the sentence-pair test text are texts in which every line contains two sentences whose similarity is to be calculated.
S20: obtain the shallow syntax tree training text from the part-of-speech-tagged training text, phrase training text, named entity training text, and WordNet training text;
obtain the shallow syntax tree test text from the part-of-speech-tagged test text, phrase test text, named entity test text, and WordNet test text.
S30: obtain 11 flat features for each line's sentence pair in the sentence-pair training text to produce the flat feature training text, and combine the flat feature training text and the shallow syntax tree training text with the sentence pairs' human-scored training text to obtain the shallow syntax tree feature training text;
obtain 11 flat features for each line's sentence pair in the sentence-pair test text to produce the flat feature test text, and combine the flat feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text.
S40: train an SVR model on the shallow syntax tree feature training text to obtain the trained model, and obtain the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
As shown in Fig. 8, the detailed procedure of S10 is as follows:
S101: apply a part-of-speech tagging tool (e.g., Stanford POS tagger) to all sentences in the sentence-pair training text to obtain the part of speech of each word in each sentence, producing the corresponding part-of-speech-tagged training text.
The same processing applied to the sentence-pair test text yields the part-of-speech-tagged test text.
Example 1: "A woman and man are dancing in the rain" yields, after part-of-speech tagging:
Example 2: "DT NN CC NN VBP VBG IN DT NN".
Here DT denotes a determiner (definite article), NN a noun, CC a coordinating conjunction, VBP a verb in the non-third-person-singular present, VBG a gerund or present participle, and IN a preposition or subordinating conjunction.
S102: apply a syntactic analysis tool (e.g., Stanford parser) to all sentences in the sentence-pair training text to obtain the phrase each word belongs to, producing the phrase training text; the same processing applied to the sentence-pair test text yields the phrase test text.
Example 1 after phrase chunking yields 4 phrases:
Example 3: "NP(a woman and man) VP(be dance) PP(in) NP(the rain)"
where NP denotes a noun phrase, VP a verb phrase, and PP a prepositional phrase.
S103: apply a named entity recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the named entity label of each word, producing the named entity training text; the same processing applied to the sentence-pair test text yields the named entity test text.
Example 1 after named entity recognition yields two entities:
Example 4: "E:PER_DESC woman E:PER_DESC man"
where E:PER_DESC indicates that the recognized entity type is person.
S104: apply a WordNet recognition tool (e.g., SST-light tagger) to the sentence-pair training text to obtain the WordNet supersense (WNSS) of each word; a word without a WordNet supersense is represented by a space; this yields the WordNet training text. The same processing applied to the sentence-pair test text yields the WordNet test text.
Example 1 after WNSS recognition yields:
Example 5: "female woman male man weather rain"
where female, male, and weather are the WordNet supersenses of woman, man, and rain respectively.
As shown in Figure 9, the detailed process of S20 is as follows:
S201: according to the part-of-speech tagging training text, a shallow syntax tree is constructed for each sentence in the sentence-pair training text, yielding the basic shallow syntax tree training text; the basic shallow syntax tree test text is obtained from the sentence-pair test text and the part-of-speech tagging test text.
The shallow syntax tree is a tree of depth 3, constructed as follows: the words of a sentence become the leaf nodes at the bottom level (the 9 words of Example 1 give 9 leaf nodes); the part of speech of each leaf node becomes that leaf node's parent; finally, the parent of all part-of-speech nodes is set to the root node.
The shallow syntax tree constructed for Example 1 is shown in Figure 12.
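A minimal sketch of this construction in Python, producing the tree in bracketed notation; the function name and the (word, tag) pair format are illustrative, not part of the patent.

    def basic_shallow_tree(tagged):
        # each word becomes a leaf, its POS tag becomes the leaf's parent,
        # and ROOT is the common parent of all POS nodes (depth 3)
        return "(ROOT " + " ".join(f"({tag} {word})" for word, tag in tagged) + ")"

    tagged = [("a", "DT"), ("woman", "NN"), ("and", "CC"), ("man", "NN"),
              ("are", "VBP"), ("dancing", "VBG"), ("in", "IN"),
              ("the", "DT"), ("rain", "NN")]
    print(basic_shallow_tree(tagged))
    # (ROOT (DT a) (NN woman) (CC and) (NN man) (VBP are) (VBG dancing) (IN in) (DT the) (NN rain))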
S202: according to the phrase training text, a deeper shallow syntax tree is constructed for each tree in the basic shallow syntax tree training text, yielding the phrase shallow syntax tree training text; the phrase shallow syntax tree test text is obtained from the phrase test text and the basic shallow syntax tree test text.
The deeper shallow syntax tree is a tree of depth 4, constructed as follows: from the sentence's phrase chunking result, the words belonging to the same phrase are identified, and the part-of-speech parents of those words' leaf nodes are attached to a common chunker node (the 9 words of Example 1 belong to 4 phrases, giving 4 chunker nodes); the link between the root node and the part-of-speech nodes is removed, the part-of-speech nodes are attached to their chunker nodes, and finally the parent of all chunker nodes is set to the root node.
Figure 13 shows Figure 12 after the layer of chunker nodes is added.
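Continuing the sketch above, the chunker layer can be inserted as follows; the (label, start, end) span format for chunks is an assumption made for illustration.

    def phrase_shallow_tree(tagged, chunks):
        # chunks: (label, start, end) spans over the tagged words,
        # e.g. ("NP", 0, 4) covers "a woman and man"
        parts = []
        for label, start, end in chunks:
            inner = " ".join(f"({tag} {word})" for word, tag in tagged[start:end])
            parts.append(f"({label} {inner})")
        return "(ROOT " + " ".join(parts) + ")"

    tagged = [("a", "DT"), ("woman", "NN"), ("and", "CC"), ("man", "NN"),
              ("are", "VBP"), ("dancing", "VBG"), ("in", "IN"),
              ("the", "DT"), ("rain", "NN")]
    chunks = [("NP", 0, 4), ("VP", 4, 6), ("PP", 6, 7), ("NP", 7, 9)]
    print(phrase_shallow_tree(tagged, chunks))
    # (ROOT (NP (DT a) (NN woman) (CC and) (NN man)) (VP (VBP are) (VBG dancing)) (PP (IN in)) (NP (DT the) (NN rain)))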
S203: the semantic shallow syntax tree training text is obtained from the phrase shallow syntax tree training text, the named entity training text and the WordNet training text; the semantic shallow syntax tree test text is obtained from the phrase shallow syntax tree test text, the named entity test text and the WordNet test text.
The semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text, as follows: if a word in the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or the WordNet training text, the syntactic label of the chunker node containing that word is replaced by the NER or WNSS information; if several words in one phrase node meet this condition, the NER and WNSS information of the last such word in the phrase is used.
Figure 14 shows Figure 13 after the semantic information is added.
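The relabelling rule can be sketched as below; the dictionary mapping words to their NER/WNSS tags is an assumed input format.

    def semantic_label(label, words, sem):
        # replace a chunker node's label by the NER/WNSS tag of its words;
        # when several words are annotated, the last one wins (as described above)
        for w in words:
            label = sem.get(w, label)
        return label

    sem = {"woman": "E:PER_DESC", "man": "E:PER_DESC", "rain": "weather"}
    print(semantic_label("NP", ["a", "woman", "and", "man"], sem))  # E:PER_DESC
    print(semantic_label("NP", ["the", "rain"], sem))               # weather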
S204: determiner- and conjunction-related nodes are deleted from the semantic shallow syntax tree training text, yielding the pruned shallow syntax tree training text; determiner- and conjunction-related nodes are deleted from the semantic shallow syntax tree test text, yielding the pruned shallow syntax tree test text.
The present invention removes definite articles and conjunctions, together with their parent nodes (part-of-speech nodes), from the structural representation, reducing the influence of insignificant information on the calculation result. If a leaf node of the shallow syntax tree represents an article or a conjunction, that leaf node is deleted, together with its parent (the part-of-speech node) and the parent of that parent (the chunker node). For instance, 3 of the 9 words in Example 1 are definite articles or conjunctions; these 3 nodes, and the two nodes above each of them, are deleted.
Figure 15 shows Figure 14 after the non-key information is deleted.
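A sketch of the pruning step follows. The patent text deletes the chunker node above every pruned leaf; the sketch drops a chunker node only when all of its words are pruned, which keeps words such as "rain" in NP(the rain) attached. This is an interpretation of the rule, not the patent's literal wording.

    def prune_chunks(tree, drop_tags=("DT", "CC")):
        # tree: list of (label, [(word, tag), ...]) per chunker node
        pruned = []
        for label, words in tree:
            kept = [(w, t) for w, t in words if t not in drop_tags]
            if kept:  # a chunker node with no remaining children is removed
                pruned.append((label, kept))
        return pruned

    tree = [("E:PER_DESC", [("a", "DT"), ("woman", "NN"), ("and", "CC"), ("man", "NN")]),
            ("VP", [("are", "VBP"), ("dancing", "VBG")]),
            ("PP", [("in", "IN")]),
            ("weather", [("the", "DT"), ("rain", "NN")])]
    print(prune_chunks(tree))
    # [('E:PER_DESC', [('woman', 'NN'), ('man', 'NN')]),
    #  ('VP', [('are', 'VBP'), ('dancing', 'VBG')]),
    #  ('PP', [('in', 'IN')]),
    #  ('weather', [('rain', 'NN')])]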
S205: based on the pruned shallow syntax tree training text, the related parts of each sentence pair's shallow syntax trees are linked, yielding the shallow syntax tree training text; based on the pruned shallow syntax tree test text, the related parts of each sentence pair's shallow syntax trees are linked, yielding the shallow syntax tree test text.
The method for linking the two shallow syntax trees of a sentence pair is as follows: if a word occurs in both sentences (identical leaf nodes), its parent node (part-of-speech node) and grandparent node, both non-terminal nodes, are marked with "REL". In Example 8 below, there are 6 words after the two sentences' shallow syntax trees are pruned, and 4 words of the first sentence also occur in the second sentence; the prefix "REL-" is added before the syntactic and semantic labels of the parent and grandparent of each such leaf node (the nodes of the words shared by the two sentences). The linked shallow syntax trees of Example 8 are shown in Figure 16.
Example 8: “the girl sing into a microphone” “the girl sing into the phone”
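A sketch of the linking step; the chunk labels and the set of shared words are illustrative inputs, not the patent's exact data.

    def mark_rel(tree, shared):
        # tree: list of (label, [(word, tag), ...]); shared: words found in both sentences
        out = []
        for label, words in tree:
            lab = "REL-" + label if any(w in shared for w, _ in words) else label
            inner = " ".join(f"(REL-{t} {w})" if w in shared else f"({t} {w})"
                             for w, t in words)
            out.append(f"({lab} {inner})")
        return "(ROOT " + " ".join(out) + ")"

    tree1 = [("NP", [("girl", "NN")]), ("VP", [("sing", "VB")]),
             ("PP", [("into", "IN")]), ("NP", [("microphone", "NN")])]
    print(mark_rel(tree1, {"girl", "sing", "into"}))
    # (ROOT (REL-NP (REL-NN girl)) (REL-VP (REL-VB sing)) (REL-PP (REL-IN into)) (NP (NN microphone)))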
As shown in Figure 10, the detailed process of S30 is as follows:
S301: the plane feature training text is obtained from the sentence-pair training text, and the plane feature test text is obtained from the sentence-pair test text.
The plane feature training text and the plane feature test text contain, for the sentence pair in each row of the sentence-pair training text and the sentence-pair test text respectively, the plane features used in the similarity calculation. The present invention provides 11 plane features:
1. Longest common substring: the length of the longest contiguous character sequence shared by the two sentences;
2. Longest common subsequence: unlike the longest common substring, no contiguity is required, so the similarity can still be computed when words are inserted or missing;
3. Greedy string tiling: allows reordered parts of the texts to be compared, computing the common similar substrings and the maximum length of each substring match;
4, 5, 6, 7. Sentence edit-distance results based on character n-grams: for a string s, its character n-grams are all substrings of s of length n, obtained by cutting the string into segments of length n. Given two strings, their n-grams are computed separately, and the n-gram distance between the strings is defined from the number of n-grams they have in common; n = 1, 2, 3, 4, giving four features;
8, 9, 10, 11. Sentence edit-distance results based on word n-grams: analogous to features 4-7, but with the word rather than the character as the segmentation unit. Illustrative sketches of these plane features are given below.
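The following Python sketches illustrate several of the features above. The two dynamic programs implement features 1 and 2 (feature 3, greedy string tiling, follows the same pairwise pattern but needs tile bookkeeping and is omitted); the Jaccard-style overlap for the n-gram features is an assumed normalisation, since the patent obtains the actual values from an external tool (cf. Example 10).

    def longest_common_substring(a, b):
        # dp[i][j]: length of the common substring ending at a[i-1] and b[j-1]
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        best = 0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                    best = max(best, dp[i][j])
        return best

    def longest_common_subsequence(a, b):
        # like the substring variant, but matched characters need not be adjacent
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                            else max(dp[i - 1][j], dp[i][j - 1]))
        return dp[-1][-1]

    def ngrams(seq, n):
        return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

    def ngram_overlap(s1, s2, n, unit="char"):
        # features 4-7 use unit="char", features 8-11 use unit="word"
        a = list(s1) if unit == "char" else s1.split()
        b = list(s2) if unit == "char" else s2.split()
        g1, g2 = ngrams(a, n), ngrams(b, n)
        return len(g1 & g2) / len(g1 | g2) if g1 | g2 else 0.0

    print(longest_common_substring("abcdef", "abdf"))    # 2 ("ab")
    print(longest_common_subsequence("abcdef", "abdf"))  # 4 ("abdf")
    s1 = "the girl sing into a microphone"
    s2 = "the girl sing into the phone"
    print([ngram_overlap(s1, s2, n, u) for u in ("char", "word") for n in (1, 2, 3, 4)])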
The 11 plane features of Example 8 are:
Example 10: 1:1.000000000000697 2:0.9391768400811316 3:0.9999999999999831 4:0.503555802360086 5:0.5677922252084567 6:0.520906295824116 7:0.9 8:0.3333333333333333 9:0.9 10:0.21428571428571427 11:0.07142857142857142
Taking 1:1.000000000000697 as an example, the number before the ":" is the feature index and the number after the ":" is the feature value; the values are obtained by calling a dedicated tool (e.g. UKP).
S302: the shallow syntax tree feature training text is obtained from the plane feature training text and the shallow syntax tree training text; the shallow syntax tree feature test text is obtained from the plane feature test text and the shallow syntax tree test text.
Each row of the shallow syntax tree feature training text and of the shallow syntax tree feature test text contains the eleven plane features and the pair of shallow syntax tree features of the corresponding sentence pair.
The feature set of Example 8 is:
Example 11: |BT| (ROOT(root(REL-dance(REL-nsubj(REL-woman(REL-det a)(REL-cc and)(REL-conj:and man)))(REL-nsubj man)(REL-cop be)(REL-nmod:in(REL-rain(REL-case in)(det the)))))) |BT| (ROOT(root(REL-dance(REL-nsubj(REL-man(REL-det a)(REL-cc and)(REL-conj:and woman)))(REL-nsubj woman)(REL-cop be)(REL-nmod:in(REL-rain(REL-case in)))))) |ET| 1:1.000000000000697 2:0.9391768400811316 3:0.9999999999999831 4:0.503555802360086 5:0.5677922252084567 6:0.520906295824116 7:0.9 8:0.3333333333333333 9:0.9 10:0.21428571428571427 11:0.07142857142857142
Here the string between the first and second |BT| markers and the string between the second |BT| and |ET| are the two shallow syntax trees (similar to Figure 14) proposed by the present invention for the two sentences; root denotes the root node, "(" marks the start of a child node, ")" marks its end, and the content between "(" and ")" is the node's content. The plane features of Example 10 are appended at the end.
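A sketch of assembling such a line; the |BT|/|ET| layout matches tree-kernel SVM tools of the SVM-light-TK family (an assumption, since the patent only exhibits the string format).

    def feature_line(tree1, tree2, plane_features):
        # "i:value" pairs, numbered from 1, appended after the two bracketed trees
        feats = " ".join(f"{i}:{v}" for i, v in enumerate(plane_features, start=1))
        return f"|BT| {tree1} |BT| {tree2} |ET| {feats}"

    print(feature_line("(ROOT (REL-NP (REL-NN girl)))",
                       "(ROOT (REL-NP (REL-NN girl)))",
                       [1.0, 0.94, 0.99]))
    # |BT| (ROOT (REL-NP (REL-NN girl))) |BT| (ROOT (REL-NP (REL-NN girl))) |ET| 1:1.0 2:0.94 3:0.99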
As shown in Figure 11, the detailed process of S40 is as follows:
S401: the similarity calculation model is obtained using SVR; the shallow syntax tree feature training text is given as input to the SVR tool for training, which yields the trained model.
S402: the trained model and the shallow syntax tree feature test text are given as input to the SVR tool, which yields the similarity calculation result text.
The value in each row of the similarity calculation result text is the similarity calculation result of the sentence pair in the corresponding row of the test text. For example, the similarity score of the two sentences of Example 8 is 3.8731014.
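An illustrative stand-in for S401/S402 in Python: the patent's SVR runs over the tree features plus the plane features, which in practice requires a tree-kernel SVM tool, so the scikit-learn sketch below fits the 11 plane features only and uses placeholder data. It is a simplification, not the full method.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X_train = rng.random((100, 11))  # placeholder: 11 plane features per sentence pair
    y_train = rng.random(100) * 5.0  # placeholder: manual similarity scores (0-5 scale assumed)

    model = SVR(kernel="rbf").fit(X_train, y_train)  # S401: train the SVR model
    X_test = rng.random((10, 11))
    print(model.predict(X_test))                     # S402: one similarity score per test pair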
The above is only a preferred embodiment of the present invention and is not intended to limit the invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A sentence similarity calculation method, characterised by comprising the steps of:
S10: applying part-of-speech tagging, syntactic analysis, named entity recognition and WordNet annotation tools to every sentence in the sentence-pair training text and the sentence-pair test text, so as to perform part-of-speech tagging, syntactic analysis, named entity recognition and WordNet annotation respectively, obtaining the part-of-speech tagging training text, phrase training text, named entity training text and WordNet training text, and the part-of-speech tagging test text, phrase test text, named entity test text and WordNet test text,
wherein the sentence-pair training text and the sentence-pair test text are texts in which every row contains the two sentences whose similarity is to be calculated;
S20: obtaining the shallow syntax tree training text from the part-of-speech tagging training text, phrase training text, named entity training text and WordNet training text,
and obtaining the shallow syntax tree test text from the part-of-speech tagging test text, phrase test text, named entity test text and WordNet test text;
S30: computing multiple plane features for the sentence pair in each row of the sentence-pair training text to obtain the plane feature training text, and combining the plane feature training text, the shallow syntax tree training text and the sentence-pair manual-score training text to obtain the shallow syntax tree feature training text,
computing multiple plane features for the sentence pair in each row of the sentence-pair test text to obtain the plane feature test text, and combining the plane feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text;
S40: training an SVR model on the shallow syntax tree feature training text to obtain a trained model, and obtaining the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
2. The sentence similarity calculation method according to claim 1, characterised in that the detailed process of step S10 is as follows:
S101: applying a part-of-speech tagging tool to every sentence in the sentence-pair training text to obtain the part of speech of each word, yielding the corresponding part-of-speech tagging training text;
applying the same processing to the sentence-pair test text to obtain the part-of-speech tagging test text;
S102: applying a syntactic analysis tool to every sentence in the sentence-pair training text to obtain the phrase each word belongs to, yielding the phrase training text;
applying the same processing to the sentence-pair test text to obtain the phrase test text;
S103: applying a named entity recognition tool to the sentence-pair training text to obtain the named entity recognition result of each word, yielding the named entity training text;
applying the same processing to the sentence-pair test text to obtain the named entity test text;
S104: applying a WordNet annotation tool to the sentence-pair training text to obtain the WordNet supersense of each word, a word with no WordNet supersense being represented by a space, yielding the WordNet training text;
applying the same processing to the sentence-pair test text to obtain the WordNet test text.
3. The sentence similarity calculation method according to claim 1, characterised in that the detailed process of step S20 is as follows:
S201: according to the part-of-speech tagging training text, constructing a shallow syntax tree for each sentence in the sentence-pair training text to obtain the basic shallow syntax tree training text; obtaining the basic shallow syntax tree test text from the sentence-pair test text and the part-of-speech tagging test text;
wherein the shallow syntax tree is constructed as follows: the words of a sentence become the leaf nodes at the bottom level; the part of speech of each leaf node becomes that leaf node's parent; finally, the parent of all part-of-speech nodes is set to the root node;
S202: according to the phrase training text, constructing a deeper shallow syntax tree for each tree in the basic shallow syntax tree training text to obtain the phrase shallow syntax tree training text; obtaining the phrase shallow syntax tree test text from the phrase test text and the basic shallow syntax tree test text;
wherein the deeper shallow syntax tree is constructed as follows: from the sentence's phrase chunking result, the words belonging to the same phrase are identified; the part-of-speech parents of the leaf nodes of words belonging to the same phrase are attached to a common chunker node; the link between the root node and the part-of-speech nodes is removed and the part-of-speech nodes are attached to their chunker nodes; finally, the parent of all chunker nodes is set to the root node;
S203: obtaining the semantic shallow syntax tree training text from the phrase shallow syntax tree training text, the named entity training text and the WordNet training text; obtaining the semantic shallow syntax tree test text from the phrase shallow syntax tree test text, the named entity test text and the WordNet test text;
wherein the semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text as follows: if a word in the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or the WordNet training text, the syntactic label of the chunker node containing that word is replaced by the NER or WNSS information; if several words in one phrase node meet this condition, the NER and WNSS information of the last such word in the phrase is used;
S204: deleting determiner- and conjunction-related nodes from the semantic shallow syntax tree training text to obtain the pruned shallow syntax tree training text; deleting determiner- and conjunction-related nodes from the semantic shallow syntax tree test text to obtain the pruned shallow syntax tree test text;
S205: based on the pruned shallow syntax tree training text, linking the related parts of each sentence pair's shallow syntax trees to obtain the shallow syntax tree training text; based on the pruned shallow syntax tree test text, linking the related parts of each sentence pair's shallow syntax trees to obtain the shallow syntax tree test text;
wherein the two shallow syntax trees of a sentence pair are linked as follows: if a word occurs in both sentences, its parent node and grandparent node, both non-terminal nodes, are marked with "REL".
4. The sentence similarity calculation method according to claim 1, characterised in that the detailed process of step S30 is as follows:
S301: obtaining the plane feature training text from the sentence-pair training text, and obtaining the plane feature test text from the sentence-pair test text;
wherein the plane feature training text and the plane feature test text contain, for the sentence pair in each row of the sentence-pair training text and the sentence-pair test text respectively, the plane features used in the similarity calculation;
S302: obtaining the shallow syntax tree feature training text from the plane feature training text and the shallow syntax tree training text; obtaining the shallow syntax tree feature test text from the plane feature test text and the shallow syntax tree test text.
5. The sentence similarity calculation method according to claim 1, characterised in that the detailed process of step S40 is as follows:
S401: obtaining the similarity calculation model using SVR, the shallow syntax tree feature training text being trained in the SVR model to obtain the trained model;
S402: taking the trained model and the shallow syntax tree feature test text as input and obtaining the similarity calculation result text using the SVR tool;
wherein the value in each row of the similarity calculation result text is the similarity calculation result of the sentence pair in the corresponding row of the test text.
6. A sentence similarity calculation system, characterised by comprising:
- a preprocessing module, which applies part-of-speech tagging, syntactic analysis, named entity recognition and WordNet annotation tools to every sentence in the sentence-pair training text and the sentence-pair test text, so as to perform part-of-speech tagging, syntactic analysis, named entity recognition and WordNet annotation respectively, obtaining the part-of-speech tagging training text, phrase training text, named entity training text and WordNet training text, and the part-of-speech tagging test text, phrase test text, named entity test text and WordNet test text;
- a shallow-syntax-tree-based structural feature module, which obtains the shallow syntax tree training text from the part-of-speech tagging training text, phrase training text, named entity training text and WordNet training text,
and obtains the shallow syntax tree test text from the part-of-speech tagging test text, phrase test text, named entity test text and WordNet test text;
- a shallow-syntax-tree-based similarity calculation feature set module, which computes multiple plane features for the sentence pair in each row of the sentence-pair training text to obtain the plane feature training text, and combines the plane feature training text, the shallow syntax tree training text and the sentence-pair manual-score training text to obtain the shallow syntax tree feature training text,
and which computes multiple plane features for the sentence pair in each row of the sentence-pair test text to obtain the plane feature test text, and combines the plane feature test text with the shallow syntax tree test text to obtain the shallow syntax tree feature test text;
- a shallow-syntax-tree-based similarity calculation module, which trains an SVR model on the shallow syntax tree feature training text to obtain a trained model, and obtains the similarity calculation result text from the trained model and the shallow syntax tree feature test text.
7. The sentence similarity calculation system according to claim 6, characterised in that the preprocessing module specifically comprises:
- a part-of-speech tagging unit, which applies a part-of-speech tagging tool to every sentence in the sentence-pair training text to obtain the part of speech of each word, yielding the corresponding part-of-speech tagging training text;
the same processing is applied to the sentence-pair test text to obtain the part-of-speech tagging test text;
- a phrase tagging unit, which applies a syntactic analysis tool to every sentence in the sentence-pair training text to obtain the phrase each word belongs to, yielding the phrase training text; the same processing is applied to the sentence-pair test text to obtain the phrase test text;
- a named entity recognition unit, which applies a named entity recognition tool to the sentence-pair training text to obtain the named entity recognition result of each word, yielding the named entity training text; the same processing is applied to the sentence-pair test text to obtain the named entity test text;
- a WordNet annotation unit, which applies a WordNet annotation tool to the sentence-pair training text to obtain the WordNet supersense of each word, a word with no WordNet supersense being represented by a space, yielding the WordNet training text; the same processing is applied to the sentence-pair test text to obtain the WordNet test text.
8. The sentence similarity calculation system according to claim 6, characterised in that the shallow-syntax-tree-based structural feature module specifically comprises:
- a basic shallow syntax tree unit, which, according to the part-of-speech tagging training text, constructs a shallow syntax tree for each sentence in the sentence-pair training text, obtaining the basic shallow syntax tree training text; the basic shallow syntax tree test text is obtained from the sentence-pair test text and the part-of-speech tagging test text;
wherein the shallow syntax tree is constructed as follows: the words of a sentence become the leaf nodes at the bottom level; the part of speech of each leaf node becomes that leaf node's parent; finally, the parent of all part-of-speech nodes is set to the root node;
- a phrase-level-structure shallow syntax tree unit, which, according to the phrase training text, constructs a deeper shallow syntax tree for each tree in the basic shallow syntax tree training text, obtaining the phrase shallow syntax tree training text; the phrase shallow syntax tree test text is obtained from the phrase test text and the basic shallow syntax tree test text;
wherein the deeper shallow syntax tree is constructed as follows: from the sentence's phrase chunking result, the words belonging to the same phrase are identified; the part-of-speech parents of the leaf nodes of words belonging to the same phrase are attached to a common chunker node; the link between the root node and the part-of-speech nodes is removed and the part-of-speech nodes are attached to their chunker nodes; finally, the parent of all chunker nodes is set to the root node;
- a semantic shallow syntax tree unit, which obtains the semantic shallow syntax tree training text from the phrase shallow syntax tree training text, the named entity training text and the WordNet training text, and obtains the semantic shallow syntax tree test text from the phrase shallow syntax tree test text, the named entity test text and the WordNet test text;
wherein the semantic shallow syntax tree training text adds semantic information to the phrase shallow syntax tree training text as follows: if a word in the phrase shallow syntax tree training text has NER or WNSS information in the named entity training text or the WordNet training text, the syntactic label of the chunker node containing that word is replaced by the NER or WNSS information; if several words in one phrase node meet this condition, the NER and WNSS information of the last such word in the phrase is used;
- a non-key-information deletion shallow syntax tree unit, which deletes determiner- and conjunction-related nodes from the semantic shallow syntax tree training text, obtaining the pruned shallow syntax tree training text, and deletes determiner- and conjunction-related nodes from the semantic shallow syntax tree test text, obtaining the pruned shallow syntax tree test text;
- a joint sentence-pair representation unit, which, based on the pruned shallow syntax tree training text, links the related parts of each sentence pair's shallow syntax trees, obtaining the shallow syntax tree training text, and, based on the pruned shallow syntax tree test text, links the related parts of each sentence pair's shallow syntax trees, obtaining the shallow syntax tree test text;
wherein the two shallow syntax trees of a sentence pair are linked as follows: if a word occurs in both sentences, its parent node and grandparent node, both non-terminal nodes, are marked with "REL".
9. The sentence similarity calculation system according to claim 6, characterised in that the shallow-syntax-tree-based similarity calculation feature set module specifically comprises:
- a plane feature set unit, which obtains the plane feature training text from the sentence-pair training text and the plane feature test text from the sentence-pair test text;
wherein the plane feature training text and the plane feature test text contain, for the sentence pair in each row of the sentence-pair training text and the sentence-pair test text respectively, the plane features used in the similarity calculation;
- a shallow syntax tree feature set unit, which obtains the shallow syntax tree feature training text from the plane feature training text and the shallow syntax tree training text, and obtains the shallow syntax tree feature test text from the plane feature test text and the shallow syntax tree test text.
10. The sentence similarity calculation system according to claim 6, characterised in that the shallow-syntax-tree-based similarity calculation module specifically comprises:
- a training unit, which obtains the similarity calculation model using SVR, the shallow syntax tree feature training text being trained in the SVR model to obtain the trained model;
- a test unit, which takes the trained model and the shallow syntax tree feature test text as input and obtains the similarity calculation result text using the SVR tool;
wherein the value in each row of the similarity calculation result text is the similarity calculation result of the sentence pair in the corresponding row of the test text.
CN201611143723.2A 2016-12-13 2016-12-13 Sentence similarity calculation method and system Pending CN106844331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611143723.2A CN106844331A (en) 2016-12-13 2016-12-13 Sentence similarity calculation method and system

Publications (1)

Publication Number Publication Date
CN106844331A true CN106844331A (en) 2017-06-13

Family

ID=59139793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611143723.2A Pending CN106844331A (en) 2016-12-13 2016-12-13 Sentence similarity calculation method and system

Country Status (1)

Country Link
CN (1) CN106844331A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016197289A (en) * 2015-04-02 2016-11-24 日本電信電話株式会社 Parameter learning device, similarity calculation device and method, and program
CN105095188A (en) * 2015-08-14 2015-11-25 北京京东尚科信息技术有限公司 Sentence similarity computing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Daniel Bär et al., "UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures", First Joint Conference on Lexical and Computational Semantics *
Meng Yang et al., "Sentence Similarity on Structural Representations", ICCPOL 2016 / NLPCC 2016: Natural Language Understanding and Intelligent Applications *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008462A (en) * 2018-01-05 2019-07-12 阿里巴巴集团控股有限公司 A kind of command sequence detection method and command sequence processing method
CN110008462B (en) * 2018-01-05 2023-09-01 阿里巴巴集团控股有限公司 Command sequence detection method and command sequence processing method
CN108363692A (en) * 2018-02-13 2018-08-03 成都智库二八六信息技术有限公司 A kind of computational methods of sentence similarity and the public sentiment measure of supervision based on this method
CN108681535A (en) * 2018-04-11 2018-10-19 广州视源电子科技股份有限公司 Candidate word evaluation method and device, computer equipment and storage medium
CN108932320A (en) * 2018-06-27 2018-12-04 广州优视网络科技有限公司 Article search method, apparatus and electronic equipment
CN109063004A (en) * 2018-07-09 2018-12-21 深圳追科技有限公司 A kind of method and apparatus automatically generating the similar question sentence of FAQ
CN109298796A (en) * 2018-07-24 2019-02-01 北京捷通华声科技股份有限公司 A kind of Word association method and device
CN109298796B (en) * 2018-07-24 2022-05-24 北京捷通华声科技股份有限公司 Word association method and device
CN110209809A (en) * 2018-08-27 2019-09-06 腾讯科技(深圳)有限公司 Text Clustering Method and device, storage medium and electronic device
CN109284399B (en) * 2018-10-11 2022-03-15 深圳前海微众银行股份有限公司 Similarity prediction model training method and device and computer readable storage medium
CN109284399A (en) * 2018-10-11 2019-01-29 深圳前海微众银行股份有限公司 Similarity prediction model training method, equipment and computer readable storage medium
CN109766260A (en) * 2018-12-11 2019-05-17 平安科技(深圳)有限公司 Configure method, apparatus, electronic equipment and the storage medium of test action
CN110287282A (en) * 2019-05-20 2019-09-27 湖南大学 The Intelligent dialogue systems response method and Intelligent dialogue system of calculation are assessed based on tree
CN110175585B (en) * 2019-05-30 2024-01-23 北京林业大学 Automatic correcting system and method for simple answer questions
CN110175585A (en) * 2019-05-30 2019-08-27 北京林业大学 It is a kind of letter answer correct system and method automatically
CN110246098A (en) * 2019-05-31 2019-09-17 暨南大学 A kind of reconstruction of fragments method
CN110956033A (en) * 2019-12-04 2020-04-03 北京中电普华信息技术有限公司 Text similarity calculation method and device
CN110990537B (en) * 2019-12-11 2023-06-27 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN110990537A (en) * 2019-12-11 2020-04-10 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN111161730B (en) * 2019-12-27 2022-10-04 中国联合网络通信集团有限公司 Voice instruction matching method, device, equipment and storage medium
CN111161730A (en) * 2019-12-27 2020-05-15 中国联合网络通信集团有限公司 Voice instruction matching method, device, equipment and storage medium
CN111274809A (en) * 2020-02-20 2020-06-12 苏宁云计算有限公司 Method and device for processing linguistic data in knowledge base
CN113569570A (en) * 2020-04-29 2021-10-29 阿里巴巴集团控股有限公司 Named entity identification method and device and electronic equipment
CN114691362A (en) * 2022-03-22 2022-07-01 重庆邮电大学 Edge calculation method for compromising time delay and energy consumption
CN114691362B (en) * 2022-03-22 2024-04-30 重庆邮电大学 Edge computing method for time delay and energy consumption compromise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170613)