CN106445920A - Sentence similarity calculation method based on sentence meaning structure characteristics - Google Patents

Sentence similarity calculation method based on sentence meaning structure characteristics

Info

Publication number
CN106445920A
CN106445920A
Authority
CN
China
Prior art keywords
sentence
word
topic
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610867254.2A
Other languages
Chinese (zh)
Inventor
罗森林
陈倩柔
潘丽敏
原玉娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201610867254.2A priority Critical patent/CN106445920A/en
Publication of CN106445920A publication Critical patent/CN106445920A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a sentence similarity calculation method based on sentence meaning structure features, aiming to solve the feature-sparsity problem in similarity calculation for short social texts. The method parses the meaning of each sentence according to a sentence meaning structure model, mines latent topical knowledge with a topic model, expands the sentence's features according to the theme-word distribution to obtain a sentence vector based on the sentence's own features, introduces the Paragraph Vector deep learning model to learn the sentence's context features and obtain a sentence vector based on context information, and finally weights the sentence similarities computed from the two sentence vectors. By deeply mining both the semantic information and the context information of sentences, the method describes the internal relations among sentences more comprehensively and accurately and improves the accuracy of similarity calculation.

Description

Sentence similarity calculation method using sentence meaning structure features
Technical field
The present invention relates to a sentence similarity calculation method that uses sentence meaning structure features, and belongs to the field of computer science and natural language processing.
Background technology
Sentence similarity calculation measures the degree of semantic similarity between two pieces of text and is a basic building block of natural language processing tasks such as information retrieval and automatic summarization. With the rapid development of social networking sites, short social texts, typified by microblog posts, have proliferated. Such texts are short and diverse in expression; because they lack the structural cues of longer documents, traditional sentence similarity calculation methods cannot be applied to them directly.
At present, according to the depth of semantic analysis applied to the sentence, similarity calculation methods for sentences in short social texts fall into three classes: methods based on word features, methods based on word-sense features, and methods based on syntactic-analysis features.
Methods based on word features are the earliest sentence similarity methods. They treat a sentence as a linear combination of words and use statistical means to compute surface information such as word frequency, part of speech, sentence length, and word order. Typical methods include Jaccard similarity coefficient string matching, which counts the number of words shared by two sentences as their similarity, and TF-IDF word frequency statistics, which represents each sentence as a vector and takes the cosine distance as the similarity result.
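The two word-feature baselines just described can be sketched as follows (a minimal illustration, not part of the claimed method; the toy corpus is an assumption):

```python
import math
from collections import Counter

def jaccard(s1, s2):
    """Jaccard coefficient over the word sets of two tokenized sentences."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if a | b else 0.0

def tfidf_cosine(s1, s2, corpus):
    """Cosine similarity of TF-IDF vectors built from a small corpus
    of tokenized documents."""
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    idf = {w: math.log(n / df[w]) for w in df}

    def vec(s):
        tf = Counter(s)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    v1, v2 = vec(s1), vec(s2)
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Both measures operate only on surface word overlap, which is exactly the limitation the background section goes on to discuss.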
Methods based on word-sense features approach the problem from the angle of semantic analysis, capturing the semantics of words through semantic knowledge resources. Depending on the resource used, they divide into dictionary-based methods and corpus-based methods. Dictionary-based methods rely mainly on lexical databases organized around word senses, such as WordNet and HowNet, combined with word sense disambiguation techniques to mine the meaning each word expresses in its given context, thereby improving the semantic resolution of the whole sentence. Corpus-based methods introduce a language-model framework and infer the similarity of two words from the probability of their co-occurrence; a common technique is Latent Semantic Analysis (LSA), which applies singular value decomposition to the word-document matrix to map the high-dimensional feature representation into a low-dimensional latent semantic space.
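The LSA mapping described above can be sketched as follows (an illustration with a toy term-document matrix; the truncation rank k and the matrix values are assumptions):

```python
import numpy as np

def lsa_embed(term_doc, k=2):
    """Project a term-document count matrix into a k-dimensional latent
    semantic space via truncated SVD. Columns of the result are the
    low-dimensional document vectors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # keep only the k largest singular directions
    return np.diag(s[:k]) @ Vt[:k, :]

# toy term-document matrix: 4 terms x 3 documents; documents 0 and 2
# share vocabulary, document 1 uses disjoint terms
X = np.array([[2., 0., 1.],
              [1., 0., 1.],
              [0., 3., 0.],
              [0., 2., 0.]])
docs = lsa_embed(X, k=2)
```

In the latent space, documents 0 and 2 end up close while document 1 stays apart, mirroring the co-occurrence reasoning in the paragraph above.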
Methods based on syntactic-analysis features judge sentence similarity by analyzing the overall structure of the sentence. They hold that the predicate verb is the center of the sentence: the core verb governs the other sentence constituents and is itself governed by none, and the semantic information of the sentence is mined by analyzing the dependency relations among words. In practice, similarity is usually estimated only from content words such as verbs, nouns, and adjectives and from the collocations directly attached to them, so as to avoid the bias that noise data would otherwise introduce into the result.
Although the above methods compute sentence similarity at different levels of analysis, short social texts contain few content words. Word-feature methods perform no structural analysis and mine no semantic information; relying only on statistics of surface information such as word frequency and morphology, they cannot distinguish the deeper meanings of words. Word-sense methods consider the semantics of words but are constrained by external semantic resources; short social texts contain many out-of-vocabulary words and highly time-sensitive content, so incomplete dictionaries and sparse features often leave the semantic information ill-defined. Syntax-based methods are limited by the immaturity of current parsing technology and take neither the sentence's context nor its deep semantics into account; this missing information brings unpredictable error into the similarity result.
Content of the invention
To solve the problems that similarity calculation for short social texts suffers from sparse features and ignores deep semantic information, the present invention proposes a sentence similarity calculation method using sentence meaning structure features. By weighting and fusing multiple kinds of information while considering both the semantic information and the context information of the sentence, the method makes the sentence representation more complete; by deeply mining sentence semantics, it makes the similarity result independent of the form of expression and measures the degree of association between sentences more comprehensively and accurately.
The design principles of the present invention are: 1) parse the semantics of each sentence based on the Chinese Semantic Structure Model (CSM), extract the sentence meaning components, mine latent topical knowledge with the Latent Dirichlet Allocation (LDA) topic model, expand the features in the dimensions corresponding to the sentence meaning components, and obtain a sentence vector based on the sentence's own semantic information; 2) introduce the Paragraph Vector (PV) deep learning model to learn text features adaptively and obtain a sentence vector based on the sentence's context information; 3) compute the similarity between sentences with each of the two sentence vectors, combine them by linear weighting, and tune the coefficients by grid search so that the similarity result is more accurate.
The specific steps are as follows:
Step 1: preprocess the set of short social texts: first split the texts into sentences, then perform word segmentation and part-of-speech tagging, and remove stop words.
Step 2: based on the CSM sentence meaning structure analysis of each sentence and on the themes and word distributions obtained by applying the LDA topic model to the short-text set, expand the features of each sentence and compute sentence similarity.
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, comment, basic items, and general items. CSM represents the semantics of a whole sentence as a structure tree with four layers: a sentence-type layer, a description layer, an object layer, and a detail layer. The sentence-type layer indicates the sentence meaning type, one of simple, complex, compound, and multiple sentence meaning. The description layer contains the topic and the comment, which are a first division of the sentence meaning and the essential components of the sentence meaning structure: the topic is defined as the object being described in the sentence meaning, and the comment is defined as the content that describes it. The object layer contains the predicate, basic items, general items, and semantic cases; semantic cases are semantic labels on words, comprising 7 basic cases and 12 general cases. Basic items are defined as the components directly related to the predicate and constitute the semantic backbone of the sentence; their corresponding semantic cases are basic cases. General items are defined as the modifying components of the sentence meaning; their corresponding semantic cases are general cases. The detail layer contains the extended meaning of the sentence.
Step 2.2: analyze the short-text set with the LDA topic model, mine the latent topical knowledge in the text, and extract the themes in the text and the word distribution under each theme, obtaining a text-theme matrix and a theme-word matrix. The themes produced by the LDA topic model can be used to partition the words in the text: words under the same theme have identical or similar semantics.
Step 2.3: expand the features of each sentence according to its topic to obtain a topic-based sentence vector. If the same word serves as part of the topic in one sentence and part of the comment in another, the two occurrences are considered to have different semantics and are defined as different words; accordingly, when expanding a sentence's features, the topic part and the comment part are expanded separately. The concrete method for expanding the topic part of a sentence is: first extract the words corresponding to the basic items and general items under the topic; then, using the theme-word matrix obtained in step 2.2, compare each word's probability under the different themes and choose the theme with the highest probability; add the other words under that theme to the sentence as part of it; finally, use all the words of the sentence as features and construct a feature vector to represent the sentence, where the value in a dimension corresponding to an original word is that word's number of occurrences in the sentence, and the value in a dimension corresponding to an expansion word is computed by formula (1),
V=n*w (1)
where V is the value in the dimension corresponding to the expansion word, n is the number of times the expansion word occurs in the sentence, and w is the expansion word's probability under the corresponding theme.
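The expansion of step 2.3 can be sketched as follows. The theme-word matrix is a stand-in for step 2.2's output, and the choice of n for a word not yet in the sentence is read here as the count of the sentence word that is most probable under the chosen theme (an assumption; the patent text is ambiguous on this point):

```python
from collections import Counter

def expand_sentence(words, theme_word):
    """`words`: tokenized topic-part words of one sentence;
    `theme_word`: {theme id: {word: probability}}, a stand-in for the
    theme-word matrix of step 2.2. Returns a feature dict in which
    original words keep their occurrence counts and each expansion
    word gets V = n * w (formula (1))."""
    counts = Counter(words)
    # choose the theme under which the sentence's words are most probable
    best = max(theme_word,
               key=lambda t: sum(theme_word[t].get(w, 0.0) for w in words))
    # n: count of the sentence word most probable under that theme (assumption)
    anchor = max(words, key=lambda w: theme_word[best].get(w, 0.0))
    n = counts[anchor]
    feats = dict(counts)                  # original-word dimensions: raw counts
    for w, prob in theme_word[best].items():
        if w not in feats:                # expansion-word dimensions
            feats[w] = n * prob           # V = n * w
    return feats
```

The comment part of the sentence (step 2.4) would be expanded the same way with its own words.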
Step 2.4, by the method for step 2.3, carries out feature expansion according to topic is stated to sentence, obtains based on the sentence for stating topic Vector.
Step 2.5: compute sentence similarity with each of the two sentence vectors obtained in steps 2.3 and 2.4, and weight the two similarity values to obtain the final similarity between the sentences. The specific calculation formula is as follows,
sim1(S_A, S_B) = ω · (S_At · S_Bt) / (|S_At| |S_Bt|) + (1 − ω) · (S_Ac · S_Bc) / (|S_Ac| |S_Bc|)    (2)
where S_A and S_B are any two sentences, sim1(S_A, S_B) is their similarity value, S_At and S_Bt are the topic-based sentence vectors of S_A and S_B, S_Ac and S_Bc are their comment-based sentence vectors, and ω is an adjustable parameter with range [0, 1] that adjusts the weight of the two similarities.
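Formula (2) can be sketched directly (a minimal illustration; the vectors would come from the feature expansion of steps 2.3 and 2.4):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sim1(sa_topic, sb_topic, sa_comment, sb_comment, omega):
    """Formula (2): weighted combination of the topic-based and
    comment-based cosine similarities; omega lies in [0, 1]."""
    return (omega * cosine(sa_topic, sb_topic)
            + (1 - omega) * cosine(sa_comment, sb_comment))
```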
Step 3: input all the sentences preprocessed in step 1 to the PV deep learning model, learn text features with the PV model to obtain a sentence vector for each sentence, and compute the cosine distance between sentence vectors as the similarity between sentences:
sim2(S_A, S_B) = (S_Ap · S_Bp) / (|S_Ap| |S_Bp|)    (3)
where S_A and S_B are any two sentences, sim2(S_A, S_B) is their similarity value, and S_Ap and S_Bp are the sentence vectors learned by the PV model. The PV model is an unsupervised learning method: its input is text of arbitrary length (an article, a paragraph, or a sentence, all referred to as text here), and its output is a continuous distributed vector representation of that text. Similar in principle to word2vec word vectors, it learns effective sentence or passage representations through feature learning while retaining semantic and word-order information. The PV model effectively solves the bag-of-words model's neglect of word sense and word order, and the dense vectors it produces also effectively overcome the feature sparsity of short-text sentence representations.
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters by grid search to find the optimal parameter values, and output the final similarity value for each sentence pair. The formula is as follows,
sim(S_A, S_B) = θ · sim1(S_A, S_B) + (1 − θ) · sim2(S_A, S_B)    (4)
where S_A and S_B are any two sentences, sim(S_A, S_B) is their similarity value, θ is an adjustable parameter with range [0, 1], and sim1(S_A, S_B) and sim2(S_A, S_B) are computed by formulas (2) and (3) respectively. Combining formulas (2), (3), and (4), the complete sentence similarity formula is:
sim(S_A, S_B) = θ · [ω · (S_At · S_Bt) / (|S_At| |S_Bt|) + (1 − ω) · (S_Ac · S_Bc) / (|S_Ac| |S_Bc|)] + (1 − θ) · (S_Ap · S_Bp) / (|S_Ap| |S_Bp|)    (5)
where ω and θ are adjustable parameters, both with range [0, 1], tuned by grid search according to the similarity calculation or application results so as to take their optimal values.
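The grid search of step 4 can be sketched as follows; `sim1_fn`, `sim2_fn`, and the clustering-quality `objective` are placeholders for the components defined above (a sketch of the tuning idea, not the patent's exact procedure):

```python
import itertools

def grid_search(pairs, sim1_fn, sim2_fn, objective, step=0.05):
    """Scan omega and theta over a grid on [0, 1], score each setting by
    applying `objective` to the fused similarities of formula (4)/(5),
    and return the best (omega, theta, score) triple.
    sim1_fn(a, b, omega) and sim2_fn(a, b) stand in for the similarity
    functions of steps 2 and 3."""
    grid = [round(i * step, 10) for i in range(int(1 / step) + 1)]
    best = (None, None, float("-inf"))
    for omega, theta in itertools.product(grid, grid):
        sims = [theta * sim1_fn(a, b, omega) + (1 - theta) * sim2_fn(a, b)
                for a, b in pairs]
        score = objective(sims)
        if score > best[2]:
            best = (omega, theta, score)
    return best
```

In the embodiment the objective would be the silhouette coefficient of the clustering produced from the fused similarities.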
Beneficial effect
The sentence similarity calculation method of the present invention effectively reduces the loss of semantic information and characterizes the internal relations between sentences more comprehensively and accurately. By deeply mining the context and the inherent semantic structure features of sentences, it makes the similarity calculation independent of the sentences' form of expression and improves the accuracy of the result.
Specific embodiment
To better illustrate the objects and advantages of the present invention, the embodiments of the method are described in further detail below with a concrete example.
The experiment uses the corpus published for the Chinese microblog opinion element extraction evaluation task of the NLP&CC 2013 conference. Five topics, 10896 sentences in total, were randomly selected from it as the short-text set. The effect of sentence similarity calculation was evaluated by applying the computed similarities to short-text clustering and assessing the clustering quality. Clustering quality is measured with the silhouette coefficient, a concept first proposed by Peter J. Rousseeuw in 1987 that judges clustering quality from the two factors of cohesion and separation.
The silhouette coefficient is computed as follows:
(1) For the i-th object, compute its average distance to the other objects in its own cluster, denoted a_i.
(2) For the i-th object, compute its average distance to all objects in each cluster that does not contain it, and take the minimum over those clusters, denoted b_i.
(3) For the i-th object, the silhouette coefficient, denoted s_i, is computed by formula (6):
s_i = (b_i − a_i) / max(a_i, b_i)    (6)
The silhouette coefficient ranges over [−1, 1]. From formula (6), if s_i < 0, the average distance from the i-th object to the elements of its own cluster is greater than its distance to some other cluster, and the clustering is inaccurate. If a_i tends to 0, or b_i is sufficiently large, then s_i approaches 1: the data within each cluster are tighter, the separation between clusters is more pronounced, and the clustering is better.
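The computation above can be sketched directly (a minimal pure-Python version with `dist` an arbitrary distance function; not tied to any particular clustering tool):

```python
def silhouette(points, labels, dist):
    """Mean silhouette coefficient over all objects, following formula
    (6): s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the average
    distance from object i to the other members of its own cluster and
    b_i is the smallest average distance from i to any other cluster."""
    idx_by_cluster = {}
    for i, l in enumerate(labels):
        idx_by_cluster.setdefault(l, []).append(i)
    scores = []
    for i, l in enumerate(labels):
        own = [j for j in idx_by_cluster[l] if j != i]
        a = (sum(dist(points[i], points[j]) for j in own) / len(own)
             if own else 0.0)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for c, members in idx_by_cluster.items() if c != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

A tight, well-separated clustering scores near 1; a clustering that mixes the groups scores below 0, matching the interpretation given above.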
The specific implementation steps are:
Step 1: split the short social texts into sentences, then perform word segmentation and part-of-speech tagging on each sentence with ICTCLAS2015, and remove the stop words from the text according to a stop-word list downloaded from the Internet.
Step 2: perform sentence meaning structure analysis on each sentence of the short-text set with CSM, analyze the short-text set with the LDA topic model to obtain the themes and word distributions of the short texts, expand the features of each sentence, and compute sentence similarity.
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, comment, basic items, and general items.
Step 2.2: analyze the short-text set with the LDA topic model, extract the themes in the text and the word distribution under each theme, and obtain the theme-word matrix.
Step 2.3: expand the features of each sentence according to its topic to obtain a topic-based sentence vector. The concrete method is: first extract the words corresponding to the basic items and general items under the topic; then, using the theme-word matrix obtained in step 2.2, compare each word's probability under the different themes and choose the theme with the highest probability; add the other words under that theme to the sentence as part of it; finally, use all the words of the sentence as features and construct a feature vector to represent the sentence, where the value in a dimension corresponding to an original word is that word's number of occurrences in the sentence, and the value in a dimension corresponding to an expansion word is computed by formula (1).
Step 2.4, by the method for step 2.3, carries out feature expansion according to topic is stated to sentence, obtains based on the sentence for stating topic Vector.
Step 2.5: compute sentence similarity with each of the two sentence vectors obtained in steps 2.3 and 2.4, weight the two similarity values, and obtain the final similarity between sentences by formula (2).
Step 3: input all the sentences preprocessed in step 1 to the PV deep learning model, learn text features with the PV model to obtain sentence vectors, and compute the cosine distance between sentence vectors as the similarity between sentences, with all PV model parameters left at the tool's default values.
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters ω and θ by grid search, and select the optimal parameter pair.
For the clustering of the 5 topics, with PV vector length size = 100 and window length window = 5, the silhouette coefficient reaches its best value of 0.45 when ω = 0.33 and θ = 0.25. When θ = 1, i.e., considering only the similarity obtained from the CSM sentence meaning structure analysis, the silhouette coefficient reaches 0.42; when θ = 0, i.e., considering only the sentence similarity obtained from the PV analysis, it reaches 0.31 (θ weights the CSM-based similarity in formula (4)). The experimental results show that the sentence vectors obtained with CSM capture the deeper internal semantic information of sentences, while the PV model gives the sentence vectors rich context information; the sentence similarity calculation method that considers both the sentence's own semantic information and its context information measures the degree of similarity between sentences more accurately.

Claims (3)

1. A sentence similarity calculation method using sentence meaning structure features, the method comprising the following steps:
Step 1: preprocess the short-text set: first split it into sentences, then perform word segmentation and part-of-speech tagging, and remove stop words;
Step 2: combining the sentence meaning structure features with the theme-word distribution features, expand the features of each sentence and compute sentence similarity;
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, comment, basic items, and general items;
Step 2.2: analyze the short-text set with the LDA (Latent Dirichlet Allocation) topic model, extract the themes in the text and the word distribution under each theme, and obtain the theme-word matrix;
Step 2.3: expand the features of each sentence according to its topic to obtain a topic-based sentence vector;
Step 2.4: expand the features of each sentence according to its comment to obtain a comment-based sentence vector;
Step 2.5: compute sentence similarity with each of the two sentence vectors obtained in steps 2.3 and 2.4, and weight the two similarity values to obtain the final similarity between sentences, with the specific calculation formula as follows,
sim1(S_A, S_B) = ω · (S_At · S_Bt) / (|S_At| |S_Bt|) + (1 − ω) · (S_Ac · S_Bc) / (|S_Ac| |S_Bc|)
where S_A and S_B are any two sentences, sim1(S_A, S_B) is their similarity value, S_At and S_Bt are the topic-based sentence vectors of S_A and S_B, S_Ac and S_Bc are the comment-based sentence vectors of S_A and S_B, and ω is an adjustable parameter with range [0, 1];
Step 3: input all the sentences preprocessed in step 1 to the PV (Paragraph Vector) deep learning model, learn text features with the PV model to obtain sentence vectors, and compute the cosine distance between the sentence vectors as the similarity between sentences, with the formula
sim2(S_A, S_B) = (S_Ap · S_Bp) / (|S_Ap| |S_Bp|)
where S_A and S_B are any two sentences, sim2(S_A, S_B) is their similarity value, and S_Ap and S_Bp are the sentence vectors learned by the PV model;
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters by grid search to find the optimal parameter values, and output the final similarity value for each sentence pair.
2. The sentence similarity calculation method using sentence meaning structure features according to claim 1, characterized in that the concrete method of topic-based feature expansion in step 2.3 is: first extract the words corresponding to the basic items and general items under the topic; then, using the theme-word matrix obtained from the LDA analysis of the short-text set, compare each word's probability under the different themes and choose the theme with the highest probability; add the other words under that theme to the sentence as part of it; finally, use all the words of the sentence as features and construct a feature vector to represent the sentence, where the value in a dimension corresponding to an original word is that word's number of occurrences in the sentence, and the value in a dimension corresponding to an expansion word is computed as follows,
V=n*w
where V is the value in the dimension corresponding to the expansion word, n is the number of times the expansion word occurs in the sentence, and w is the expansion word's probability under the corresponding theme;
in step 2.4, the method of expanding a sentence's features based on its comment is the same as the topic-based expansion method.
3. The sentence similarity calculation method using sentence meaning structure features according to claim 1, characterized in that in step 4 the CSM-based similarity and the PV-based similarity are fused by weighting, with the specific calculation formula:
sim(S_A, S_B) = θ · sim1(S_A, S_B) + (1 − θ) · sim2(S_A, S_B)
where S_A and S_B are any two sentences, sim(S_A, S_B) is their similarity value, and θ is an adjustable parameter with range [0, 1]; combining the formulas in step 2.5 and step 3 of claim 1, the complete sentence similarity formula is:
sim(S_A, S_B) = θ · [ω · (S_At · S_Bt) / (|S_At| |S_Bt|) + (1 − ω) · (S_Ac · S_Bc) / (|S_Ac| |S_Bc|)] + (1 − θ) · (S_Ap · S_Bp) / (|S_Ap| |S_Bp|)
where ω and θ are adjustable parameters, both with range [0, 1].
CN201610867254.2A 2016-09-29 2016-09-29 Sentence similarity calculation method based on sentence meaning structure characteristics Pending CN106445920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610867254.2A CN106445920A (en) 2016-09-29 2016-09-29 Sentence similarity calculation method based on sentence meaning structure characteristics


Publications (1)

Publication Number Publication Date
CN106445920A true CN106445920A (en) 2017-02-22

Family

ID=58172480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610867254.2A Pending CN106445920A (en) 2016-09-29 2016-09-29 Sentence similarity calculation method based on sentence meaning structure characteristics

Country Status (1)

Country Link
CN (1) CN106445920A (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273474A (en) * 2017-06-08 2017-10-20 成都数联铭品科技有限公司 Autoabstract abstracting method and system based on latent semantic analysis
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device of the text similarity analysis based on Spark-Streaming
CN108287824A (en) * 2018-03-07 2018-07-17 北京云知声信息技术有限公司 Semantic similarity calculation method and device
CN108363816A (en) * 2018-03-21 2018-08-03 北京理工大学 Open entity relation extraction method based on sentence justice structural model
CN108509408A (en) * 2017-02-27 2018-09-07 芋头科技(杭州)有限公司 A kind of sentence similarity judgment method
CN109101489A (en) * 2018-07-18 2018-12-28 武汉数博科技有限责任公司 A kind of text automatic abstracting method, device and a kind of electronic equipment
CN109145299A (en) * 2018-08-16 2019-01-04 北京金山安全软件有限公司 Text similarity determination method, device, equipment and storage medium
CN109857990A (en) * 2018-12-18 2019-06-07 重庆邮电大学 A kind of financial class notice information abstracting method based on file structure and deep learning
CN110008465A (en) * 2019-01-25 2019-07-12 网经科技(苏州)有限公司 The measure of sentence semantics distance
CN110020421A (en) * 2018-01-10 2019-07-16 北京京东尚科信息技术有限公司 The session information method of abstracting and system of communication software, equipment and storage medium
CN110287291A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 A kind of unsupervised English short essay sentence is digressed from the subject analysis method
CN110348133A (en) * 2019-07-15 2019-10-18 西南交通大学 A kind of bullet train three-dimensional objects structure technology effect figure building system and method
CN110413761A (en) * 2019-08-06 2019-11-05 浩鲸云计算科技股份有限公司 A kind of method that the territoriality in knowledge based library is individually talked with
CN110765360A (en) * 2019-11-01 2020-02-07 新华网股份有限公司 Text topic processing method and device, electronic equipment and computer storage medium
CN110895656A (en) * 2018-09-13 2020-03-20 武汉斗鱼网络科技有限公司 Text similarity calculation method and device, electronic equipment and storage medium
CN110990537A (en) * 2019-12-11 2020-04-10 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN110990451A (en) * 2019-11-15 2020-04-10 浙江大华技术股份有限公司 Data mining method, device and equipment based on sentence embedding and storage device
CN111008783A (en) * 2019-12-05 2020-04-14 浙江工业大学 Factory processing flow recommendation method based on singular value decomposition
CN111078849A (en) * 2019-12-02 2020-04-28 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111125301A (en) * 2019-11-22 2020-05-08 泰康保险集团股份有限公司 Text method and device, electronic equipment and computer readable storage medium
CN111209375A (en) * 2020-01-13 2020-05-29 中国科学院信息工程研究所 Universal clause and document matching method
CN111368532A (en) * 2020-03-18 2020-07-03 昆明理工大学 Topic word embedding disambiguation method and system based on LDA
CN112650836A (en) * 2020-12-28 2021-04-13 成都网安科技发展有限公司 Text analysis method and device based on syntax structure element semantics and computing terminal
CN112686025A (en) * 2021-01-27 2021-04-20 浙江工商大学 Method for generating Chinese multiple-choice distractors based on free text
CN113536907A (en) * 2021-06-06 2021-10-22 南京理工大学 Social relationship identification method and system based on deep supervised feature selection
CN116756347A (en) * 2023-08-21 2023-09-15 中国标准化研究院 Semantic information retrieval method based on big data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778256A (en) * 2015-04-20 2015-07-15 江苏科技大学 Rapid incremental clustering method for domain question-answering system consultations
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolutional neural network
CN105573985A (en) * 2016-03-04 2016-05-11 北京理工大学 Sentence expression method based on Chinese sentence meaning structural model and topic model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TOMAS et al.: "UWB at SemEval-2016 Task 1: Semantic Textual Similarity using Lexical, Syntactic, and Semantic Information", Proceedings of SemEval-2016 *
YUHUA LI et al.: "Sentence Similarity Based on Semantic Nets and Corpus Statistics", IEEE Transactions on Knowledge and Data Engineering *
LIN Meng et al.: "A microblog topic summarization algorithm incorporating the sentence meaning structure model", Journal of Zhejiang University (Engineering Science) *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509408B (en) * 2017-02-27 2019-11-22 芋头科技(杭州)有限公司 Sentence similarity judgment method
CN108509408A (en) * 2017-02-27 2018-09-07 芋头科技(杭州)有限公司 Sentence similarity judgment method
CN107273474A (en) * 2017-06-08 2017-10-20 成都数联铭品科技有限公司 Automatic abstract extraction method and system based on latent semantic analysis
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 Data processing method and device for text similarity analysis based on Spark-Streaming
CN110020421A (en) * 2018-01-10 2019-07-16 北京京东尚科信息技术有限公司 Session information extraction method and system for communication software, device and storage medium
CN108287824A (en) * 2018-03-07 2018-07-17 北京云知声信息技术有限公司 Semantic similarity calculation method and device
CN108363816A (en) * 2018-03-21 2018-08-03 北京理工大学 Open entity relation extraction method based on the sentence meaning structure model
CN109101489A (en) * 2018-07-18 2018-12-28 武汉数博科技有限责任公司 Automatic text summarization method, device and electronic equipment
CN109101489B (en) * 2018-07-18 2022-05-20 武汉数博科技有限责任公司 Automatic text summarization method, device and electronic equipment
CN109145299A (en) * 2018-08-16 2019-01-04 北京金山安全软件有限公司 Text similarity determination method, device, equipment and storage medium
CN109145299B (en) * 2018-08-16 2022-06-21 北京金山安全软件有限公司 Text similarity determination method, device, equipment and storage medium
CN110895656A (en) * 2018-09-13 2020-03-20 武汉斗鱼网络科技有限公司 Text similarity calculation method and device, electronic equipment and storage medium
CN110895656B (en) * 2018-09-13 2023-12-29 北京橙果转话科技有限公司 Text similarity calculation method and device, electronic equipment and storage medium
CN109857990B (en) * 2018-12-18 2022-11-25 重庆邮电大学 Financial bulletin information extraction method based on document structure and deep learning
CN109857990A (en) * 2018-12-18 2019-06-07 重庆邮电大学 Financial bulletin information extraction method based on document structure and deep learning
CN110008465A (en) * 2019-01-25 2019-07-12 网经科技(苏州)有限公司 Measurement method of sentence semantic distance
CN110287291B (en) * 2019-07-03 2021-11-02 桂林电子科技大学 Unsupervised method for analyzing off-topic sentences in short English essays
CN110287291A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 Unsupervised off-topic analysis method for short English essay sentences
CN110348133A (en) * 2019-07-15 2019-10-18 西南交通大学 System and method for constructing high-speed train three-dimensional product structure technical effect diagram
CN110348133B (en) * 2019-07-15 2022-08-19 西南交通大学 System and method for constructing high-speed train three-dimensional product structure technical effect diagram
CN110413761A (en) * 2019-08-06 2019-11-05 浩鲸云计算科技股份有限公司 Method for domain-specific individual dialogue based on a knowledge base
CN110765360B (en) * 2019-11-01 2022-08-02 新华网股份有限公司 Text topic processing method and device, electronic equipment and computer storage medium
CN110765360A (en) * 2019-11-01 2020-02-07 新华网股份有限公司 Text topic processing method and device, electronic equipment and computer storage medium
CN110990451A (en) * 2019-11-15 2020-04-10 浙江大华技术股份有限公司 Data mining method, device and equipment based on sentence embedding and storage device
CN110990451B (en) * 2019-11-15 2023-05-12 浙江大华技术股份有限公司 Sentence embedding-based data mining method, device, equipment and storage device
CN111125301A (en) * 2019-11-22 2020-05-08 泰康保险集团股份有限公司 Text method and device, electronic equipment and computer readable storage medium
CN111078849A (en) * 2019-12-02 2020-04-28 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111008783B (en) * 2019-12-05 2022-03-18 浙江工业大学 Factory processing flow recommendation method based on singular value decomposition
CN111008783A (en) * 2019-12-05 2020-04-14 浙江工业大学 Factory processing flow recommendation method based on singular value decomposition
CN110990537B (en) * 2019-12-11 2023-06-27 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN110990537A (en) * 2019-12-11 2020-04-10 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN111209375A (en) * 2020-01-13 2020-05-29 中国科学院信息工程研究所 Universal clause and document matching method
CN111209375B (en) * 2020-01-13 2023-01-17 中国科学院信息工程研究所 Universal clause and document matching method
CN111368532A (en) * 2020-03-18 2020-07-03 昆明理工大学 Topic word embedding disambiguation method and system based on LDA
CN111368532B (en) * 2020-03-18 2022-12-09 昆明理工大学 Topic word embedding disambiguation method and system based on LDA
CN112650836A (en) * 2020-12-28 2021-04-13 成都网安科技发展有限公司 Text analysis method and device based on syntax structure element semantics and computing terminal
CN112686025A (en) * 2021-01-27 2021-04-20 浙江工商大学 Method for generating Chinese multiple-choice distractors based on free text
CN112686025B (en) * 2021-01-27 2023-09-19 浙江工商大学 Method for generating Chinese multiple-choice distractors based on free text
CN113536907A (en) * 2021-06-06 2021-10-22 南京理工大学 Social relationship identification method and system based on deep supervised feature selection
CN116756347A (en) * 2023-08-21 2023-09-15 中国标准化研究院 Semantic information retrieval method based on big data
CN116756347B (en) * 2023-08-21 2023-10-27 中国标准化研究院 Semantic information retrieval method based on big data

Similar Documents

Publication Publication Date Title
CN106445920A (en) Sentence similarity calculation method based on sentence meaning structure characteristics
CN103544255B (en) Network public opinion information analysis method based on text semantic relatedness
CN103617280B (en) Method and system for mining Chinese event information
CN104391942B (en) Short text feature extension method based on semantic graphs
CN103136359B (en) Single-document abstract generation method
CN110020189A (en) Article recommendation method based on Chinese similarity measures
CN107247780A (en) Ontology-based patent document similarity measurement method
CN103455562A (en) Text orientation analysis method and product review orientation discriminator based thereon
CN106156272A (en) Information retrieval method based on multi-source semantic analysis
CN108549634A (en) Chinese patent text similarity calculation method
CN109858028A (en) Short text similarity calculation method based on a probabilistic model
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
CN106484664A (en) Similarity calculation method between short texts
CN105243152A (en) Automatic summarization method based on graph model
CN103324700B (en) Ontology concept attribute learning method based on Web information
CN103646112B (en) Dependency parsing domain adaptation method based on web search
CN105653518A (en) Specific group discovery and expansion method based on microblog data
CN103399901A (en) Keyword extraction method
CN102880723A (en) Search method and system for identifying user retrieval intention
CN103970730A (en) Method for extracting multiple subject terms from a single Chinese text
CN104008090A (en) Multi-subject extraction method based on concept vector model
CN105528437A (en) Question-answering system construction method based on structured text knowledge extraction
CN104778204A (en) Multi-document subject discovery method based on two-layer clustering
CN108038205A (en) Opinion analysis prototype system for Chinese microblogs
CN105975475A (en) Fine-grained topic information extraction method based on Chinese phrase strings

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170222