CN106445920A - Sentence similarity calculation method based on sentence meaning structure characteristics - Google Patents
- Publication number: CN106445920A
- Application number: CN201610867254.2A
- Authority: CN (China)
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention provides a sentence similarity calculation method based on sentence meaning structure features, aiming to solve the feature-sparsity problem in similarity calculation for short social texts. The method analyzes the meaning of a sentence according to a sentence meaning structure model, mines latent topic knowledge with a topic model, and expands the sentence's features according to the topic-word distribution to obtain a sentence vector based on the sentence's own features; it then introduces the Paragraph Vector deep learning model to learn the contextual features of the sentence and obtain a sentence vector based on context information, and finally weights the sentence similarities computed from the two kinds of sentence vectors. The method deeply mines both the semantic information and the context information of sentences, so that the internal relations among sentences are described comprehensively and accurately, and the accuracy of similarity calculation is improved.
Description
Technical field
The present invention relates to a sentence similarity calculation method that uses sentence meaning structure features, and belongs to the fields of computer science and natural language processing.
Background art
Sentence similarity calculation measures the degree of semantic similarity between two pieces of text and is a basic step in natural language processing tasks such as information retrieval and automatic summarization. With the rapid development of social networking sites, short social texts, typified by microblogs, have emerged in large numbers. They are short and diverse in expression, and they lack the structural information of long documents, so traditional sentence similarity calculation methods cannot be applied to them directly.
At present, according to the depth of semantic analysis, similarity calculation methods for sentences in short social texts fall into three classes: methods based on word features, methods based on word sense features, and methods based on syntactic analysis features.
Methods based on word features are the earliest sentence similarity calculation methods. They mainly treat a sentence as a linear combination of words and use statistical means to compute surface-level information such as word frequency, part of speech, sentence length, and word order. Typical methods include Jaccard-coefficient string matching, which counts the number of words shared by two sentences as their similarity, and the TF-IDF word frequency method, which represents each sentence as a vector and takes the cosine distance as the similarity result.
Methods based on word sense features approach the problem from the angle of semantic analysis and capture the semantic information of words through semantic knowledge resources. According to the resource used, they divide into dictionary-based methods and corpus-based methods. Dictionary-based methods mainly rely on lexical databases organized around word senses, such as WordNet and HowNet, combined with word sense disambiguation techniques, to mine the meaning a word expresses in its given context, thereby improving the semantic resolution of the whole sentence. Corpus-based methods mainly introduce a language-model framework and infer the similarity of two words from the probability of their co-occurrence; a common technique is Latent Semantic Analysis (LSA), which applies singular value decomposition to the word-document matrix to map the high-dimensional feature representation into a low-dimensional latent semantic space.
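The LSA mapping referred to here can be sketched with a toy word-document count matrix and a truncated SVD (a minimal numpy illustration; the matrix values and the rank k = 2 are made up):

```python
import numpy as np

# Toy word-document matrix: rows = words, columns = documents.
# Documents 0-1 use "pet" words, documents 2-3 use "finance" words.
X = np.array([
    [2, 1, 0, 0],   # "cat"
    [1, 2, 0, 0],   # "dog"
    [0, 0, 3, 1],   # "stock"
    [0, 0, 1, 3],   # "market"
], dtype=float)

# Truncated SVD keeps the k largest singular values, mapping each document
# into a k-dimensional latent semantic space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In the latent space, documents 0 and 1 end up collinear while documents 0 and 2 are orthogonal, reflecting the two word groups.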
Methods based on syntactic analysis features judge the similarity of sentences by analyzing their overall structure. They hold that the core verb of a sentence governs the other constituents: the core verb itself is governed by no constituent, while the other sentence constituents are governed by it. Semantic information is mined by analyzing the dependency relations between words. In practice, similarity is usually estimated only from the collocations formed by content words such as verbs, nouns, and adjectives together with the words directly attached to them, which avoids the bias that noisy data would introduce into the result.
Although the above methods compute sentence similarity at different levels of analysis, short social texts contain relatively few content words. Word-feature methods perform no structural analysis and do not mine the semantic information of the sentence; relying only on statistics of surface information such as word frequency and morphology, they cannot distinguish the deeper meaning of words. Word-sense methods do consider the semantic information of words, but they are limited by external semantic resources: short social texts contain large numbers of out-of-vocabulary words and highly time-sensitive content, so incomplete dictionaries and sparse features often leave the semantic information unclear. Methods based on syntactic features are restricted by the immaturity of current syntactic analysis techniques and take neither the sentence's context information nor its deep semantic information into account; this missing information has an unpredictable impact on the accuracy of the similarity result.
Summary of the invention
To solve the problems that features are sparse in similarity calculation for short social texts and that deep semantic information is not taken into account, the present invention proposes a sentence similarity calculation method that uses sentence meaning structure features. On the premise of considering both the semantic information and the context information of a sentence, multiple kinds of information are fused by weighting, making the sentence information more comprehensive; semantic information is mined in depth, so that the similarity result is not affected by the form of expression and the degree of association between sentence meanings is calculated more comprehensively and accurately.
The design principles of the invention are: 1) parse the meaning of a sentence with the Chinese Semantic Structure Model (CSM) and extract its sentence meaning components, mine latent topic knowledge with the Latent Dirichlet Allocation (LDA) topic model, and expand the features of the dimensions corresponding to the sentence meaning components according to the knowledge base, obtaining a sentence vector based on the sentence's own semantic information; 2) introduce the Paragraph Vector (PV) deep learning model to adaptively learn text features, obtaining a sentence vector based on the sentence's context information; 3) compute the similarity between sentences with each of the two kinds of sentence vectors, combine them by linear weighting, and tune the coefficients with a grid search, making the similarity result more accurate.
The method comprises the following steps:
Step 1: preprocess the short social text set: first split it into sentences, then perform word segmentation and part-of-speech tagging, and remove stop words.
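The preprocessing of step 1 can be sketched as follows (the patent segments Chinese text with a dedicated tool; here whitespace splitting stands in for word segmentation, and the stop word list is a made-up placeholder):

```python
import re

STOP_WORDS = {"the", "a", "of", "is"}   # placeholder stop word list

def preprocess(text):
    """Split a short text into sentences, tokenize, and drop stop words."""
    sentences = [s for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    return [[w for w in s.lower().split() if w not in STOP_WORDS]
            for s in sentences]
```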
Step 2: based on the sentence meaning structure analysis of each sentence with CSM and on the topic and word distributions obtained by analyzing the short text set with the LDA topic model, expand the features of each sentence and calculate sentence similarity.
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, statement, basic items, and general items. CSM represents the meaning of a whole sentence as a structure tree with four levels: the sentence-pattern level, the description level, the object level, and the detail level. The sentence-pattern level indicates the sentence meaning type, one of simple, complex, compound, and multiple sentence meaning. The description level contains the topic and the statement, which form the preliminary division of the sentence meaning and are its essential components: the topic is defined as the object being described in the sentence meaning, and the statement as the description of it. The object level contains the predicate, basic items, general items, and semantic cases; semantic cases are the semantic labels of words and comprise 7 kinds of basic cases and 12 kinds of general cases. A basic item is defined as a component directly connected to the predicate, forming the semantic trunk of the sentence, and its semantic case is a basic case; a general item is defined as a modifying component, and its semantic case is a general case. The detail level contains the extended meaning of the sentence.
Step 2.2: analyze the short text set with the LDA topic model to mine its latent topic knowledge, extracting the topics in the text and the word distribution under each topic, and obtaining a text-topic matrix and a topic-word matrix. The topics obtained by LDA can be used to group the words in the text: words under the same topic have identical or similar semantics.
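The LDA analysis of step 2.2 can be sketched with scikit-learn (an assumption — the patent names no toolkit; the corpus and the topic count are toy values):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "cat dog pet animal",
    "dog pet cat fur",
    "stock market trade price",
    "market price stock fund",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)               # document-word counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                   # text-topic matrix
# Normalize rows of components_ to get the topic-word matrix:
# topic_word[t, w] is the probability of word w under topic t.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

`topic_word` plays the role of the topic-word matrix used for feature expansion in the following steps.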
Step 2.3: expand the features of each sentence according to its topic, obtaining a topic-based sentence vector. If two identical words act respectively as part of the topic and part of the statement of a sentence, the two words are considered to have different semantics and are defined as different words; accordingly, feature expansion is performed separately for the topic part and for the statement part of a sentence. The concrete method for expanding the topic part is: first extract the words corresponding to the basic and general items of the topic; then, using the topic-word matrix obtained in step 2.2, compare each word's probability under the different topics, choose the topic with the highest probability, and add the other words under that topic to the sentence as part of it; finally, take all words of the sentence as features and construct a feature vector to represent the sentence. The value of a dimension corresponding to an original word of the sentence is the number of times the word occurs in the sentence, while the value of a dimension corresponding to an expansion word is calculated by formula (1),
v = n * w (1)
where v is the value of the dimension corresponding to the expansion word, n is the number of times the expansion word occurs in the sentence, and w is the probability of the expansion word under the corresponding topic.
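The feature expansion of step 2.3 and formula (1) can be sketched as follows (pure Python; the dictionary-based topic-word distributions are made-up toy values, and taking n = 1 for a newly added expansion word is an interpretive assumption):

```python
def expand_sentence(words, topic_word):
    """Expand a sentence's features with words from its best-matching topic.

    words:      tokens from the topic (or statement) part of a sentence
    topic_word: dict topic_id -> {word: probability}, a toy stand-in for
                the LDA topic-word matrix of step 2.2
    Returns a feature vector as {word: value}: original words get their
    occurrence count, expansion words get v = n * w (formula (1)).
    """
    # Choose the topic that assigns the sentence's words the highest total probability.
    best = max(topic_word,
               key=lambda t: sum(topic_word[t].get(w, 0.0) for w in words))
    features = {}
    for w in words:                            # original words: raw counts
        features[w] = features.get(w, 0) + 1
    for w, prob in topic_word[best].items():   # expansion words: v = n * w
        if w not in features:
            n = 1                              # assumption: added once to the sentence
            features[w] = n * prob
    return features
```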
Step 2.4: by the method of step 2.3, expand the features of each sentence according to its statement, obtaining a statement-based sentence vector.
Step 2.5: calculate sentence similarity with each of the two kinds of sentence vectors obtained in steps 2.3 and 2.4, and weight the two similarity values to obtain the final similarity between the sentences. The specific calculation formula is
sim1(S_A, S_B) = ω·cos(T_A, T_B) + (1 − ω)·cos(R_A, R_B) (2)
where S_A and S_B are any two sentences, sim1(S_A, S_B) is their similarity value, T_A and T_B are the topic-based sentence vectors of S_A and S_B, R_A and R_B are their statement-based sentence vectors, cos(·,·) is the cosine similarity, and ω is an adjustable parameter with value range [0, 1] that adjusts the weight of the two similarities.
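The weighted combination of step 2.5 can be sketched as follows (cosine similarity between the topic-based and the statement-based vectors, linearly weighted by ω; the vectors in the test are toy values):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sim1(topic_a, topic_b, stmt_a, stmt_b, omega=0.5):
    """Step 2.5: omega in [0, 1] balances the topic-based similarity
    against the statement-based similarity."""
    return omega * cosine(topic_a, topic_b) + (1 - omega) * cosine(stmt_a, stmt_b)
```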
Step 3: input all the sentences preprocessed in step 1 into the PV deep learning model, learn text features with the PV model to obtain a sentence vector for each sentence, and take the cosine distance between sentence vectors as the similarity between sentences:
sim2(S_A, S_B) = cos(P_A, P_B) = (P_A · P_B) / (|P_A| |P_B|) (3)
where S_A and S_B are any two sentences, sim2(S_A, S_B) is their similarity value, and P_A and P_B are the sentence vectors learned by the PV model. PV is an unsupervised learning method: its input is text of arbitrary length (an article, a paragraph, or a sentence, all referred to here as text), and its output is a continuous distributed vector representation of that text. Its principle is similar to that of word2vec word vectors: it learns an effective vector representation of a sentence or passage while retaining semantic and word-order information. The PV model effectively solves the bag-of-words model's neglect of word sense and word order, and the dense vectors it produces also effectively overcome the feature sparsity of short-text sentence representations.
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters by grid search to find the optimal parameter values, and output the final similarity value between each sentence pair:
sim(S_A, S_B) = θ * sim1(S_A, S_B) + (1 − θ) * sim2(S_A, S_B) (4)
where S_A and S_B are any two sentences, sim(S_A, S_B) is their similarity value, θ is an adjustable parameter with value range [0, 1], and sim1(S_A, S_B) and sim2(S_A, S_B) are calculated by formulas (2) and (3) respectively. Combining formulas (2), (3), and (4), the complete sentence similarity formula is:
sim(S_A, S_B) = θ [ω·cos(T_A, T_B) + (1 − ω)·cos(R_A, R_B)] + (1 − θ)·cos(P_A, P_B) (5)
ω and θ are adjustable parameters, both with value range [0, 1]; they are tuned by grid search according to the calculation or application results of sentence similarity, and the optimal values are taken.
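The grid search of step 4 over ω and θ can be sketched as follows (pure Python; the scoring function is a made-up stand-in for the clustering evaluation used in the embodiment):

```python
def grid_search(score, step=0.05):
    """Scan omega and theta over [0, 1] on a grid and return the best pair.

    score(omega, theta) is any evaluation of the fused similarity, e.g. the
    silhouette coefficient of the resulting clustering.
    """
    steps = int(round(1 / step))
    best = (None, None, float("-inf"))
    for i in range(steps + 1):
        for j in range(steps + 1):
            omega, theta = i * step, j * step
            s = score(omega, theta)
            if s > best[2]:
                best = (omega, theta, s)
    return best

# Toy score peaking near omega = 0.3, theta = 0.25 (illustrative only):
toy = lambda o, t: -((o - 0.3) ** 2 + (t - 0.25) ** 2)
```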
Beneficial effects
The sentence similarity calculation method of the present invention effectively reduces the loss of semantic information and characterizes the internal relations between sentences more comprehensively and accurately. By deeply mining the context and the inherent semantic structure features of sentences, it makes the similarity calculation independent of the sentences' form of expression and improves the accuracy of the result.
Specific embodiment
To better illustrate the objects and advantages of the present invention, the embodiment of the method is described in further detail below with a concrete example.
The experiments use the corpus published for the Chinese microblog opinion element extraction task of the NLP&CC 2013 evaluation. Five topics, 10,896 sentences in total, were randomly selected from it as the short text set. The effect of sentence similarity calculation is evaluated by applying the computed similarities to short-text clustering and assessing the clustering quality. Clustering quality is measured with the silhouette coefficient, a concept first proposed by Peter J. Rousseeuw in 1986, which judges clustering quality by combining two factors, cohesion and separation.
The silhouette coefficient is calculated as follows:
(1) For the i-th object, compute the average distance to the other objects in its own cluster, denoted a_i.
(2) For the i-th object, compute its average distance to all objects in each cluster that does not contain it, and take the minimum over those clusters, denoted b_i.
(3) For the i-th object, the silhouette coefficient s_i is computed by formula (6):
s_i = (b_i − a_i) / max(a_i, b_i) (6)
The silhouette coefficient ranges over [−1, 1]. From formula (6), if s_i < 0, the average distance from the i-th object to the elements of its own cluster is greater than its distance to some other cluster, and the clustering is inaccurate. If a_i tends to 0 or b_i is sufficiently large, s_i approaches 1, indicating that the data within clusters are tighter and the separation between clusters is more pronounced; the clustering is better.
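The silhouette calculation described above can be sketched as follows (pure Python over a user-supplied distance function; the 1-D points and the clustering are toy values, and each cluster is assumed to contain more than one object):

```python
def silhouette(i, clusters, dist):
    """Silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i) for object i.

    clusters: list of lists of object ids; dist(x, y): any distance function.
    """
    own = next(c for c in clusters if i in c)
    others = [c for c in clusters if i not in c]
    a_i = sum(dist(i, j) for j in own if j != i) / (len(own) - 1)
    b_i = min(sum(dist(i, j) for j in c) / len(c) for c in others)
    return (b_i - a_i) / max(a_i, b_i)

# 1-D toy example: two tight clusters far apart.
points = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0}
d = lambda i, j: abs(points[i] - points[j])
clusters = [[0, 1], [2, 3]]
```

With well-separated clusters like these, s_i is close to 1 for every object.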
The specific implementation steps are:
Step 1: split the short social text set into sentences, then perform word segmentation and part-of-speech tagging on each sentence with ICTCLAS2015, and remove the stop words in the text according to a stop word list downloaded from the Internet.
Step 2: perform sentence meaning structure analysis on each sentence of the text set with CSM, analyze the text set with the LDA topic model to obtain the topics and word distributions of the short texts, expand the features of each sentence, and calculate sentence similarity.
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, statement, basic items, and general items.
Step 2.2: analyze the short text set with the LDA topic model, extract the topics in the text and the word distribution under each topic, and obtain the topic-word matrix.
Step 2.3: expand the features of each sentence according to its topic, obtaining a topic-based sentence vector. The concrete method is: first extract the words corresponding to the basic and general items of the topic; then, using the topic-word matrix obtained in step 2.2, compare each word's probability under the different topics, choose the topic with the highest probability, and add the other words under that topic to the sentence as part of it; finally, take all words of the sentence as features and construct a feature vector to represent the sentence, where the value of a dimension corresponding to an original word is the number of times the word occurs in the sentence, and the value of a dimension corresponding to an expansion word is calculated by formula (1).
Step 2.4: by the method of step 2.3, expand the features of each sentence according to its statement, obtaining a statement-based sentence vector.
Step 2.5: calculate sentence similarity with each of the two kinds of sentence vectors obtained in steps 2.3 and 2.4, and weight the two similarity values by formula (2) to obtain the final similarity between the sentences.
Step 3: input all the sentences preprocessed in step 1 into the PV deep learning model, learn text features with the PV model to obtain sentence vectors, and take the cosine distance between sentence vectors as the similarity between sentences; the parameters of the PV model all use the tool's default values.
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters ω and θ by grid search, and select the optimal parameter group.
For the clustering of the 5 topics, with vector length size = 100 and window length window = 5 in the PV model, the silhouette coefficient reaches its best value of 0.45 when ω = 0.33 and θ = 0.25. When only the similarity obtained from the CSM sentence meaning structure analysis is considered, the silhouette coefficient reaches 0.42; when only the sentence similarity obtained with PV is considered, it reaches 0.31. The experimental results show that the sentence vectors obtained with CSM capture deeper internal semantic information of the sentence, while the PV model gives the sentence vectors rich contextual information; a sentence similarity calculation method that considers both the sentence's own semantic information and its contextual information can measure the degree of similarity between sentences more accurately.
Claims (3)
1. A sentence similarity calculation method using sentence meaning structure features, the method comprising the following steps:
Step 1: preprocess the short text set: first split it into sentences, then perform word segmentation and part-of-speech tagging, and remove stop words;
Step 2: combining the sentence meaning structure features and the topic-word distribution features, expand the features of each sentence and calculate sentence similarity;
Step 2.1: on the basis of step 1, perform sentence meaning structure analysis on each sentence and extract its topic, statement, basic items, and general items;
Step 2.2: analyze the short text set with the LDA (Latent Dirichlet Allocation) topic model, extract the topics in the text and the word distribution under each topic, and obtain the topic-word matrix;
Step 2.3: expand the features of each sentence according to its topic, obtaining a topic-based sentence vector;
Step 2.4: expand the features of each sentence according to its statement, obtaining a statement-based sentence vector;
Step 2.5: calculate sentence similarity with each of the two kinds of sentence vectors obtained in steps 2.3 and 2.4, and weight the two similarity values to obtain the final similarity between the sentences, with the specific formula
sim1(S_A, S_B) = ω·cos(T_A, T_B) + (1 − ω)·cos(R_A, R_B)
where S_A and S_B are any two sentences, sim1(S_A, S_B) is their similarity value, T_A and T_B are the topic-based sentence vectors of S_A and S_B, R_A and R_B are their statement-based sentence vectors, cos(·,·) is the cosine similarity, and ω is an adjustable parameter with value range [0, 1];
Step 3: input all the sentences preprocessed in step 1 into the PV (Paragraph Vector) deep learning model, learn text features with the PV model to obtain sentence vectors, and take the cosine distance between sentence vectors as the similarity between sentences, with the formula
sim2(S_A, S_B) = cos(P_A, P_B)
where S_A and S_B are any two sentences, sim2(S_A, S_B) is their similarity value, and P_A and P_B are the sentence vectors learned by the PV model;
Step 4: linearly weight the similarity values obtained in steps 2 and 3, tune the parameters by grid search to find the optimal parameter values, and output the final similarity value between each sentence pair.
2. The sentence similarity calculation method using sentence meaning structure features according to claim 1, characterized in that the concrete method in step 2.3 for expanding the features of a sentence based on its topic is: first extract the words corresponding to the basic and general items of the topic; then, using the topic-word matrix obtained by the LDA analysis of the short text set, compare each word's probability under the different topics, choose the topic with the highest probability, and add the other words under that topic to the sentence as part of it; finally, take all words of the sentence as features and construct a feature vector to represent the sentence, where the value of a dimension corresponding to an original word is the number of times the word occurs in the sentence, and the value of a dimension corresponding to an expansion word is calculated as
v = n * w
where v is the value of the dimension corresponding to the expansion word, n is the number of times the expansion word occurs in the sentence, and w is the probability of the expansion word under the corresponding topic; the method in step 2.4 for expanding a sentence based on its statement is similar to the method for expanding it based on its topic.
3. The sentence similarity calculation method using sentence meaning structure features according to claim 1, characterized in that in step 4 the similarity obtained based on CSM and the similarity obtained based on PV are fused by weighting, with the specific formula
sim(S_A, S_B) = θ * sim1(S_A, S_B) + (1 − θ) * sim2(S_A, S_B)
where S_A and S_B are any two sentences, sim(S_A, S_B) is their similarity value, and θ is an adjustable parameter with value range [0, 1]; combining the formulas of step 2.5 and step 3 of claim 1, the complete sentence similarity calculation formula is
sim(S_A, S_B) = θ [ω·cos(T_A, T_B) + (1 − ω)·cos(R_A, R_B)] + (1 − θ)·cos(P_A, P_B)
where ω and θ are adjustable parameters, both with value range [0, 1].
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201610867254.2A | 2016-09-29 | 2016-09-29 | Sentence similarity calculation method based on sentence meaning structure characteristics

Publications (1)

Publication Number | Publication Date
---|---
CN106445920A | 2017-02-22
Family
ID=58172480
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610867254.2A Pending CN106445920A (en) | 2016-09-29 | 2016-09-29 | Sentence similarity calculation method based on sentence meaning structure characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106445920A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273474A (en) * | 2017-06-08 | 2017-10-20 | 成都数联铭品科技有限公司 | Autoabstract abstracting method and system based on latent semantic analysis |
CN108009152A (en) * | 2017-12-04 | 2018-05-08 | 陕西识代运筹信息科技股份有限公司 | A kind of data processing method and device of the text similarity analysis based on Spark-Streaming |
CN108287824A (en) * | 2018-03-07 | 2018-07-17 | 北京云知声信息技术有限公司 | Semantic similarity calculation method and device |
CN108363816A (en) * | 2018-03-21 | 2018-08-03 | 北京理工大学 | Open entity relation extraction method based on sentence justice structural model |
CN108509408A (en) * | 2017-02-27 | 2018-09-07 | 芋头科技(杭州)有限公司 | A kind of sentence similarity judgment method |
CN109101489A (en) * | 2018-07-18 | 2018-12-28 | 武汉数博科技有限责任公司 | A kind of text automatic abstracting method, device and a kind of electronic equipment |
CN109145299A (en) * | 2018-08-16 | 2019-01-04 | 北京金山安全软件有限公司 | Text similarity determination method, device, equipment and storage medium |
CN109857990A (en) * | 2018-12-18 | 2019-06-07 | 重庆邮电大学 | A kind of financial class notice information abstracting method based on file structure and deep learning |
CN110008465A (en) * | 2019-01-25 | 2019-07-12 | 网经科技(苏州)有限公司 | The measure of sentence semantics distance |
CN110020421A (en) * | 2018-01-10 | 2019-07-16 | 北京京东尚科信息技术有限公司 | The session information method of abstracting and system of communication software, equipment and storage medium |
- 2016
- 2016-09-29 CN CN201610867254.2A patent/CN106445920A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778256A (en) * | 2015-04-20 | 2015-07-15 | 江苏科技大学 | Rapid incremental clustering method for domain question-answering system consultations |
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
CN105573985A (en) * | 2016-03-04 | 2016-05-11 | 北京理工大学 | Sentence expression method based on Chinese sentence meaning structural model and topic model |
Non-Patent Citations (3)
Title |
---|
TOMAS et al.: "UWB at SemEval-2016 Task 1: Semantic Textual Similarity using Lexical, Syntactic, and Semantic Information", 《PROCEEDINGS OF SEMEVAL-2016》 * |
YUHUA LI et al.: "Sentence Similarity Based on Semantic Nets and Corpus Statistics", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》 * |
LIN Meng et al.: "Microblog topic summarization algorithm incorporating the sentence meaning structure model", 《Journal of Zhejiang University (Engineering Science)》 * |
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108509408B (en) * | 2017-02-27 | 2019-11-22 | 芋头科技(杭州)有限公司 | Sentence similarity judgment method |
CN108509408A (en) * | 2017-02-27 | 2018-09-07 | 芋头科技(杭州)有限公司 | Sentence similarity judgment method |
CN107273474A (en) * | 2017-06-08 | 2017-10-20 | 成都数联铭品科技有限公司 | Automatic summarization method and system based on latent semantic analysis |
CN108009152A (en) * | 2017-12-04 | 2018-05-08 | 陕西识代运筹信息科技股份有限公司 | Data processing method and device for Spark-Streaming-based text similarity analysis |
CN110020421A (en) * | 2018-01-10 | 2019-07-16 | 北京京东尚科信息技术有限公司 | Session information extraction method and system, device and storage medium for communication software |
CN108287824A (en) * | 2018-03-07 | 2018-07-17 | 北京云知声信息技术有限公司 | Semantic similarity calculation method and device |
CN108363816A (en) * | 2018-03-21 | 2018-08-03 | 北京理工大学 | Open entity relation extraction method based on sentence meaning structure model |
CN109101489A (en) * | 2018-07-18 | 2018-12-28 | 武汉数博科技有限责任公司 | Text automatic summarization method and device and electronic equipment |
CN109101489B (en) * | 2018-07-18 | 2022-05-20 | 武汉数博科技有限责任公司 | Text automatic summarization method and device and electronic equipment |
CN109145299A (en) * | 2018-08-16 | 2019-01-04 | 北京金山安全软件有限公司 | Text similarity determination method, device, equipment and storage medium |
CN109145299B (en) * | 2018-08-16 | 2022-06-21 | 北京金山安全软件有限公司 | Text similarity determination method, device, equipment and storage medium |
CN110895656A (en) * | 2018-09-13 | 2020-03-20 | 武汉斗鱼网络科技有限公司 | Text similarity calculation method and device, electronic equipment and storage medium |
CN110895656B (en) * | 2018-09-13 | 2023-12-29 | 北京橙果转话科技有限公司 | Text similarity calculation method and device, electronic equipment and storage medium |
CN109857990B (en) * | 2018-12-18 | 2022-11-25 | 重庆邮电大学 | Financial bulletin information extraction method based on document structure and deep learning |
CN109857990A (en) * | 2018-12-18 | 2019-06-07 | 重庆邮电大学 | Financial bulletin information extraction method based on document structure and deep learning |
CN110008465A (en) * | 2019-01-25 | 2019-07-12 | 网经科技(苏州)有限公司 | Method for measuring sentence semantic distance |
CN110287291B (en) * | 2019-07-03 | 2021-11-02 | 桂林电子科技大学 | Unsupervised method for analyzing off-topic sentences in short English essays |
CN110287291A (en) * | 2019-07-03 | 2019-09-27 | 桂林电子科技大学 | Unsupervised method for analyzing off-topic sentences in short English essays |
CN110348133A (en) * | 2019-07-15 | 2019-10-18 | 西南交通大学 | System and method for constructing high-speed train three-dimensional product structure technical effect diagram |
CN110348133B (en) * | 2019-07-15 | 2022-08-19 | 西南交通大学 | System and method for constructing high-speed train three-dimensional product structure technical effect diagram |
CN110413761A (en) * | 2019-08-06 | 2019-11-05 | 浩鲸云计算科技股份有限公司 | Method for domain-specific independent dialogue based on a knowledge base |
CN110765360B (en) * | 2019-11-01 | 2022-08-02 | 新华网股份有限公司 | Text topic processing method and device, electronic equipment and computer storage medium |
CN110765360A (en) * | 2019-11-01 | 2020-02-07 | 新华网股份有限公司 | Text topic processing method and device, electronic equipment and computer storage medium |
CN110990451A (en) * | 2019-11-15 | 2020-04-10 | 浙江大华技术股份有限公司 | Sentence embedding-based data mining method, device, equipment and storage device |
CN110990451B (en) * | 2019-11-15 | 2023-05-12 | 浙江大华技术股份有限公司 | Sentence embedding-based data mining method, device, equipment and storage device |
CN111125301A (en) * | 2019-11-22 | 2020-05-08 | 泰康保险集团股份有限公司 | Text method and device, electronic equipment and computer readable storage medium |
CN111078849A (en) * | 2019-12-02 | 2020-04-28 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting information |
CN111008783B (en) * | 2019-12-05 | 2022-03-18 | 浙江工业大学 | Factory processing flow recommendation method based on singular value decomposition |
CN111008783A (en) * | 2019-12-05 | 2020-04-14 | 浙江工业大学 | Factory processing flow recommendation method based on singular value decomposition |
CN110990537B (en) * | 2019-12-11 | 2023-06-27 | 中山大学 | Sentence similarity calculation method based on edge information and semantic information |
CN110990537A (en) * | 2019-12-11 | 2020-04-10 | 中山大学 | Sentence similarity calculation method based on edge information and semantic information |
CN111209375A (en) * | 2020-01-13 | 2020-05-29 | 中国科学院信息工程研究所 | Universal clause and document matching method |
CN111209375B (en) * | 2020-01-13 | 2023-01-17 | 中国科学院信息工程研究所 | Universal clause and document matching method |
CN111368532A (en) * | 2020-03-18 | 2020-07-03 | 昆明理工大学 | Topic word embedding disambiguation method and system based on LDA |
CN111368532B (en) * | 2020-03-18 | 2022-12-09 | 昆明理工大学 | Topic word embedding disambiguation method and system based on LDA |
CN112650836A (en) * | 2020-12-28 | 2021-04-13 | 成都网安科技发展有限公司 | Text analysis method and device based on syntax structure element semantics and computing terminal |
CN112686025A (en) * | 2021-01-27 | 2021-04-20 | 浙江工商大学 | Chinese choice question interference item generation method based on free text |
CN112686025B (en) * | 2021-01-27 | 2023-09-19 | 浙江工商大学 | Chinese choice question interference item generation method based on free text |
CN113536907A (en) * | 2021-06-06 | 2021-10-22 | 南京理工大学 | Social relationship identification method and system based on deep supervised feature selection |
CN116756347A (en) * | 2023-08-21 | 2023-09-15 | 中国标准化研究院 | Semantic information retrieval method based on big data |
CN116756347B (en) * | 2023-08-21 | 2023-10-27 | 中国标准化研究院 | Semantic information retrieval method based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106445920A (en) | Sentence similarity calculation method based on sentence meaning structure characteristics | |
CN103544255B (en) | Network public opinion information analysis method based on text semantic relevance | |
CN103617280B (en) | Method and system for mining Chinese event information | |
CN104391942B (en) | Short text feature expansion method based on semantic graphs | |
CN103136359B (en) | Single-document abstract generation method | |
CN110020189A (en) | Article recommendation method based on Chinese similarity measures | |
CN107247780A (en) | Ontology-based patent document similarity measurement method | |
CN103455562A (en) | Text sentiment orientation analysis method and product review orientation discriminator based on same | |
CN106156272A (en) | Information retrieval method based on multi-source semantic analysis | |
CN108549634A (en) | Chinese patent text similarity calculation method | |
CN109858028A (en) | Short text similarity calculation method based on a probabilistic model | |
CN104199972A (en) | Named entity relation extraction and construction method based on deep learning | |
CN106484664A (en) | Similarity calculation method between short texts | |
CN105243152A (en) | Graph model-based automatic summarization method | |
CN103324700B (en) | Ontology concept attribute learning method based on Web information | |
CN103646112B (en) | Dependency parsing domain adaptation method based on web search | |
CN105653518A (en) | Specific group discovery and expansion method based on microblog data | |
CN103399901A (en) | Keyword extraction method | |
CN102880723A (en) | Searching method and system for identifying user retrieval intention | |
CN103970730A (en) | Method for extracting multiple subject terms from single Chinese text | |
CN104008090A (en) | Multi-subject extraction method based on concept vector model | |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction | |
CN104778204A (en) | Multi-document subject discovery method based on two-layer clustering | |
CN108038205A (en) | Opinion analysis prototype system for Chinese microblogs | |
CN105975475A (en) | Chinese phrase string-based fine-grained thematic information extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170222 |