CN104090918A - Sentence similarity calculation method based on information amount - Google Patents


Info

Publication number
CN104090918A (application CN201410268361.4A); granted as CN104090918B
Authority
CN
China
Prior art keywords
sentence
word
formula
concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410268361.4A
Other languages
Chinese (zh)
Other versions
CN104090918B (en)
Inventor
吴昊 (Wu Hao)
黄河燕 (Huang Heyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201410268361.4A priority Critical patent/CN104090918B/en
Publication of CN104090918A publication Critical patent/CN104090918A/en
Application granted granted Critical
Publication of CN104090918B publication Critical patent/CN104090918B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The invention relates to a sentence similarity calculation method based on information content. The method comprises the following steps: first, the sense of each word is determined by the concept with the maximum information content shared between the words of the two sentences; next, the information content of each word and the common information content of multiple words are computed from the hierarchical structure of a semantic network together with corpus statistics, and the total information content of multiple words is computed with the inclusion-exclusion principle from combinatorics, yielding the information content of each sentence as well as the joint information content of both; finally, the sentence similarity is defined and computed according to the Jaccard similarity principle. The method faithfully simulates human judgments of the degree of sentence similarity, requires neither corpus-trained parameters nor empirical parameters, does not depend on corpus scale or on other natural language processing techniques such as part-of-speech tagging, and has excellent time performance: for sentence pairs of ordinary length, near-real-time computational efficiency is obtained on a mainstream multi-core PC (Personal Computer).

Description

A sentence similarity calculation method based on information content
Technical field
The present invention relates to sentence similarity calculation methods, and specifically to a sentence similarity calculation method based on information content; it belongs to the field of natural language processing.
Background technology
Computing the similarity of sentences or short texts is an important research topic in natural language processing, and in recent years its role in applications such as information retrieval, machine translation, question answering and automatic summarization has grown steadily. Traditional approaches mostly reuse document similarity measures, treating the words of a sentence as mutually unrelated, meaningless symbols; for sentences containing only a few words this is not accurate enough. The hybrid methods in common use today usually need to train parameters on an associated data set, or rely on empirical parameters; their drawback is dependence on the training data, which limits their generality.
Summary of the invention
The purpose of the present invention is to address the above problems by providing a sentence similarity calculation method based on information content. By exploiting information content, an essential attribute of language, and applying the inclusion-exclusion principle to obtain an accurate total information content for multiple words, the method produces sentence similarity results closer to human subjective judgment.
The idea of the method is as follows: first, the sense of each word is fixed by the concept with the maximum information content among the subsumers of the two sentences' words; then the information content of individual words and the common information content of multiple words are computed from the hierarchical structure of a semantic network (such as WordNet) and a corpus (such as the BNC or Brown corpus); next, the inclusion-exclusion principle from combinatorics is applied to compute the total information content of multiple words, yielding the information content of each of the two sentences and their joint information content; finally, the similarity of the sentences is defined and computed according to the Jaccard similarity principle.
To achieve the above objective, the technical solution adopted by the present invention is:
A sentence similarity calculation method based on information content, comprising the following steps:
Step 1: input the two sentences $s_a$ and $s_b$ to be compared, written as

$$s_a = \{w_i^a \mid i = 1, 2, \ldots, n\}$$
$$s_b = \{w_i^b \mid i = 1, 2, \ldots, m\}$$

where $w_i^a$ and $w_i^b$ denote the $i$-th word of $s_a$ and $s_b$ respectively, and $n$ and $m$ are the word counts of $s_a$ and $s_b$;
Step 2: perform word sense selection for the words of the input sentences, as follows:
The sense $c_i^a$ of word $w_i^a$ is determined according to formula 1:

[formula 1]
$$c_i^a = \arg\max_{\substack{c \in \mathrm{subsum}(c_1, c_2) \\ c_1 \in \mathrm{concepts}(w_i^a) \\ c_2 \in \mathrm{concepts}(s_b)}} \{-\log P(c)\}$$

where $\mathrm{subsum}(c_1, c_2)$ is the set of all concepts in the semantic network that subsume both $c_1$ and $c_2$, $\mathrm{concepts}(w_i^a)$ is the set of concepts in the semantic network that contain the word $w_i^a$, $\mathrm{concepts}(s_b)$ is the set of concepts in the semantic network that contain any word of sentence $s_b$, and $P(c)$ is the frequency of concept $c$ in the corpus; in particular, if $P(c)$ is 0 then $\log P(c)$ is taken as 0. The value of $P(c)$ is determined according to formula 2:

[formula 2]
$$P(c) = \sum_{w \in \mathrm{words}(c)} \mathrm{count}(w) / N$$

where $\mathrm{words}(c)$ is the set of all words belonging to concept $c$ and to all sub-concepts of $c$ in the semantic network, $\mathrm{count}(w)$ is the frequency of word $w$ in the corpus, and $N$ is the sum of the frequencies of all concepts in the semantic network, the frequency of each concept being the sum of the corpus frequencies of all the words it contains;
Similarly, replacing $\mathrm{concepts}(w_i^a)$ in formula 1 by $\mathrm{concepts}(w_i^b)$ and $\mathrm{concepts}(s_b)$ by $\mathrm{concepts}(s_a)$ yields the sense $c_i^b$ of the $i$-th word of sentence $s_b$;
After sense determination, sentences $s_a$ and $s_b$ can be written as:

$$s_a = \{c_i^a \mid i = 1, 2, \ldots, n\}, \quad s_b = \{c_i^b \mid i = 1, 2, \ldots, m\}$$
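Formulas 1 and 2 can be illustrated concretely. The sketch below is a simplification that assumes a tiny hand-built IS-A hierarchy and invented word counts in place of WordNet and BNC statistics; every name in it (`PARENT`, `WORDS`, `select_sense`, and so on) is made up for the example. It computes $P(c)$, the information content $-\log P(c)$, the subsumer set, and the sense choice of formula 1:

```python
import math
from itertools import product

# Toy IS-A hierarchy (child -> parent) and corpus word counts: all invented
# for illustration; a real system would read WordNet and BNC statistics.
PARENT = {"dog": "canine", "wolf": "canine", "canine": "animal",
          "cat": "feline", "feline": "animal", "animal": None}
WORDS = {"dog": {"dog"}, "wolf": {"wolf"}, "canine": set(),
         "cat": {"cat"}, "feline": set(), "animal": set()}
COUNT = {"dog": 50, "wolf": 10, "cat": 40}
# concepts(w): concepts whose own word set contains w (a word may be ambiguous)
CONCEPTS = {w: {c for c, ws in WORDS.items() if w in ws} for w in COUNT}

def ancestors(c):
    """Concept c together with all of its ancestors (its subsumers)."""
    out = []
    while c is not None:
        out.append(c)
        c = PARENT[c]
    return out

def words_of(c):
    """All words of c and of every concept below c (formula 2's words(c))."""
    below = {d for d in PARENT if c in ancestors(d)}
    return set().union(*(WORDS[d] for d in below))

# Formula 2's N: sum of the frequencies of all concepts, where a concept's
# frequency is the total corpus frequency of the words it contains.
N = sum(sum(COUNT[w] for w in words_of(c)) for c in PARENT)

def P(c):
    return sum(COUNT[w] for w in words_of(c)) / N  # formula 2

def ic(c):
    p = P(c)
    return -math.log(p) if p > 0 else 0.0  # log P(c) taken as 0 when P(c)=0

def subsum(*concepts):
    """Concepts subsuming every argument (their common ancestors)."""
    common = set(ancestors(concepts[0]))
    for c in concepts[1:]:
        common &= set(ancestors(c))
    return common

def select_sense(word, other_sentence):
    """Formula 1: pick the sense whose best subsumer with any sense of any
    word of the other sentence has maximal information content."""
    best, best_ic = None, -1.0
    for c1, w2 in product(CONCEPTS[word], other_sentence):
        for c2 in CONCEPTS[w2]:
            for c in subsum(c1, c2):
                if ic(c) > best_ic:
                    best, best_ic = c1, ic(c)
    return best
```

A real implementation would enumerate WordNet synsets and hypernym paths instead of this toy dictionary, but the control flow is the same.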
Step 3: with the senses determined in step 2, apply the inclusion-exclusion principle from combinatorics to compute the information content of $s_a$ and $s_b$ and their joint information content, as follows:
The information content $IC(s_a)$ of sentence $s_a$ is computed as shown in formula 3:

[formula 3]
$$IC(s_a) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$$

where $\mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$ denotes the common information content in the semantic information space built jointly from the hierarchical structure of the semantic network and the corpus statistics, computed according to formula 4:

[formula 4]
$$\mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a) = \max_{c \in \mathrm{subsum}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)} [-\log P(c)]$$

where $\mathrm{subsum}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$ is the set of all concepts in the semantic network that subsume the concepts $c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a$;
Similarly, replacing every letter $a$ in formulas 3 and 4 by $b$, and $n$ by $m$, yields the information content of sentence $s_b$;
Regarding the set of all distinct words of $s_a$ and $s_b$ as a new sentence, the joint information content $IC(s_a \cup s_b)$ is obtained from formula 5:

[formula 5]
$$IC(s_a \cup s_b) = \sum_{k=1}^{p} (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le p} \mathrm{commonIC}(c_{i_1}, \ldots, c_{i_k})$$

where $p$ is the number of distinct words of $s_a$ and $s_b$ combined, and the indices range over the concepts of the combined sentence;
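The alternating sums of formulas 3 and 5 map directly onto subset enumeration. The following standalone sketch (the function names and the feature-set stand-in for formula 4 are inventions of this example, not the patent's implementation) computes the inclusion-exclusion total:

```python
from itertools import combinations

def sentence_ic(concepts, common_ic):
    """Formulas 3/5: total IC of a group of concepts by inclusion-exclusion.

    concepts  -- the (disambiguated) concepts of one sentence
    common_ic -- callable mapping a tuple of concepts to their common IC
                 (formula 4: max IC over their shared subsumers)
    """
    total = 0.0
    for k in range(1, len(concepts) + 1):          # subset size k
        sign = (-1) ** (k - 1)                     # alternating sign
        for subset in combinations(concepts, k):   # 1 <= i1 < ... < ik <= n
            total += sign * common_ic(subset)
    return total

# Toy stand-in for formula 4: a concept is a set of "semantic features" and
# the common IC of a group is the number of features they share.  With this
# stand-in, inclusion-exclusion recovers the size of the union of features.
def toy_common_ic(subset):
    return float(len(set.intersection(*subset)))

s = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
print(sentence_ic(s, toy_common_ic))  # 5.0 -- features a, b, c, d, e
```

The same `sentence_ic` serves for formula 3 (one sentence) and formula 5 (the concepts of both sentences, duplicates removed).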
Step 4: define the common information content $\mathrm{COMMONIC}(s_a, s_b)$ of the two sentences through the relation between union and intersection, as shown in formula 6:

[formula 6]
$$\mathrm{COMMONIC}(s_a, s_b) = IC(s_a) + IC(s_b) - IC(s_a \cup s_b)$$

Step 5: following the Jaccard similarity principle, define the similarity $\mathrm{sim}(s_a, s_b)$ of sentences $s_a$ and $s_b$ as shown in formula 7:

[formula 7]
$$\mathrm{sim}(s_a, s_b) = \frac{\mathrm{COMMONIC}(s_a, s_b)}{IC(s_a \cup s_b)}$$
Step 6: output the similarity $\mathrm{sim}(s_a, s_b)$ of the two sentences.
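Steps 3 to 6 can be strung together in a short end-to-end sketch. Here toy "feature set" concepts again stand in for WordNet senses, so the numbers only exercise formulas 5-7, not the real semantic space; all identifiers are invented for the example:

```python
from itertools import combinations

def common_ic(subset):
    # Toy stand-in for formula 4: shared "features" of the concepts.
    return float(len(set.intersection(*subset)))

def ic(concepts):
    # Formulas 3/5: inclusion-exclusion over all non-empty subsets.
    total = 0.0
    for k in range(1, len(concepts) + 1):
        for sub in combinations(concepts, k):
            total += (-1) ** (k - 1) * common_ic(sub)
    return total

def similarity(sa, sb):
    """Formulas 6 and 7: common IC via union/intersection, then Jaccard."""
    union = sa + [c for c in sb if c not in sa]    # distinct concepts of both
    ic_union = ic(union)                           # IC(sa U sb), formula 5
    common = ic(sa) + ic(sb) - ic_union            # formula 6
    return common / ic_union if ic_union else 0.0  # formula 7

sa = [{"a", "b"}, {"b", "c"}]
sb = [{"a", "b"}, {"b", "d"}]
print(similarity(sa, sa))  # 1.0 -- identical sentences
print(similarity(sa, sb))  # 0.5
```

Because formula 7 divides the shared information by the joint information, the result lies in [0, 1] and equals 1 exactly for identical concept sets, mirroring the Jaccard index.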
Beneficial effect
Compared with the prior art, the method of the invention further improves the accuracy of hybrid methods and faithfully simulates human judgments of the degree of sentence similarity, while requiring neither corpus-trained parameters nor empirical parameters; it does not depend on corpus scale, needs no other natural language processing techniques such as part-of-speech tagging, and can be applied directly to sentence similarity calculation, giving it good generality. Its time performance is excellent: for sentence pairs of ordinary length, near-real-time computational efficiency is obtained on a mainstream multi-core PC.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Fig. 2 compares the method of the invention with the other methods.
Embodiment
The implementation of the invention is described in detail below with reference to the drawings.
As shown in Fig. 1, the method is divided into five main steps:
Step 1: input the two sentences to be compared, written as

$$s_a = \{w_i^a \mid i = 1, 2, \ldots, n\}$$
$$s_b = \{w_i^b \mid i = 1, 2, \ldots, m\}$$

where $w_i^a$ and $w_i^b$ denote the $i$-th word of $s_a$ and $s_b$ respectively, and $n$ and $m$ are the word counts of $s_a$ and $s_b$.
Step 2: because polysemy is ubiquitous, performing word sense selection for the words of the input sentences removes the semantic uncertainty of the sentences and prepares for the subsequent similarity calculation. The detailed procedure is as follows:
(1) form word pairs by choosing one word from each of the two input sentences;
(2) replace each word by its senses (concepts) in the semantic network (such as WordNet), so that each word pair is converted into several sense pairs;
(3) compute the common information content of each sense pair;
(4) take the maximum common information content over the sense pairs as the common information content of the word pair;
(5) sort all word pairs by common information content in descending order;
(6) define two arrays to record the sense of each word in each sentence, initializing all elements to "sense undetermined";
(7) process the word pairs in the order of (5): if the sense of a word is not yet determined, record in its array element the sense of the word corresponding to this common information content;
(8) when no array element remains undetermined, the senses of all words have been fixed.
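The greedy procedure above amounts to sorting word pairs by their best common information content and fixing each still-undetermined sense on first encounter. A minimal sketch, with the sense inventory and pair scores supplied as a hypothetical precomputed table (`pair_best` and all sense labels are invented for the example), might look like:

```python
def disambiguate(sent_a, sent_b, pair_best):
    """Greedy sense assignment over cross-sentence word pairs.

    sent_a, sent_b -- lists of words
    pair_best      -- dict mapping (word_a, word_b) to
                      (common_ic, sense_a, sense_b): the best sense pair of
                      that word pair and its common information content
    """
    # Sort all cross-sentence word pairs by common IC, descending.
    pairs = sorted(((wa, wb) for wa in sent_a for wb in sent_b),
                   key=lambda p: pair_best[p][0], reverse=True)
    # Every sense starts out undetermined.
    senses_a = {w: None for w in sent_a}
    senses_b = {w: None for w in sent_b}
    # Walk the pairs in order, fixing each still-undetermined sense.
    for wa, wb in pairs:
        _, ca, cb = pair_best[(wa, wb)]
        if senses_a[wa] is None:
            senses_a[wa] = ca
        if senses_b[wb] is None:
            senses_b[wb] = cb
    # Every word now has a sense (assuming both sentences are non-empty).
    return senses_a, senses_b

# Invented example: "bank" should take its financial sense because the
# ("bank", "money") pair carries the most common information.
table = {
    ("bank", "money"): (5.0, "bank#finance", "money#1"),
    ("bank", "river"): (4.0, "bank#shore", "river#1"),
    ("deposit", "money"): (3.0, "deposit#1", "money#1"),
    ("deposit", "river"): (0.5, "deposit#2", "river#1"),
}
a, b = disambiguate(["bank", "deposit"], ["money", "river"], table)
print(a["bank"])  # bank#finance
```

The highest-scoring pair wins ties implicitly: once a word's sense is recorded, later (lower-scoring) pairs cannot overwrite it.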
The sense (concept) $c_i^a$ of word $w_i^a$ is determined according to formula 1:

[formula 1]
$$c_i^a = \arg\max_{\substack{c \in \mathrm{subsum}(c_1, c_2) \\ c_1 \in \mathrm{concepts}(w_i^a) \\ c_2 \in \mathrm{concepts}(s_b)}} \{-\log P(c)\}$$

where

[formula 2]
$$P(c) = \sum_{w \in \mathrm{words}(c)} \mathrm{count}(w) / N$$

Here $\mathrm{subsum}(c_1, c_2)$ is the set of all concepts in the semantic network that subsume both $c_1$ and $c_2$, $\mathrm{concepts}(w_i^a)$ is the set of concepts in the semantic network that contain the word $w_i^a$, $\mathrm{concepts}(s_b)$ is the set of concepts that contain any word of sentence $s_b$, and $P(c)$ is the frequency of concept $c$ in a given corpus; in particular, if $P(c)$ is 0 then $\log P(c)$ is taken as 0.
$\mathrm{words}(c)$ is the set of all words belonging to concept $c$ and to all sub-concepts of $c$ in the semantic network, $\mathrm{count}(w)$ is the frequency of word $w$ in the corpus, and $N$ is the sum of the frequencies of all concepts in the semantic network, the frequency of each concept being the sum of the corpus frequencies of all the words it contains.
Similarly, the sense $c_i^b$ of the $i$-th word of sentence $s_b$ is obtained. After sense determination, sentences $s_a$ and $s_b$ can be written as:

$$s_a = \{c_i^a \mid i = 1, 2, \ldots, n\}$$
$$s_b = \{c_i^b \mid i = 1, 2, \ldots, m\}$$
Step 3: with the senses determined in step 2, apply the inclusion-exclusion principle from combinatorics to compute the information content of each sentence; that is, the information content of sentence $s_a$ is computed as shown in formula 3:

[formula 3]
$$IC(s_a) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$$

where

[formula 4]
$$\mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a) = \max_{c \in \mathrm{subsum}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)} [-\log P(c)]$$

Here $\mathrm{commonIC}$ denotes the information content of the intersection of the concepts in the semantic information space built jointly from the hierarchical structure of the semantic network and the corpus statistics, $\mathrm{subsum}(c_{i_1}^a, \ldots, c_{i_k}^a)$ is the set of all concepts in the semantic network that subsume the concepts $c_{i_1}^a, \ldots, c_{i_k}^a$, and for each $k$ between 1 and $n$, $(i_1, \ldots, i_k)$ ranges over the combinations of $k$ words of sentence $s_a$.
Similarly, replacing every letter $a$ in formulas 3 and 4 by $b$, and $n$ by $m$, yields the information content of sentence $s_b$.
Regarding the set of all distinct words of $s_a$ and $s_b$ as a new sentence, the joint information content of $s_a$ and $s_b$ is obtained from formula 5:

[formula 5]
$$IC(s_a \cup s_b) = \sum_{k=1}^{p} (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le p} \mathrm{commonIC}(c_{i_1}, \ldots, c_{i_k})$$

where $p$ is the number of distinct words of the two sentences combined.
Step 4: from the per-sentence information contents and the joint information content obtained in step 3, the common information content $\mathrm{COMMONIC}(s_a, s_b)$ of the two sentences follows from the relation between union and intersection, as shown in formula 6:

[formula 6]
$$\mathrm{COMMONIC}(s_a, s_b) = IC(s_a) + IC(s_b) - IC(s_a \cup s_b)$$

Step 5: from the joint information content and the common information content obtained in steps 3 and 4, and following the Jaccard similarity principle, the similarity of the two sentences is computed by formula 7:

[formula 7]
$$\mathrm{sim}(s_a, s_b) = \frac{\mathrm{COMMONIC}(s_a, s_b)}{IC(s_a \cup s_b)}$$
As described above, the invention provides a sentence similarity calculation method based on information content. Given a pair of real sentences from the user, the system automatically computes a similarity result that faithfully reflects human judgment.
A comparison of the method with four existing sentence similarity calculation methods is given below. The experiments use the semantic network WordNet and the corpus BNC. Evaluation uses the Pearson correlation coefficient (PCC) for linear correlation, the Spearman rank correlation coefficient (SRCC) for general rank correlation, and the Kendall rank correlation coefficient (KRCC) for probabilistic rank correlation. Table 1 lists the human scores and the scores of each method for the sentence pairs.
Table 1: human scores and scores of the various methods for the sentence pairs
Fig. 2 is obtained from Table 1. As Fig. 2 shows, the present method outperforms the other four methods on PCC, SRCC and KRCC, indicating that the model's judgment of sentence similarity is closer to human subjective judgment. Moreover, on the manually scored data set, the average PCC between a single volunteer's scores and the mean of all volunteers' scores is 0.825, with a maximum of 0.921; the PCC of the present method lies above the single-scorer mean and below the single-scorer maximum, showing that the model's judgment of sentence similarity is above the average human level and that the results are credible.
The specific description above further explains the objectives, technical solution and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (1)

1. A sentence similarity calculation method based on information content, characterized in that it comprises the following steps:
Step 1: input the two sentences $s_a$ and $s_b$ to be compared, written as

$$s_a = \{w_i^a \mid i = 1, 2, \ldots, n\}$$
$$s_b = \{w_i^b \mid i = 1, 2, \ldots, m\}$$

where $w_i^a$ and $w_i^b$ denote the $i$-th word of $s_a$ and $s_b$ respectively, and $n$ and $m$ are the word counts of $s_a$ and $s_b$;
Step 2: perform word sense selection for the words of the input sentences, as follows:
The sense $c_i^a$ of word $w_i^a$ is determined according to formula 1:

[formula 1]
$$c_i^a = \arg\max_{\substack{c \in \mathrm{subsum}(c_1, c_2) \\ c_1 \in \mathrm{concepts}(w_i^a) \\ c_2 \in \mathrm{concepts}(s_b)}} \{-\log P(c)\}$$

where $\mathrm{subsum}(c_1, c_2)$ is the set of all concepts in the semantic network that subsume both $c_1$ and $c_2$, $\mathrm{concepts}(w_i^a)$ is the set of concepts in the semantic network that contain the word $w_i^a$, $\mathrm{concepts}(s_b)$ is the set of concepts in the semantic network that contain any word of sentence $s_b$, and $P(c)$ is the frequency of concept $c$ in the corpus; in particular, if $P(c)$ is 0 then $\log P(c)$ is taken as 0; the value of $P(c)$ is determined according to formula 2:

[formula 2]
$$P(c) = \sum_{w \in \mathrm{words}(c)} \mathrm{count}(w) / N$$

where $\mathrm{words}(c)$ is the set of all words belonging to concept $c$ and to all sub-concepts of $c$ in the semantic network, $\mathrm{count}(w)$ is the frequency of word $w$ in the corpus, and $N$ is the sum of the frequencies of all concepts in the semantic network, the frequency of each concept being the sum of the corpus frequencies of all the words it contains;
Similarly, replacing $\mathrm{concepts}(w_i^a)$ in formula 1 by $\mathrm{concepts}(w_i^b)$ and $\mathrm{concepts}(s_b)$ by $\mathrm{concepts}(s_a)$ yields the sense $c_i^b$ of the $i$-th word of sentence $s_b$;
After sense determination, sentences $s_a$ and $s_b$ can be written as:

$$s_a = \{c_i^a \mid i = 1, 2, \ldots, n\}$$
$$s_b = \{c_i^b \mid i = 1, 2, \ldots, m\}$$

Step 3: with the senses determined in step 2, apply the inclusion-exclusion principle from combinatorics to compute the information content of $s_a$ and $s_b$ and their joint information content, as follows:
The information content $IC(s_a)$ of sentence $s_a$ is computed as shown in formula 3:

[formula 3]
$$IC(s_a) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$$

where $\mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$ denotes the common information content in the semantic information space built jointly from the hierarchical structure of the semantic network and the corpus statistics, computed according to formula 4:

[formula 4]
$$\mathrm{commonIC}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a) = \max_{c \in \mathrm{subsum}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)} [-\log P(c)]$$

where $\mathrm{subsum}(c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a)$ is the set of all concepts in the semantic network that subsume the concepts $c_{i_1}^a, c_{i_2}^a, \ldots, c_{i_k}^a$;
Similarly, replacing every letter $a$ in formulas 3 and 4 by $b$, and $n$ by $m$, yields the information content of sentence $s_b$;
Regarding the set of all distinct words of $s_a$ and $s_b$ as a new sentence, the joint information content $IC(s_a \cup s_b)$ is obtained from formula 5:

[formula 5]
$$IC(s_a \cup s_b) = \sum_{k=1}^{p} (-1)^{k-1} \sum_{1 \le i_1 < \cdots < i_k \le p} \mathrm{commonIC}(c_{i_1}, \ldots, c_{i_k})$$

where $p$ is the number of distinct words of $s_a$ and $s_b$ combined;
Step 4: define the common information content $\mathrm{COMMONIC}(s_a, s_b)$ of the two sentences through the relation between union and intersection, as shown in formula 6:

[formula 6]
$$\mathrm{COMMONIC}(s_a, s_b) = IC(s_a) + IC(s_b) - IC(s_a \cup s_b)$$

Step 5: following the Jaccard similarity principle, define the similarity $\mathrm{sim}(s_a, s_b)$ of sentences $s_a$ and $s_b$ as shown in formula 7:

[formula 7]
$$\mathrm{sim}(s_a, s_b) = \frac{\mathrm{COMMONIC}(s_a, s_b)}{IC(s_a \cup s_b)}$$

Step 6: output the similarity $\mathrm{sim}(s_a, s_b)$ of the two sentences.
CN201410268361.4A 2014-06-16 2014-06-16 Sentence similarity calculation method based on information amount Active CN104090918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410268361.4A CN104090918B (en) 2014-06-16 2014-06-16 Sentence similarity calculation method based on information amount

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410268361.4A CN104090918B (en) 2014-06-16 2014-06-16 Sentence similarity calculation method based on information amount

Publications (2)

Publication Number Publication Date
CN104090918A true CN104090918A (en) 2014-10-08
CN104090918B CN104090918B (en) 2017-02-22

Family

ID=51638634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410268361.4A Active CN104090918B (en) 2014-06-16 2014-06-16 Sentence similarity calculation method based on information amount

Country Status (1)

Country Link
CN (1) CN104090918B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354186A (en) * 2015-11-05 2016-02-24 同济大学 News event extraction method and system
CN106780061A (en) * 2016-11-30 2017-05-31 华南师范大学 Social network user analysis method and device based on comentropy
CN108628883A (en) * 2017-03-20 2018-10-09 北京搜狗科技发展有限公司 A kind of data processing method, device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030081811A1 (en) * 2000-10-02 2003-05-01 Hiroyuki Shimizu Apparatus and method for text segmentation based on coherent units
CN102945228A (en) * 2012-10-29 2013-02-27 广西工学院 Multi-document summarization method based on text segmentation
CN103136359A (en) * 2013-03-07 2013-06-05 宁波成电泰克电子信息技术发展有限公司 Generation method of single document summaries
CN103699529A (en) * 2013-12-31 2014-04-02 哈尔滨理工大学 Method and device for fusing machine translation systems by aid of word sense disambiguation
US20140101171A1 (en) * 2012-10-10 2014-04-10 Abbyy Infopoisk Llc Similar Document Search

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030081811A1 (en) * 2000-10-02 2003-05-01 Hiroyuki Shimizu Apparatus and method for text segmentation based on coherent units
US20140101171A1 (en) * 2012-10-10 2014-04-10 Abbyy Infopoisk Llc Similar Document Search
CN102945228A (en) * 2012-10-29 2013-02-27 广西工学院 Multi-document summarization method based on text segmentation
CN103136359A (en) * 2013-03-07 2013-06-05 宁波成电泰克电子信息技术发展有限公司 Generation method of single document summaries
CN103699529A (en) * 2013-12-31 2014-04-02 哈尔滨理工大学 Method and device for fusing machine translation systems by aid of word sense disambiguation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Li Haifang et al., "A query expansion method based on fuzzy synonyms", Computer Applications and Software *
Cheng Chuanpeng et al., "A sentence similarity calculation method based on HowNet", Computer Engineering and Science *
Jin Chunxia, "Applied research on multi-level structure sentence similarity calculation", Computer Applications and Software *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354186A (en) * 2015-11-05 2016-02-24 同济大学 News event extraction method and system
WO2017075912A1 (en) * 2015-11-05 2017-05-11 同济大学 News events extracting method and system
CN106780061A (en) * 2016-11-30 2017-05-31 华南师范大学 Social network user analysis method and device based on comentropy
CN108628883A (en) * 2017-03-20 2018-10-09 北京搜狗科技发展有限公司 A kind of data processing method, device and electronic equipment
CN108628883B (en) * 2017-03-20 2021-03-16 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN104090918B (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN105468605B (en) Entity information map generation method and device
CN103984681B (en) News event evolution analysis method based on time sequence distribution information and topic model
Li et al. Key word extraction for short text via word2vec, doc2vec, and textrank
CN106610955A (en) Dictionary-based multi-dimensional emotion analysis method
CN107122413A (en) A kind of keyword extracting method and device based on graph model
CN103473380B (en) A kind of computer version sensibility classification method
CN105183833A (en) User model based microblogging text recommendation method and recommendation apparatus thereof
CN104268197A (en) Industry comment data fine grain sentiment analysis method
CN104679738B (en) Internet hot words mining method and device
CN107992542A (en) A kind of similar article based on topic model recommends method
CN104915446A (en) Automatic extracting method and system of event evolving relationship based on news
CN106970910A (en) A kind of keyword extracting method and device based on graph model
CN103455562A (en) Text orientation analysis method and product review orientation discriminator on basis of same
CN102033919A (en) Method and system for extracting text key words
CN106407280A (en) Query target matching method and device
CN109670039A (en) Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN104899188A (en) Problem similarity calculation method based on subjects and focuses of problems
CN106874258A (en) A kind of text similarity computational methods and system based on Hanzi attribute vector representation
CN108073571A (en) A kind of multi-language text method for evaluating quality and system, intelligent text processing system
CN110781681A (en) Translation model-based elementary mathematic application problem automatic solving method and system
CN106503256B (en) A kind of hot information method for digging based on social networks document
CN104346408A (en) Method and equipment for labeling network user
CN109086355A (en) Hot spot association relationship analysis method and system based on theme of news word
CN104881399A (en) Event identification method and system based on probability soft logic PSL

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant