CN103336806B - Keyword ranking method based on the entropy difference between the intrinsic and extrinsic patterns of word recurrence intervals - Google Patents


Info

Publication number
CN103336806B
Authority
CN
China
Prior art keywords
word
text
spacing
occurs
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310253678.6A
Other languages
Chinese (zh)
Other versions
CN103336806A (en)
Inventor
杨震
司书勇
雷建军
范科峰
赖英旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201310253678.6A
Publication of CN103336806A
Application granted
Publication of CN103336806B
Active legal status
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention proposes a method, belonging to the field of text information processing, for ranking keywords according to the information-entropy difference between the intrinsic and extrinsic patterns of word recurrence intervals. The method holds that keyword occurrences are shaped by two patterns: (1) an intrinsic pattern, describing the statistics of a keyword's positions within a topic; and (2) an extrinsic pattern, describing the statistics of topic-cluster occurrences across the text. Experiments on real texts show that the larger the difference between the intrinsic- and extrinsic-pattern entropies of a word's recurrence intervals, the more likely that word is a keyword.

Description

Keyword ranking method based on the entropy difference between the intrinsic and extrinsic patterns of word recurrence intervals
Technical field
The present invention relates to a novel method for extracting and ranking keywords in text, and belongs to the field of text information processing.
Background technology
With the rapid development of the Internet, the amount of information on the network keeps growing, and the means of obtaining it become ever more convenient. At the same time, Internet users face the problem of information overload. To address it, we need to be able to find the parts of interest quickly within massive amounts of information, which requires the ability to extract keywords from text.
Traditional methods assume that if a word is a keyword, it must exhibit significant statistical features. H. P. Luhn proposed the original keyword-extraction method: after discarding common words and rare words, keywords are ranked by word frequency. Since then, frequency-based methods and their refinements have been widely discussed. However, frequency alone cannot separate words of similar frequency but very different importance. M. Ortuño, J. P. Herrera and P. Carpena proposed detecting keywords from the spatial distribution of word occurrences, but a purely distribution-based method likewise cannot separate words whose spatial distributions are similar yet whose importance differs greatly.
The present invention proposes a method for ranking keywords by the information-entropy difference between the intrinsic and extrinsic patterns of word recurrence intervals. The method holds that keyword occurrences are shaped by two patterns: (1) an intrinsic pattern, describing the statistics of a keyword's positions within a topic; and (2) an extrinsic pattern, describing the statistics of topic-cluster occurrences across the text. Experiments on real texts show that the larger the difference between the intrinsic- and extrinsic-pattern entropies of a word's recurrence intervals, the more likely that word is a keyword.
Summary of the invention
Step (1): Obtain the text
The text to be processed consists of some number of sentences.
Step (2): Text preprocessing
Step (2.1): Remove all punctuation marks and convert all letters to lowercase.
Step (2.2): For English text, tokenize simply on whitespace. Different inflected forms are treated as different words; for example, "organ" and "organs" count as two distinct words.
Step (2.3): For Chinese text, segment words with standard word-segmentation software.
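Steps (2.1)-(2.2) can be sketched in Python as follows (an illustrative sketch, not part of the claims; the regular expression and the sample sentence are our own choices):

```python
import re
from collections import Counter

def preprocess(text: str) -> list[str]:
    """Steps (2.1)-(2.2): strip punctuation, lowercase, split on whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return text.split()

tokens = preprocess("The organ differs; the organs differ too.")
freqs = Counter(tokens)
# "organ" and "organs" remain two distinct words, as the method requires.
```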
Step (3): Discover the intrinsic and extrinsic patterns of word recurrence intervals
Step (3.1): Mark word positions. Suppose the text length is N and a word A occurs m times in total (i.e., its word frequency is m), at positions t_1, t_2, t_3, ..., t_m.
Step (3.2): Compute recurrence intervals. The intervals of word A are d_i = t_{i+1} - t_i. Let μ denote the mean of the d_i, i.e., the average interval.
Step (3.3): Divide the intervals into the intrinsic and the extrinsic pattern. If d_i ≤ μ, then d_i is assigned to the intrinsic pattern: for a given occurrence of the word, if the interval d_i to its next occurrence is at most the average interval μ, then d_i belongs to the intrinsic pattern. Likewise, if d_i > μ, then d_i is assigned to the extrinsic pattern.
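Steps (3.1)-(3.3) amount to the following computation (an illustrative Python sketch using only the m-1 natural intervals; the sample positions are hypothetical):

```python
def recurrence_intervals(positions):
    """Step (3.2): d_i = t_{i+1} - t_i between successive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]

def split_intrinsic_extrinsic(d):
    """Step (3.3): intervals <= mean go to the intrinsic pattern d_A,
    intervals > mean go to the extrinsic pattern d_B."""
    mu = sum(d) / len(d)
    return [x for x in d if x <= mu], [x for x in d if x > mu], mu

d = recurrence_intervals([3, 4, 6, 16])   # hypothetical positions of one word
d_A, d_B, mu = split_intrinsic_extrinsic(d)
```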
Step (3.4): Compute the intrinsic- and extrinsic-pattern entropies of the recurrence intervals. Let d_A = {d_i | d_i ≤ μ} be the set of all intervals d_i ≤ μ. The entropy of the intrinsic pattern of a word's recurrence intervals is then defined as

H(d_A) = -\sum_{d \in d_A} P_d \log_2 P_d    (1)

Here d is a recurrence interval, d ∈ {1, 2, 3, ..., N}, and P_d is the probability that d occurs in d_A: if d occurs n_d times in d_A and d_A contains S_A elements, then P_d = n_d / S_A.
Let d_B = {d_i | d_i > μ} be the set of all intervals d_i > μ. The entropy of the extrinsic pattern of a word's recurrence intervals is defined as

H(d_B) = -\sum_{d \in d_B} P_d \log_2 P_d    (2)

Again d is an interval, d ∈ {1, 2, 3, ..., N}, and P_d is the probability that d occurs in d_B: if d occurs n_d times in d_B and d_B contains S_B elements, then P_d = n_d / S_B.
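Formulas (1) and (2) are ordinary Shannon entropies of the empirical interval distributions; a minimal sketch:

```python
from collections import Counter
from math import log2

def pattern_entropy(intervals):
    """Formulas (1)/(2): H = -sum_d P_d * log2(P_d) with P_d = n_d / S."""
    if not intervals:
        return 0.0
    counts, S = Counter(intervals), len(intervals)
    return -sum(n / S * log2(n / S) for n in counts.values())
```

For example, two equiprobable intervals give exactly one bit: pattern_entropy([50, 100]) is 1.0.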
Step (3.5): Compute the entropy difference between the intrinsic and extrinsic patterns

ED^q(d) = (H(d_A))^q - (H(d_B))^q    (3)

where q ∈ N+ is a positive integer; the experiments below show that q = 2 gives the best results.
Step (3.6): Normalize the entropy difference. The normalized entropy difference ED_nor is defined as

ED_{nor}^q(d) = \frac{ED^q(d)}{|ED_{geo}^q(d)|}    (4)

where

ED_{geo}^q(d) = \left( -\sum_{d \le N/m} \frac{p(1-p)^{d-1}}{p_A} \log_2 \frac{p(1-p)^{d-1}}{p_A} \right)^q - \left( -\sum_{d > N/m} \frac{p(1-p)^{d-1}}{p_B} \log_2 \frac{p(1-p)^{d-1}}{p_B} \right)^q    (5)

with p_A = \sum_{d \le N/m} p(1-p)^{d-1} and p_B = \sum_{d > N/m} p(1-p)^{d-1}. The interval d ranges over the positive integers, N/m is the expected interval, and p = m/N is the word's probability in the text, where m is the word's frequency and N is the total number of words. The term p(1-p)^{d-1} corresponds to d repeated Bernoulli trials, i.e., the interval law of a randomly placed word.
The purpose of normalization is to compare words of different p under the same standard, preventing differences in p alone from producing large differences in the entropy difference (i.e., eliminating the influence of the factor p on the results). In formula (5), p_A is the probability that a recurrence interval is no greater than the expected interval, and p_B, analogously, the probability that it is greater; p(1-p)^{d-1}/p_A is the conditional probability that the interval equals d given that it is no greater than the expected interval.
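The geometric baseline of formula (5) can be evaluated numerically as follows (a sketch; truncating the sums at d = N and the function name are our own choices):

```python
from math import log2

def ed_geo(N, m, q=2):
    """Formula (5): entropy difference expected for a randomly placed word,
    whose intervals follow the law p(1-p)^(d-1), truncated at d = N."""
    p, cut = m / N, N / m   # word probability and expected interval
    w = {d: p * (1 - p) ** (d - 1) for d in range(1, N + 1)}
    pA = sum(x for d, x in w.items() if d <= cut)
    pB = sum(x for d, x in w.items() if d > cut)
    h_in = -sum(x / pA * log2(x / pA) for d, x in w.items() if d <= cut)
    h_out = -sum(x / pB * log2(x / pB) for d, x in w.items() if d > cut)
    return h_in ** q - h_out ** q
```

For a randomly placed word the extrinsic tail is broader than the intrinsic head, so ED_geo is typically negative; formula (4) therefore divides by its absolute value.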
Step (4): Rank the words by the relative size of their entropy differences.
Description of the drawings
Fig. 1: schematic of dividing a word's recurrence intervals in a text into the intrinsic and the extrinsic pattern.
Fig. 2: schematic of the boundary conditions. a) Boundary condition C_{-1}: virtual occurrences are added at positions -1 and N. b) Boundary condition C_0: virtual occurrences are added at positions 0 and N+1. c) Boundary condition C_c: the text is joined end to end; each cell represents one word position.
Fig. 3: top-n precision of ED_{nor}^q under boundary conditions C_{-1}, C_0 and C_c for q = 1, 2, ..., 5.
Fig. 4: average precision (AP) of keyword detection with ED_{nor}^q for q = 1, 2, ..., 5 under boundary conditions C_{-1}, C_0 and C_c.
Detailed description of the invention
Step (1): Obtain the text
The text to be processed consists of some number of sentences.
The test corpus is Charles Darwin's "The Origin of Species"; the keyword index provided by W. S. Dallas serves as the evaluation ground truth.
Step (2): Text preprocessing
Step (2.1): Remove all punctuation marks and convert all letters to lowercase; remove the table of contents, the glossary and the index from the text.
Step (2.2): For English text, tokenize simply on whitespace. First remove stop words. Different inflected forms are treated as different words; for example, "organ" and "organs" count as two distinct words. Count the frequency m of each word and the total number of words N in the text, and compute each word's occurrence probability p = m/N.
Step (2.3): For Chinese text, segment words with standard word-segmentation software. Count the frequency m of each word and the total number of words N, and compute each word's occurrence probability p = m/N.
Step (3): Discover the intrinsic and extrinsic patterns of word recurrence intervals
Step (3.1): Mark word positions.
Suppose the text length is N and a word A occurs m times in total, at positions t_1, t_2, t_3, ..., t_m (as shown in Fig. 1).
Step (3.2): Compute recurrence intervals
The intervals of word A are d_i = t_{i+1} - t_i. Given the m occurrence positions t_1, t_2, t_3, ..., t_m, the intervals between adjacent occurrences are d_i = t_{i+1} - t_i, so the interval set is d_1, d_2, ..., d_{m-1}. We compare three boundary conditions C_{-1}, C_0 and C_c (as shown in Fig. 2). a) Boundary condition C_{-1}: assume virtual occurrences at positions -1 and N, i.e., an unrelated word appears at -1 and N; these two occurrences are not counted in the word frequency. The interval set is amended to d_0^{-1}, d_1, ..., d_{m-1}, d_m^{-1}, where d_0^{-1} = t_1 - (-1) and d_m^{-1} = N - t_m. b) Boundary condition C_0: assume virtual occurrences at positions 0 and N+1, again not counted in the word frequency. The interval set is amended to d_0^0, d_1, ..., d_{m-1}, d_m^0, where d_0^0 = t_1 and d_m^0 = (N+1) - t_m. c) Boundary condition C_c: the text is joined end to end, each cell representing one word position, i.e., the first occurrence of the word "follows" the last one (as shown in Fig. 2). The interval set is amended to d_1, d_2, ..., d_{m-1}, d_m^c, where d_m^c = N - t_m + t_1. In all cases d_1, ..., d_{m-1} are the intervals defined above, t_m is the position of the m-th occurrence, and N is the text length.
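The three boundary corrections can be written out explicitly (a Python sketch; the example positions are hypothetical, and the virtual occurrences contribute intervals but not word frequency):

```python
def boundary_interval_sets(positions, N):
    """Interval sets under boundary conditions C_{-1}, C_0 and C_c (Fig. 2)."""
    inner = [b - a for a, b in zip(positions, positions[1:])]
    t1, tm = positions[0], positions[-1]
    c_minus1 = [t1 - (-1)] + inner + [N - tm]      # virtual occurrences at -1 and N
    c_zero = [t1 - 0] + inner + [(N + 1) - tm]     # virtual occurrences at 0 and N+1
    c_cyclic = inner + [N - tm + t1]               # text joined end to end
    return c_minus1, c_zero, c_cyclic

c1, c0, cc = boundary_interval_sets([2, 5, 9], N=10)   # hypothetical example
```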
Step (3.3): Divide the recurrence intervals into the intrinsic and the extrinsic pattern
From each word's interval set, compute the average interval μ, and use it as the criterion for dividing the intervals into the two patterns. For example, the mean of d = {1, 2, 1, 2, 3, 4, 5, 50, 2, 1, 3, 1, 2, 3, 2, 3, 100, 2, 1, 3, 1, 4, 3, 2, 1, 2, 1, 1, 1} is μ = 7.1379. If d_i ≤ μ then d_i is assigned to the intrinsic pattern; if d_i > μ then d_i is assigned to the extrinsic pattern. Here 1 < μ, so 1 goes to the intrinsic pattern, while 50 and 100 exceed μ and go to the extrinsic pattern (as shown in Fig. 1). The intervals are thus split into two sets: the intrinsic-pattern set, denoted d_A, and the extrinsic-pattern set, denoted d_B.
In the example above, d_A = {1, 2, 1, 2, 3, 4, 5, 2, 1, 3, 1, 2, 3, 2, 3, 2, 1, 3, 1, 4, 3, 2, 1, 2, 1, 1, 1} and d_B = {50, 100}.
Step (3.4): Compute the intrinsic- and extrinsic-pattern entropies of the recurrence intervals
The intrinsic-pattern set d_A = {d_i | d_i ≤ μ} contains all intervals d_i ≤ μ. The entropy of the intrinsic pattern of a word's recurrence intervals is defined as

H(d_A) = -\sum_{d \in d_A} P_d \log_2 P_d    (6)

Here d is an interval, i.e., an element of d_A, and P_d is the probability that d occurs in d_A: if d occurs n_d times in d_A and d_A contains S_A elements, then P_d = n_d / S_A.
The intrinsic-pattern entropy is computed from formula (6). In the example above, P_1 = 10/27, P_2 = 8/27, P_3 = 6/27, P_4 = 2/27, P_5 = 1/27; substituting into formula (6) gives H(d_A) = 1.98.
The extrinsic-pattern set d_B = {d_i | d_i > μ} contains all intervals d_i > μ. The entropy of the extrinsic pattern of a word's recurrence intervals is defined as

H(d_B) = -\sum_{d \in d_B} P_d \log_2 P_d    (7)

Here d is again an interval, i.e., an element of d_B, and P_d is the probability that d occurs in d_B: if d occurs n_d times in d_B and d_B contains S_B elements, then P_d = n_d / S_B.
In the example above, P_50 = 1/2 and P_100 = 1/2; substituting into formula (7) gives H(d_B) = 1.
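The worked example of steps (3.3)-(3.4) can be checked mechanically (a verification sketch):

```python
from collections import Counter
from math import log2

d = [1, 2, 1, 2, 3, 4, 5, 50, 2, 1, 3, 1, 2, 3, 2, 3, 100,
     2, 1, 3, 1, 4, 3, 2, 1, 2, 1, 1, 1]
mu = sum(d) / len(d)                 # 207/29 = 7.1379...
d_A = [x for x in d if x <= mu]      # intrinsic pattern
d_B = [x for x in d if x > mu]       # extrinsic pattern

def H(intervals):
    counts, S = Counter(intervals), len(intervals)
    return -sum(n / S * log2(n / S) for n in counts.values())

# H(d_A) = 1.987 (given as 1.98 in the text); H(d_B) = 1.
```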
Step (3.5): Compute the entropy difference between the intrinsic and extrinsic patterns

ED^q(d) = (H(d_A))^q - (H(d_B))^q    (8)

where q ∈ N+, e.g., q = 1, 2, ..., 5. For the example of step (3.3), with q = 2, ED^q(d) = (1.98)^2 - (1)^2 = 2.9204.
Step (3.6): Normalize the entropy difference
The normalized entropy difference ED_nor is defined as

ED_{nor}^q(d) = \frac{ED^q(d)}{|ED_{geo}^q(d)|}    (9)

where

ED_{geo}^q(d) = \left( -\sum_{d \le N/m} \frac{p(1-p)^{d-1}}{p_A} \log_2 \frac{p(1-p)^{d-1}}{p_A} \right)^q - \left( -\sum_{d > N/m} \frac{p(1-p)^{d-1}}{p_B} \log_2 \frac{p(1-p)^{d-1}}{p_B} \right)^q    (10)

with p_A = \sum_{d \le N/m} p(1-p)^{d-1} and p_B = \sum_{d > N/m} p(1-p)^{d-1}.
Continuing the example of step (3.3), suppose the text length is N = 1000 and the word occurs m = 29 times. The expected interval is then μ = N/m = 34.5 and the occurrence probability is p = m/N = 0.029. Then

p_A = \sum_{d=1}^{34} 0.029 (1-0.029)^{d-1} = 0.6323, \quad p_B = 1 - p_A = 0.3677,

and

ED_{geo}^q(d) = \left( -\sum_{d=1}^{34} \frac{0.029(1-0.029)^{d-1}}{p_A} \log_2 \frac{0.029(1-0.029)^{d-1}}{p_A} \right)^q - \left( -\sum_{d=35}^{1000} \frac{0.029(1-0.029)^{d-1}}{p_B} \log_2 \frac{0.029(1-0.029)^{d-1}}{p_B} \right)^q = (5.0288)^q - (6.4575)^q.

For q = 2, ED_{geo}^2(d) = -16.4105, so ED_{nor}^2(d) = 2.9204 / |-16.4105| = 0.1780.
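The normalization constants of this example can be reproduced directly (a sketch; the truncation of the intrinsic sum at d = 34 follows the text):

```python
p = 0.029                                                # p = m/N = 29/1000
pA = sum(p * (1 - p) ** (d - 1) for d in range(1, 35))   # intervals d <= 34
pB = 1 - pA
# pA = 0.6323..., pB = 0.3677..., matching the values in the text.
```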
Step (4): Rank the words by entropy difference
For the words obtained in step (2), compute each word's entropy difference with formulas (6) to (10) above, then sort all words by entropy difference in descending order. Fig. 3 shows the top-n precision of the keyword-detection index ED_{nor}^q under boundary conditions C_{-1}, C_0 and C_c for q = 1, 2, ..., 5. If an algorithm ranks the words of a document and key(n) denotes the number of true keywords among its first n results, the top-n precision of the algorithm is defined as p(n) = key(n)/n. Average precision (AP) is defined as

AP = \frac{1}{R} \sum_{n=1}^{L} p(n) \, r(n),

where p(n) is the top-n precision, r(n) = 1 if the word at rank n is a keyword and r(n) = 0 otherwise, L is the number of all words, and R is the number of keywords. Fig. 4 shows the average precision (AP) of the keyword-detection index ED_{nor}^q for q = 1, 2, ..., 5 under boundary conditions C_{-1}, C_0 and C_c. The results show that with q = 2 the algorithm performs more stably and better than with other values of q.
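The evaluation metrics p(n) and AP can be computed as follows (sketch; `ranked` is a hypothetical ranked word list and `keywords` the ground-truth set):

```python
def precision_at_n(ranked, keywords, n):
    """p(n) = key(n)/n: fraction of keywords among the first n results."""
    return sum(w in keywords for w in ranked[:n]) / n

def average_precision(ranked, keywords):
    """AP = (1/R) * sum_n p(n) r(n), summed over ranks n holding a keyword."""
    R, hits, ap = len(keywords), 0, 0.0
    for n, w in enumerate(ranked, 1):
        if w in keywords:
            hits += 1
            ap += hits / n   # p(n) at a rank where r(n) = 1
    return ap / R

ap = average_precision(["a", "b", "c", "d"], {"a", "c"})   # (1 + 2/3) / 2
```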

Claims (1)

1. A keyword ranking method based on the entropy difference between the intrinsic and extrinsic patterns of word recurrence intervals, characterized by the following steps:
Step (1): Obtain the text
The text to be processed consists of some number of sentences;
Step (2): Text preprocessing
Step (2.1): Remove all punctuation marks and convert all letters to lowercase; remove the table of contents, the glossary and the index from the text;
Step (2.2): For English text, tokenize simply on whitespace; first remove stop words; treat different inflected forms as different words; count the frequency m of each word and the total number of words N in the text, and compute each word's occurrence probability p = m/N;
Step (2.3): For Chinese text, segment words with standard word-segmentation software; count the frequency m of each word and the total number of words N, and compute each word's occurrence probability p = m/N;
Step (3): Discover the intrinsic and extrinsic patterns of word recurrence intervals
Step (3.1): Mark word positions;
Suppose the text length is N, i.e., the total number of words from step (2), and a word A occurs m times in total, m being the word frequency from step (2), at positions t_1, t_2, t_3, ..., t_m;
Step (3.2): Compute recurrence intervals
The m occurrence positions of word A in the text are t_1, t_2, t_3, ..., t_m; the intervals between adjacent occurrences are d_i = t_{i+1} - t_i, giving the interval set d_1, d_2, ..., d_{m-1}; for boundary condition C_{-1}, assuming text borders at positions -1 and N, the interval set is amended to d_0^{-1}, d_1, ..., d_{m-1}, d_m^{-1}, where d_0^{-1} = t_1 - (-1) and d_m^{-1} = N - t_m; for boundary condition C_0, assuming text borders at positions 0 and N+1, the interval set is amended to d_0^0, d_1, ..., d_{m-1}, d_m^0, where d_0^0 = t_1 and d_m^0 = (N+1) - t_m; for boundary condition C_c, assuming the text is joined end to end, the interval set is amended to d_1, d_2, ..., d_{m-1}, d_m^c, where d_m^c = N - t_m + t_1 is the interval, with the text closed into a ring, between the last occurrence and the first occurrence of the word; here d_1, ..., d_{m-1} are the intervals defined above, t_m is the position of the m-th occurrence, and N is the text length;
Step (3.3): Divide the recurrence intervals into the intrinsic and the extrinsic pattern
From each word's interval set, compute the average interval μ and use it as the criterion for the division; if d_i ≤ μ then d_i is assigned to the intrinsic pattern, and if d_i > μ then d_i is assigned to the extrinsic pattern; by this criterion the word's intervals are split into two sets, the intrinsic-pattern set d_A and the extrinsic-pattern set d_B;
Step (3.4): Compute the intrinsic- and extrinsic-pattern entropies of the recurrence intervals
The intrinsic-pattern set d_A = {d_i | d_i ≤ μ} contains all intervals d_i ≤ μ; the entropy of the intrinsic pattern of a word's recurrence intervals is defined as

H(d_A) = -\sum_{d \in d_A} P_d \log_2 P_d    (6)

here d is an interval, d ∈ {1, 2, 3, ..., N}, and P_d is the probability that d occurs in d_A: if d occurs n_d times in d_A and d_A contains S_A elements, then P_d = n_d / S_A;
The intrinsic-pattern entropy is computed from formula (6);
The extrinsic-pattern set d_B = {d_i | d_i > μ} contains all intervals d_i > μ; the entropy of the extrinsic pattern of a word's recurrence intervals is defined as

H(d_B) = -\sum_{d \in d_B} P_d \log_2 P_d    (7)

here d is again an interval, d ∈ {1, 2, 3, ..., N}, and P_d is the probability that d occurs in d_B: if d occurs n_d times in d_B and d_B contains S_B elements, then P_d = n_d / S_B;
The extrinsic-pattern entropy is computed from formula (7);
Step (3.5): Compute the entropy difference between the intrinsic and extrinsic patterns

ED^2(d) = (H(d_A))^2 - (H(d_B))^2    (8)

Step (3.6): Normalize the entropy difference
The normalized entropy difference ED_nor is defined as

ED_{nor}^q(d) = \frac{ED^q(d)}{|ED_{geo}^q(d)|}    (9)

where

ED_{geo}^q(d) = \left( -\sum_{d \le N/m} \frac{p(1-p)^{d-1}}{p_A} \log_2 \frac{p(1-p)^{d-1}}{p_A} \right)^q - \left( -\sum_{d > N/m} \frac{p(1-p)^{d-1}}{p_B} \log_2 \frac{p(1-p)^{d-1}}{p_B} \right)^q    (10)

with p_A = \sum_{d \le N/m} p(1-p)^{d-1} and p_B = \sum_{d > N/m} p(1-p)^{d-1}; in formula (10), q = 2, and d is a word recurrence interval, i.e., an element of d_A or d_B; N/m is the expected interval, i.e., the average interval value μ above; p = m/N is the word's probability in the text, m being the word's frequency and N the total number of words; the term p(1-p)^{d-1} corresponds to d repeated Bernoulli trials;
Step (4): Rank the words by entropy difference
For the words obtained in step (2), compute each word's entropy difference with formulas (6) to (10) above, then sort all words by entropy difference in descending order.
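Taken together, the claimed steps can be sketched end to end (an illustrative sketch without the optional boundary correction; the handling of words with fewer than three occurrences is our own choice):

```python
import re
from collections import Counter
from math import log2

def entropy(intervals):
    """Formulas (6)/(7): Shannon entropy of the empirical interval distribution."""
    if not intervals:
        return 0.0
    counts, S = Counter(intervals), len(intervals)
    return -sum(n / S * log2(n / S) for n in counts.values())

def ed_nor(positions, N, q=2):
    """Normalized entropy difference, formulas (8)-(10), for one word."""
    m = len(positions)
    d = [b - a for a, b in zip(positions, positions[1:])]
    if len(d) < 2:
        return float("-inf")              # too few occurrences to score
    mu = sum(d) / len(d)
    ed = entropy([x for x in d if x <= mu]) ** q - entropy([x for x in d if x > mu]) ** q
    p, cut = m / N, N / m                 # geometric baseline, truncated at d = N
    w = {dd: p * (1 - p) ** (dd - 1) for dd in range(1, N + 1)}
    pA = sum(x for dd, x in w.items() if dd <= cut)
    pB = sum(x for dd, x in w.items() if dd > cut)
    h_in = -sum(x / pA * log2(x / pA) for dd, x in w.items() if dd <= cut)
    h_out = -sum(x / pB * log2(x / pB) for dd, x in w.items() if dd > cut)
    geo = h_in ** q - h_out ** q
    return ed / abs(geo) if geo else ed

def rank_keywords(text, q=2):
    """Steps (1)-(4): preprocess, score every word, sort by entropy difference."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    positions = {}
    for i, w in enumerate(tokens):
        positions.setdefault(w, []).append(i)
    scores = {w: ed_nor(ps, len(tokens), q) for w, ps in positions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```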
CN201310253678.6A 2013-06-24 2013-06-24 A kind of key word sort method that the inherent of spacing and external pattern entropy difference occur based on word Active CN103336806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310253678.6A CN103336806B (en) 2013-06-24 2013-06-24 A kind of key word sort method that the inherent of spacing and external pattern entropy difference occur based on word


Publications (2)

Publication Number Publication Date
CN103336806A (en) 2013-10-02
CN103336806B 2016-08-10

Family

ID=49244971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310253678.6A Active CN103336806B (en) 2013-06-24 2013-06-24 A kind of key word sort method that the inherent of spacing and external pattern entropy difference occur based on word

Country Status (1)

Country Link
CN (1) CN103336806B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744900A (en) * 2013-12-26 2014-04-23 合一网络技术(北京)有限公司 Visual discrimination difficulty combined text string weight calculation method and device
CN109033166B (en) * 2018-06-20 2022-01-07 国家计算机网络与信息安全管理中心 Character attribute extraction training data set construction method
CN110348497B (en) * 2019-06-28 2021-09-10 西安理工大学 Text representation method constructed based on WT-GloVe word vector

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101963972A (en) * 2010-07-01 2011-02-02 深港产学研基地产业发展中心 Method and system for extracting emotional keywords
CN102253996A (en) * 2011-07-08 2011-11-23 北京航空航天大学 Multi-visual angle stagewise image clustering method
CN102662936A (en) * 2012-04-09 2012-09-12 复旦大学 Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102456058B (en) * 2010-11-02 2014-03-19 阿里巴巴集团控股有限公司 Method and device for providing category information


Also Published As

Publication number Publication date
CN103336806A (en) 2013-10-02


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant