CN101667194A - Automatic abstracting method and system based on user comment text feature - Google Patents


Info

Publication number
CN101667194A
Authority
CN
China
Prior art keywords
comment
feature
sentence
user
user comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910093409A
Other languages
Chinese (zh)
Inventor
张铭
章彦星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN200910093409A
Publication of CN101667194A
Legal status: Pending

Abstract

The invention provides an automatic summarization method and system based on user comment text features. The method comprises the following steps: crawling and parsing user comment web pages and preprocessing the user comments; identifying the features that users comment on; classifying the user comment sentences according to the features they evaluate and filtering the features according to the classification result; and computing the score of each comment sentence and extracting several summary sentences to generate a summary. The invention can accurately identify the features that interest users from a large number of user comments, classify the comment sentences according to the features they evaluate, and then automatically generate a concise and comprehensive summary using a text summarization method based on sentence extraction, thereby helping users markedly improve the efficiency and quality of acquiring knowledge. When used in electronic commerce, the invention can shorten the time users spend choosing commodities, increase shopping efficiency and improve the shopping experience.

Description

Automatic summarization method and system based on user comment text features
Technical field
The present invention relates to an automatic summarization method, and a system implementing it, that summarizes text according to the text features of user comments. It belongs to the technical field of knowledge mining.
Background technology
Automatic summarization based on text features uses computer technology to automatically generate, for an electronic document, "a text that is shorter than the original and contains the important information in the original". With the continued development of the Internet and the explosive growth of information, text summarization is applied more and more widely. Depending on the object processed, text summarization can be divided into two classes: single-document summarization and multi-document summarization.
Single-document summarization automatically generates a summary for a single document. It mainly adopts methods based on sentence extraction: the score of each sentence is first computed from factors such as word frequency, sentence position, syntactic structure and document structure; the highest-scoring sentences are then chosen as summary sentences; and all summary sentences are arranged into a summary in the order in which they appear in the original. Single-document summarization can also adopt generation methods based on natural language understanding, which use linguistic knowledge to analyze the deep language structure of the text, use domain knowledge to perform semantic judgment and reasoning, obtain a semantic representation of the document, and then generate the summary from that representation. By comparison, sentence extraction is simpler and more widely applicable, whereas generation based on natural language understanding is very complex, depends on a domain knowledge base, and is strictly limited to particular domains. Mainstream single-document summarization therefore still relies on sentence extraction.
Multi-document summarization automatically generates a summary for several documents on the same topic, and must take into account redundancy and conflict between the contents of different documents. There are three main classes of methods: (1) use information extraction to extract the important information from each document, manually customize or semi-automatically generate a summary template, and fill the extracted information into the template to generate the summary; (2) first use single-document summarization to generate a summary for each document, then filter out the redundant and conflicting content and organize the remaining content into a summary; (3) first classify or cluster all the sentences of the documents, then choose from each set the sentences that express its theme and organize them into a summary. A typical tool that adopts the third class of methods is MEAD; see Radev D R, Jing H, Stys M, et al. Centroid-based summarization of multiple documents. Information Processing and Management, 2004, 40: 919-938. MEAD is a multi-document summarization system based on document clustering and document-set features. MEAD first clusters the sentences of the documents, uses a statistical method to choose the most frequent words and phrases in each sentence set to form a pseudo-sentence that serves as the "centroid" of the set, then computes the similarity between each other sentence in the set and the centroid as that sentence's score, and finally chooses the highest-scoring sentences in each set as summary sentences and organizes them into the document summary.
With the development of Web 2.0, the Internet has gradually become a platform on which people can freely exchange opinions, and a large amount of text containing rich subjective opinion, such as user comments, has begun to appear online. At present, the objects studied in text summarization are mainly scientific literature, news and similar texts that have a rigorous document structure, a relatively uniform style and state objective facts. User comments, by contrast, usually express subjective opinions about particular aspects of a thing, and are loosely and flexibly structured and varied in style. In view of these characteristics, the present invention adopts a feature-based classification method: a large number of comments are first analyzed to identify all the features that users comment on, and each comment sentence is then classified according to the feature it evaluates. The field of sentiment analysis has proposed several methods for identifying features from user comments, such as frequent itemset mining, methods based on probabilistic language models, methods based on pattern discovery and pattern matching, and unsupervised learning methods based on heuristic rules.
These subjective texts are enormous in number and relatively scattered, so obtaining the rich knowledge they contain often takes a great deal of time and effort. The present invention mainly adopts the third class of methods to generate summaries for user comments and proposes a feature identification and filtering algorithm; comparative experiments show that both the precision and the F1 value of feature identification are greatly improved.
Summary of the invention
To overcome the deficiencies of the prior art, the invention provides an automatic summarization method and system based on user comment text features. It can automatically generate a concise, comprehensive summary for a large number of user comments, helping people obtain knowledge from user comments faster and better. Both the precision and the F1 value of the feature identification of the invention are significantly improved. The technical solution adopted by the invention to solve its technical problem is as follows:
An automatic summarization method based on user comment text features, comprising the following steps:
Step 1, user comment preprocessing: crawl and parse the user comment web pages to obtain the user comments, then preprocess the user comments to obtain preprocessed user comments;
Step 2, feature identification: analyze the preprocessed user comments to identify the features evaluated by users, then use a statistical method to identify candidate features among the features evaluated by users;
Step 3, comment sentence classification: classify the preprocessed user comment sentences by candidate feature, thereby obtaining the comment sentence class corresponding to each candidate feature;
Step 4, feature filtering: filter the candidate features according to the comment sentence classes, thereby obtaining the final features and their corresponding candidate comment sentence classes;
Step 5, summary generation: compute the score of each sentence in the candidate comment sentence classes, and extract several summary sentences to generate the summary.
In step 1 above, crawling and parsing the user comment web pages means: for the specific thing chosen, crawl all user comment web pages about that thing to obtain the crawled user comments, then parse the crawled user comments to obtain the user comment text.
In step 1 above, preprocessing the user comments means: tag the part of speech of every word in the user comments, remove the stop words, and extract the stem of each remaining word, obtaining the preprocessed user comment text.
In step 2 above, a feature evaluated by users means a certain aspect, detail, attribute or component that the user has in mind when evaluating a thing.
In step 2 above, using a statistical method to identify candidate features means: extract all nouns from the user comment sentences corresponding to the features evaluated by users, and compute the frequency with which each single noun occurs and the frequency with which any two nouns co-occur; choose the single nouns with the highest frequency of occurrence and the noun pairs with the highest co-occurrence frequency as candidate features.
In step 4 above, filtering the candidate features means: according to the relative positions at which the nouns composing a feature occur in the comment sentence, and the generalization and specialization relations in meaning between features, filter out meaningless and redundant candidate features.
Step 5 above further comprises: use a statistical method to compute the keywords that express the theme of each comment sentence class; then compute the score of each comment sentence according to how well its content fits the theme, its length, and its position within the whole comment; then extract the highest-scoring original comment sentences in each user comment sentence class and organize them into the summary.
In the method above, using a statistical method to compute the keywords that express the theme of each comment sentence class means: on the basis of the comment sentence classification, use a statistical method to find the keywords of each class and construct a pseudo-sentence, the centroid, that expresses the theme of the comment sentence class; scores are then computed from the similarity between each comment sentence and the centroid. The degree to which a comment sentence's content fits the theme means the similarity between the comment sentence and the centroid.
An automatic summarization system based on user comment text features, comprising:
A user comment preprocessing module: it crawls and parses the user comments, then preprocesses them;
A feature identification module: it analyzes the preprocessed user comments to identify the features evaluated by users, then uses a statistical method to identify candidate features among the features evaluated by users;
A comment sentence classification module: it classifies the user comment sentences by candidate feature, thereby obtaining the comment sentence class corresponding to each candidate feature;
A feature filtering module: it further filters the candidate features according to the result of the comment sentence classification, thereby obtaining the candidate features of interest as the final features, together with their corresponding candidate comment sentence classes;
A summary generation module: it computes the scores of the sentences in the candidate comment sentence classes and extracts several summary sentences to generate the summary.
The user comment preprocessing module sends its preprocessing result to the feature identification module, which identifies the candidate features. The user comment text preprocessed by the user comment preprocessing module and the candidate features identified by the feature identification module are sent to the comment sentence classification module for classification, yielding the comment sentence classes. The candidate features are then filtered to obtain the final features and their corresponding candidate comment sentence classes. The summary generation module takes the candidate comment sentence classes and the final features as input, performs statistical analysis and generates the summary.
Beneficial effect of the present invention:
The present invention proposes an automatic summarization method based on user comment text. It applies text summarization for the first time to user comments, which contain rich subjective information, and proposes a feature-based classification method suited to the characteristics of user comments.
The method of the invention can generate concise, comprehensive summaries of user comments, greatly shortening the time users spend reading comments to obtain useful information and improving knowledge utilization. The feature-based approach suits the characteristics of user comments. The precision of the proposed feature identification and feature filtering algorithm can exceed 81% and its recall can reach 52%; both precision and F1 value are greatly improved over the comparison algorithms chosen. Against the background of explosive information growth in the Internet era, the user comment summarization method of the invention is significant: it can be widely applied in fields such as electronic commerce and can markedly improve the quality and efficiency of obtaining knowledge from massive amounts of information.
Description of drawings
Fig. 1 is the general flow chart according to the auto-abstracting method based on user comment text feature of the present invention;
Fig. 2 is the process flow diagram according to the comment sentence classification of the inventive method;
Fig. 3 is the process flow diagram according to the summary generation of the inventive method.
Embodiment
Below in conjunction with the drawings and specific embodiments the present invention is described in further detail:
Embodiment 1:
The specific embodiment of the invention is described in detail below, using the example of generating summaries for user comments in e-commerce.
E-commerce is an important Web application on the Internet. E-commerce websites often allow users to comment on commodities. These comments contain users' subjective experience of buying and using the commodities; they usually serve as a reference for other users when choosing merchants and commodities, and can also serve as a basis for merchants to improve their service. A popular commodity on a large website often has hundreds or thousands of user comments, which are very time-consuming to read. The invention can automatically generate a concise, comprehensive summary for a large number of user comments, greatly improving the efficiency of knowledge acquisition.
As shown in Fig. 1, the feature-based user comment summarization method mainly comprises the following steps:
Step 1, user comment preprocessing: crawl and parse the user comments, then preprocess them.
To generate a summary of the user comments on a commodity in e-commerce, all user comment web pages about that commodity must first be crawled from the e-commerce website. In the present embodiment, all user comment web pages about the commodity Apple iPod touch are crawled from www.amazon.com, and parsing the web pages yields 939 user comments.
Before summarization begins, a series of preprocessing steps must be applied to the user comments. The Stanford Part-of-Speech Tagger is used to tag the parts of speech in the user comments; it is a part-of-speech tagger that uses a maximum entropy model, and its accuracy can reach 96.86%. In addition, the stop words in the user comments are deleted, and the Porter Stemmer is used to extract the stems of the remaining words. The processed comment sentences are represented and stored using the vector space model.
Step 2, feature identification: analyze a large number of user comments to identify the features evaluated by users, then use a statistical method to identify candidate features among them.
As stated above, a feature of a thing is a certain aspect, detail, attribute or component that the user has in mind when evaluating it. In e-commerce, what the user has in mind is usually an attribute or component of the commodity itself, or some aspect or detail of the shopping process; these are collectively called features. Such features are usually a noun or a phrase composed of two nouns. Because different users tend to use the same words to denote a feature, but different words to express their own shopping and usage experience, the words that denote features occur more frequently than other words. On this basis, the invention adopts a statistical method based on frequent itemset mining for feature identification, which can adaptively identify the features of all kinds of commodities.
The frequent itemset mining problem is stated as follows. $D = \langle S_1, S_2, \ldots, S_N \rangle$ is a collection of $N$ itemsets, where $S_i = \{t_1, t_2, \ldots, t_{n_i}\}$, $i = 1, 2, \ldots, N$, is an itemset of $n_i$ items and $t_j$, $j = 1, 2, \ldots, n_i$, is an item. Given a minimum support parameter minsupport, frequent itemset mining finds all itemsets $S$ that satisfy the following condition: $D$ contains at least $N \cdot \mathit{minsupport}$ itemsets $S_k$ such that
$$S \subseteq S_k$$
The Apriori algorithm is one of the classic algorithms for frequent itemset mining. It uses a breadth-first search strategy and exploits the Apriori property, namely that an m-itemset meeting the minimum support requirement must be a subset of the union of all n-itemsets (m > n) that meet the requirement, which effectively narrows the search space.
Unlike the Apriori algorithm, the feature identification algorithm described here identifies only single-word and two-word features, that is, 1-itemsets and 2-itemsets, and assigns the two kinds different minimum supports, minsupport1 and minsupport2. The reason is that the frequency with which the two words of a two-word feature co-occur is far lower than the frequency with which a single-word feature occurs. If both used the same minimum support, a value that is too large would prevent two-word features from being identified effectively, and a value that is too small would cause a large number of wrong single-word features to be identified. The steps of the algorithm are as follows:
1) Extract all nouns in the user comments to generate a transaction file, in which each line holds the nouns occurring in one comment sentence;
2) Traverse the transaction file and count the support of each noun; the total number of lines in the transaction file, that is, the total number of comment sentences, is N;
3) Choose the nouns whose support is not less than minsupport1 as single-word features;
4) Take all nouns whose support is not less than minsupport2 as the candidate set for two-word features;
5) Traverse the transaction file, count the support of the phrase formed by any two nouns in the candidate set, and choose the phrases whose support is not less than minsupport2 as two-word features.
As step 4) shows, the candidate set for two-word features consists of the nouns whose support is not less than minsupport2 rather than all nouns; this uses the Apriori property to narrow the search space. The two parameters minsupport1 and minsupport2 were obtained through a series of experiments: the single-word feature support minsupport1 is 0.012, and the two-word feature support minsupport2 is 0.005.
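The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `identify_candidate_features` and the in-memory list standing in for the transaction file are hypothetical, noun pairs are treated as unordered (word order is considered later, in the filtering step), and support is taken as the fraction of comment sentences containing the noun or pair.

```python
from collections import Counter
from itertools import combinations

def identify_candidate_features(transactions, minsupport1=0.012, minsupport2=0.005):
    """transactions: one list of nouns per comment sentence (the 'transaction file')."""
    n = len(transactions)
    noun_count = Counter()
    for nouns in transactions:
        noun_count.update(set(nouns))  # count each noun once per sentence
    # Step 3: single-word features meet the higher threshold.
    single = {w for w, c in noun_count.items() if c / n >= minsupport1}
    # Step 4: Apriori pruning, only nouns meeting minsupport2 can form a frequent pair.
    pool = {w for w, c in noun_count.items() if c / n >= minsupport2}
    # Step 5: count co-occurrence of every pair of pooled nouns in a sentence.
    pair_count = Counter()
    for nouns in transactions:
        present = sorted(set(nouns) & pool)
        pair_count.update(combinations(present, 2))
    double = {p for p, c in pair_count.items() if c / n >= minsupport2}
    return single, double

# Toy data with deliberately large thresholds so the effect is visible.
toy = [["battery", "life"], ["battery", "life"], ["screen"], ["battery"]]
single, double = identify_candidate_features(toy, minsupport1=0.5, minsupport2=0.5)
```

On the toy data, "battery" (3 of 4 sentences) and "life" (2 of 4) pass the single-word threshold, "screen" does not, and the pair ("battery", "life") co-occurs in 2 of 4 sentences and is kept.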
Step 3, comment sentence classification: classify the user comment sentences by candidate feature, thereby obtaining the comment sentence class corresponding to each candidate feature.
After all the evaluated features of the commodity have been identified, the feature evaluated by each comment sentence is analyzed in turn, and the comment sentence is assigned to the comment sentence class corresponding to that feature. This yields a series of comment sentence classes, each of which corresponds to one feature and contains all comment sentences that evaluate that feature.
Step 4, feature filtering: filter the candidate features according to the result of the comment sentence classification, thereby obtaining the candidate features of interest and their corresponding candidate comment sentences.
After the comment sentence classification is complete, meaningless two-word features are filtered out according to the classification result, taking into account the positions at which the two words of a two-word feature occur in the comment sentences and the number of times they occur. Redundant single-word features are then filtered out according to the conceptual inclusion relation between candidate single-word features and two-word features.
For two-word feature filtering, it is observed that the two words of a two-word feature usually occur close together in a comment sentence and keep a consistent relative order. The notion of an effective two-word feature is defined for this purpose.
Definition 1. An effective two-word feature $f = \langle w_1, w_2 \rangle$ should meet the following conditions:
(1) $w_1$ and $w_2$ co-occur in a comment sentence $s$, keeping the relative order of $w_1$ before $w_2$, and the distance between the positions at which they occur is less than a given threshold windowsize;
(2) The support of the two-word feature is updated to the number of comment sentences that satisfy condition (1), and this support must not be less than a given threshold minsupp.
If the support of a two-word feature $f = \langle w_1, w_2 \rangle$ is less than the given threshold, the two-word feature is meaningless.
For single-word feature filtering, the notion of the pure support of a single-word feature is defined.
Definition 2. Given all two-word features $f_1, f_2, \ldots, f_m$, the pure support of a single-word feature $w$ is the total number of comment sentences in which $w$ occurs and none of $f_1, f_2, \ldots, f_m$ occurs.
An effective single-word feature is a single-word feature whose pure support is not less than a given threshold minpsupp; a single-word feature whose pure support is less than minpsupp is redundant.
For example, suppose battery life and life are both features identified by Algorithm 1, the support of battery life is 20 and the support of life is 30. The pure support of life is then 30 - 20 = 10. Given minpsupp = 20, the pure support 10 is below the threshold, so life is a redundant single-word feature.
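The battery life example can be reproduced with a short Python sketch. The function name `pure_support` is hypothetical, and the sketch makes a simplifying assumption: a two-word feature is taken to occur in a sentence whenever both of its words are present, ignoring the word order and window conditions of Definition 1.

```python
def pure_support(word, double_features, sentences):
    """Count sentences that contain `word` but none of the two-word features."""
    count = 0
    for nouns in sentences:
        present = set(nouns)
        if word not in present:
            continue
        # Simplification: a two-word feature "occurs" if both words are present.
        if any(a in present and b in present for a, b in double_features):
            continue
        count += 1
    return count

# 30 sentences mention "life"; 20 of them also mention "battery".
sentences = [["battery", "life"]] * 20 + [["life"]] * 10
psupp = pure_support("life", [("battery", "life")], sentences)
is_redundant = psupp < 20  # minpsupp = 20, as in the example above
```

Here `psupp` is 10, which is below the threshold of 20, so "life" would be dropped as a redundant single-word feature while "battery life" is kept.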
The comment sentence classification and feature filtering algorithm (Algorithm 2) is described as follows:
Input: the preprocessed user comments, and the candidate features identified by Algorithm 1
Output: the filtered features, and the comment sentence class corresponding to each feature
Procedure: Classifier(windowsize, minsupp, minpsupp)
1  [line given only as image A20091009340900131 in the source; not recoverable]
2  while a comment sentence s_i is read in
3    for each word w_j in s_i
4      if w_j is a single-word feature identified by Algorithm 1 then
5        off_j = position at which w_j occurs in s_i
6        nouns = nouns ∪ (w_j, off_j)
7        assign s_i to the comment sentence class c_j of single-word feature w_j
8    for each pair of nouns (w_j, off_j), (w_k, off_k) in nouns
9      if <w_j, w_k> is a two-word feature && off_k - off_j < windowsize then
10       assign s_i to the comment sentence class c_jk of two-word feature <w_j, w_k>
11     else if <w_k, w_j> is a two-word feature && off_j - off_k < windowsize then
12       assign s_i to the comment sentence class c_kj of two-word feature <w_k, w_j>
13 for each two-word feature <w_j, w_k>
14   update the support supp_jk of <w_j, w_k> according to Definition 1
15   if supp_jk < minsupp then
16     delete two-word feature <w_j, w_k>
17 for each noun w_j that occurs in a two-word feature
18   compute the pure support psupp_j of w_j according to Definition 2
19   if psupp_j < minpsupp then
20     delete single-word feature w_j
Lines 1-12 of Algorithm 2 perform the comment sentence classification, as shown in Fig. 2. Given a comment sentence, the algorithm first judges whether each noun occurring in it is a single-word feature, then judges whether each pair of nouns that are single-word features forms a two-word feature, and then assigns the comment sentence to the comment sentence class of the corresponding single-word or two-word feature. The classification process is as follows:
(1) Read in a comment sentence s and record the nouns $w_1, w_2, \ldots, w_t$ occurring in it. Judge whether $w_i$ ($i = 1, \ldots, t$) is a single-word feature. If not, continue with the next noun $w_{i+1}$ in s, until all nouns occurring in s have been processed. (2) If $w_i$ is a single-word feature, assign s to the class $c_i$ corresponding to $w_i$ and add $w_i$ to nouns. For each pair of nouns $\langle w_j, w_k \rangle$ in nouns, judge whether $\langle w_j, w_k \rangle$ is a two-word feature. If so, assign s to the class $c_{jk}$ corresponding to $\langle w_j, w_k \rangle$; otherwise, return to (1) and continue with the next noun in s.
Lines 13-16 of Algorithm 2 filter the two-word features according to Definition 1, and lines 17-20 filter the single-word features according to Definition 2. The three parameters windowsize, minsupp and minpsupp are, respectively, the maximum distance between the positions at which the two nouns of a two-word feature occur in a comment sentence, the minimum support of a two-word feature, and the minimum pure support of a single-word feature. Through a series of experiments, windowsize is set to 2, and minsupp and minpsupp take the same values as minsupport2 and minsupport1 respectively, that is, 0.005 and 0.012.
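The classification and two-word filtering parts of Algorithm 2 can be sketched in Python as below. This is an illustrative reading of the pseudocode, not the authors' code: the function names are hypothetical, token offsets stand in for word positions, two-word features are given as ordered pairs, and the support of a two-word feature is taken as the fraction of sentences assigned to its class.

```python
from collections import defaultdict

def classify_sentences(sentences, single_features, double_features, windowsize=2):
    """sentences: list of token lists. Returns {feature: [sentence indices]}."""
    classes = defaultdict(list)
    for idx, tokens in enumerate(sentences):
        # Lines 3-7: record single-word features with their offsets.
        nouns = [(w, off) for off, w in enumerate(tokens) if w in single_features]
        for w in {w for w, _ in nouns}:
            classes[w].append(idx)
        # Lines 8-12: ordered pair, first word before second, within the window.
        matched = set()
        for wj, offj in nouns:
            for wk, offk in nouns:
                if (wj, wk) in double_features and 0 < offk - offj < windowsize:
                    matched.add((wj, wk))
        for pair in matched:
            classes[pair].append(idx)
    return classes

def filter_double_features(classes, double_features, n_sentences, minsupp=0.005):
    """Lines 13-16: keep two-word features whose support reaches minsupp."""
    return {f for f in double_features
            if len(classes.get(f, [])) / n_sentences >= minsupp}

sents = [["battery", "life", "great"],
         ["life", "short", "battery"],
         ["screen", "bright"]]
classes = classify_sentences(sents, {"battery", "life", "screen"},
                             {("battery", "life")})
kept = filter_double_features(classes, {("battery", "life")}, len(sents), minsupp=0.3)
```

Only the first sentence is assigned to the class of ("battery", "life"): in the second sentence the two words appear in reverse order and two positions apart, so it fails condition (1) of Definition 1 even though both words are present.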
Step 5, summary generation: compute the scores of the candidate comment sentences and extract several summary sentences to generate the summary.
On the basis of the comment sentence classification, the invention uses sentence extraction to generate the summary. Fig. 3 is the flow chart of summary generation. As shown in Fig. 3, for each comment sentence class, the weights of the words composing the comment sentences are computed first, and the highest-weighted keywords are extracted to form the centroid vector that expresses the theme of the class. The score of each comment sentence is then computed from its similarity to the centroid, its length and its position within the whole comment, and the highest-scoring comment sentences are extracted, according to the compression ratio, as the summary sentences of the class. Finally, the summary sentences of each comment sentence class are arranged in a certain order to generate the summary.
Let $d = \langle s_1, s_2, \ldots, s_N \rangle$ be the comment sentence class for a certain feature of a product, where $N$ is the number of comment sentences in $d$.
$\vec{s_i} = \langle v_{w_{i1}}, v_{w_{i2}}, \ldots, v_{w_{in}} \rangle$, $i = 1, 2, \ldots, N$, is the vector model representation of comment sentence $s_i$, where $n$ is the total number of words occurring in the whole comment sentence class. In $w_{ij}$, $i$ is the identifier of the comment sentence and $j$ is the global identifier of the word.
$v_{w_{ij}}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, n$, is the weight of word $w_j$. In particular, when $w_j$ does not occur in $s_i$, $v_{w_{ij}} = 0$.
The centroid of comment sentence class $d$ is a pseudo-sentence that reflects the theme of the class. It is likewise represented with the vector model,
$$\vec{c} = \langle v_{w_1}, v_{w_2}, \ldots, v_{w_n} \rangle$$
where $v_{w_k}$ is the weight of keyword $w_k$, computed as
$$v_{w_k} = \frac{v^{*}_{w_k}}{\sqrt{\sum_{j=1}^{n} \left(v^{*}_{w_j}\right)^{2}}}, \quad j = 1, 2, \ldots, n, \qquad v^{*}_{w_k} = tf_{w_k} \cdot idf_{w_k}, \qquad tf_{w_k} = \sum_{i=1}^{N} tf_{w_k, s_i}$$
[Equation image A20091009340900159, which follows the definition of $tf_{w_k}$, is not reproduced in the source text.]
For each comment sentence, the following three scores are computed:
(1) The centroid-based score is:
$$\mathrm{score}_c(s_i) = \sum_{k=1}^{n} \left( v_{w_{ik}} \cdot v_{w_k} \right), \qquad 0 \le \mathrm{score}_c(s_i) \le 1$$
That is, the cosine similarity between the vector representing the comment sentence and the centroid vector. Because the centroid is a pseudo-sentence expressing the theme of the document set, a comment sentence that is more similar to the centroid better reflects that theme, and its score is therefore higher.
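A minimal Python sketch of the centroid-based score follows. Several points are assumptions made for illustration: the function names are hypothetical, sentence vectors use raw term counts normalized to unit length, the idf weights are passed in by the caller and default to 1.0 because their definition is not legible in the source, and `top_k` is an assumed parameter standing in for the choice of the highest-weighted keywords.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector {word: weight} to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}

def centroid(sentences, idf=None, top_k=None):
    """Build the pseudo-sentence of a class from summed tf times idf."""
    idf = idf or {}
    tf = Counter(w for tokens in sentences for w in tokens)
    weights = {w: c * idf.get(w, 1.0) for w, c in tf.items()}
    if top_k is not None:  # keep only the highest-weighted keywords
        weights = dict(sorted(weights.items(), key=lambda kv: -kv[1])[:top_k])
    return normalize(weights)

def centroid_score(tokens, cen):
    """Cosine similarity between a sentence and the centroid."""
    vec = normalize(Counter(tokens))
    return sum(v * cen.get(w, 0.0) for w, v in vec.items())

cls = [["battery", "life", "long"],
       ["battery", "life", "short"],
       ["battery", "dies"]]
cen = centroid(cls)
scores = [centroid_score(s, cen) for s in cls]
```

Because both vectors have unit length and non-negative weights, each score lies between 0 and 1, in line with the bound stated with the formula. In the toy class, the first two sentences share more high-weight words with the centroid than the third and so score higher.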
(2) The score based on comment sentence length is:
[Equation image A200910093409001511 is not reproduced in the source text.]
The shorter the sentence, the higher its score. This lets a summary of the same length contain more sentences and therefore richer information.
(3) The score based on the first sentence of a paragraph is:
[Equation image A20091009340900161 is not reproduced in the source text.]
According to Baxendale's research, the position of a sentence in a document greatly influences its importance, and the probability that the first sentence of a paragraph is the central sentence of that paragraph is 85%. The first sentence of a paragraph therefore receives a score of 1.
For a comment sentence $s_i$, its initial score is the linear combination of the centroid-based score, the length-based score and the first-sentence score, that is,
$$\mathrm{score}_0(s_i) = \alpha \cdot \mathrm{score}_c(s_i) + \beta \cdot \mathrm{score}_l(s_i) + \gamma \cdot \mathrm{score}_f(s_i)$$
where $\alpha$ is the weight of the centroid-based score, $\beta$ is the weight of the score based on comment sentence length, and $\gamma$ is the weight of the score based on the first sentence of a paragraph, with $0 < \alpha, \beta, \gamma < 1$ and $\alpha + \beta + \gamma = 1$. Taking into account the quality of the generated summaries in a series of experiments and the demands of practical application, the values $\alpha = 0.5$, $\beta = 0.3$, $\gamma = 0.2$ are chosen.
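The linear combination can be checked with a few lines of Python. The function name `initial_score` is hypothetical, and the three component scores are taken as given inputs, since the length-based and first-sentence formulas are not legible in the source.

```python
def initial_score(score_c, score_l, score_f, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of the centroid, length and first-sentence scores."""
    # The weights must sum to 1, as required in the text.
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * score_c + beta * score_l + gamma * score_f

# A paragraph-initial sentence with centroid score 0.8 and length score 0.5.
example = initial_score(0.8, 0.5, 1.0)
```

With the chosen weights the example evaluates to 0.5 × 0.8 + 0.3 × 0.5 + 0.2 × 1.0 = 0.75.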
After the initial scores of the comment sentences are obtained, the highest-scoring sentence is extracted from each class in turn and added to the summary. If the summary length has not reached the limit set by the compression ratio, then after each iteration the scores of the remaining comment sentences in each class are recalculated, and the highest-scoring sentence is again extracted and added to the summary; iteration ends when the summary length reaches the limit. At the (k+1)-th iteration, the score of a comment sentence $s_i$ is calculated as:
$score_{k+1}(s_i) = score_k(s_i) - \frac{1}{N} score_k(s_k^*)$
where $s_k^*$ is the highest-scoring comment sentence chosen in the k-th iteration. The scores are recalculated after each iteration in order to give higher scores to sentences whose content is dissimilar to the sentences already chosen, thereby reducing the redundancy of the generated summary.
When the summary is finally generated, the relative order of the summary sentences chosen from the different comment sentence classes must be considered. Here the features are first sorted by support in descending order, and then one summary sentence is taken in turn from the comment sentence class of each feature and added to the summary.
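The extraction loop described above can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the function name is hypothetical, the length limit is expressed as a word budget `max_words` standing in for the compression ratio, and N is assumed to be the number of sentences in the class. With the update formula as it survives in the text, all remaining scores in a class shift by the same amount; the original figure may contain further terms that cannot be recovered here.

```python
def extract_summary(classes, support, max_words):
    """Round-robin sentence extraction across feature classes.

    classes:   {feature: [(sentence, initial_score), ...]}
    support:   {feature: support count}, used to order the features
    max_words: ASSUMED stand-in for the compression-ratio length limit
    """
    # Visit features in descending order of support.
    order = sorted(classes, key=lambda f: support[f], reverse=True)
    remaining = {f: list(classes[f]) for f in order}
    sizes = {f: len(classes[f]) for f in order}  # ASSUMED meaning of N
    summary, used = [], 0
    while any(remaining.values()):
        for f in order:
            if not remaining[f]:
                continue
            best = max(remaining[f], key=lambda pair: pair[1])
            words = len(best[0].split())
            if used + words > max_words:
                return summary  # length limit reached
            summary.append(best[0])
            used += words
            remaining[f].remove(best)
            # score_{k+1}(s_i) = score_k(s_i) - (1/N) * score_k(s_k*)
            penalty = best[1] / sizes[f]
            remaining[f] = [(s, sc - penalty) for s, sc in remaining[f]]
    return summary
```

Taking one sentence per feature before returning to the first feature keeps the summary balanced across features, which is the ordering the text describes.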
Performance evaluation
The feature-based automatic abstracting method for user comments first analyzes the user comments to identify the evaluated features, then classifies all comment sentences according to the evaluated features, and finally uses sentence extraction to extract summary sentences from each comment sentence class and generate the summary. The quality of feature identification is therefore crucial to the quality of the generated summary.
Three main metrics are used to evaluate the quality of feature identification:
Recall ratio (Recall)
Precision ratio (Precision)
Figure A20091009340900172
F1 value (F1-measure)
Figure A20091009340900173
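The three metrics can be computed as in the following Python sketch, which assumes the standard definitions: precision is the share of identified features that are correct, recall is the share of manually annotated features that were identified, and F1 is their harmonic mean. The function name and the feature names in the usage note are made up for illustration and are not taken from the experiments.

```python
def feature_metrics(identified, annotated):
    """Return (precision, recall, f1) for identified vs. annotated features."""
    identified, annotated = set(identified), set(annotated)
    correct = len(identified & annotated)  # correctly identified features
    precision = correct / len(identified) if identified else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, if an algorithm identifies `{"battery", "screen", "price", "thing"}` and the annotator marked `{"battery", "screen", "price", "camera", "size", "weight"}`, three features are correct, giving a precision of 0.75, a recall of 0.5 and an F1 value of 0.6.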
In user comment summarization, some features are evaluated by only a few users, and when the summary length is limited, the features that users commonly care about should be given priority. The precision of feature identification is therefore more important than its recall.
The baseline algorithm chosen for the experiments is the Apriori algorithm that Hu & Liu adopted in their research on the sentiment analysis system FBS (Hu Minqing, Liu Bing. Mining and Summarizing Customer Reviews. SIGKDD, 2004, 168-177). The experimental data are English user comments on 5 products collected from the e-commerce websites amazon, cnet and epinions, comprising 2 mobile phones, 1 notebook computer, 1 MP3 player and 1 digital camera; each product has several hundred user comments.
First, an annotator reads all user comments and manually marks the features in them; the 2nd column of Table 1 gives the number of manually annotated features for each product. The features identified by each algorithm are then compared with the manually annotated features; the 3rd and 7th columns give the number of features identified by each algorithm. The number of correctly identified features is counted, and the precision, recall and F1 value are calculated. The experimental results show that the feature identification and filtering algorithm adopted by the present invention achieves a recall of 51.9%, a precision of 81.0% and an F1 value of 62.7%; compared with the baseline algorithm, precision is improved by 24% and the F1 value by 6%.
Table 1. Quality evaluation of feature identification
Figure A20091009340900181
Provided that feature identification is accurate, for a given compression ratio (1% in the experiments), the feature-based automatic abstracting method for user comments can generate a summary that covers all identified features (recall of 51.9%) and greatly shortens reading time (1%), thereby significantly improving the efficiency with which users obtain useful information from massive user comments. This has great practical significance and application prospects in an Internet era of explosive information growth.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; the method of the invention is equally applicable to extended fields such as the sale of electronic products, e-books and mobile phones, and to improving user relevance. In addition, any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.

Claims (10)

1. An automatic abstracting method based on user comment text features, comprising the following steps:
Step 1, user comment preprocessing: crawling and parsing user comment web pages to obtain user comments, then preprocessing said user comments to obtain preprocessed user comments;
Step 2, feature identification: analyzing said preprocessed user comments to identify the features evaluated by users, then using a statistical method to identify candidate features from said features evaluated by users;
Step 3, comment sentence classification: classifying the sentences of said preprocessed user comments by said candidate features, thereby obtaining the comment sentence class corresponding to each candidate feature;
Step 4, feature filtering: filtering said candidate features according to said comment sentence classes, thereby obtaining the final features and their corresponding candidate comment sentence classes;
Step 5, summary generation: calculating the score of each sentence in said candidate comment sentence classes, and extracting several summary sentences to generate the summary.
2. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: in step 1, said crawling and parsing user comment web pages means crawling all user comment web pages about a specific chosen thing to obtain the crawled user comments, then parsing said crawled user comments to obtain the user comment text.
3. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: in step 1, said preprocessing of said user comments means tagging the part of speech of every word in said user comments, removing the stop words, and stemming the remaining words to obtain said preprocessed user comments.
4. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: said features evaluated by users in step 2 refer to a certain aspect, detail, attribute or component that a user focuses on when evaluating a thing.
5. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: said using a statistical method to identify candidate features in step 2 means extracting all nouns in the user comment sentences corresponding to said features evaluated by users, calculating the frequency with which each single noun occurs and the frequency with which any two nouns co-occur, and choosing the single nouns with the highest occurrence frequency and the nouns with the highest co-occurrence frequency as candidate features.
6. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: said filtering of candidate features in step 4 means filtering out meaningless and redundant candidate features according to the relative positions at which the nouns composing a feature occur in the comment sentences, and according to the generalization and specialization relationships in meaning between features.
7. The automatic abstracting method based on user comment text features according to claim 1, characterized in that: said calculating of the score of said candidate comment sentences in step 5 means calculating the score according to the length, position and content of said candidate comment sentences.
8. The automatic abstracting method based on user comment text features according to claim 1 or 7, characterized in that step 5 further comprises: using a statistical method to calculate the keywords that express the theme of each comment sentence class; then calculating the score of each comment sentence according to the degree of fit between the comment sentence content and the theme, the length of the comment sentence, and the position at which the comment sentence occurs in the whole comment; and then extracting the highest-scoring original comment sentences from the user comment sentence classes and organizing them to generate the summary.
9. The automatic abstracting method based on user comment text features according to claim 8, characterized in that: said using a statistical method to calculate the keywords that express the theme of each comment sentence class means, on the basis of the classification of comment sentences, using a statistical method to find the keywords of each class, constructing the centroid as a pseudo-sentence representing the theme of that comment sentence class, and calculating based on the similarity between the comment sentences and the centroid; said degree of fit between the comment sentence content and the theme refers to the similarity between the comment sentence and the centroid.
10. An automatic abstracting system based on user comment text features, comprising:
a user comment preprocessing module, which crawls and parses user comment web pages to obtain user comments, then preprocesses said user comments to obtain preprocessed user comments;
a feature identification module, which analyzes said preprocessed user comments to identify the features evaluated by users, then uses a statistical method to identify candidate features from said features evaluated by users;
a comment sentence classification module, which classifies the sentences of said preprocessed user comments by said candidate features, thereby obtaining the comment sentence class corresponding to each candidate feature;
a feature filtering module, which filters said candidate features according to said comment sentence classes, thereby obtaining the final features and their corresponding candidate comment sentence classes;
a summary generation module, which calculates the score of each sentence in said candidate comment sentence classes and extracts several summary sentences to generate the summary,
wherein the user comment preprocessing module sends the preprocessing result to the feature identification module, which yields the identified candidate features; the user comment text preprocessed by said user comment preprocessing module and the candidate features identified by the feature identification module are sent to the comment sentence classification module for classification, yielding the comment sentence classes; said candidate features are filtered to obtain the final features and their corresponding candidate comment sentence classes; and the summary generation module takes said candidate comment sentence classes and said final features as input and performs statistical analysis to generate the summary.
CN200910093409A 2009-09-29 2009-09-29 Automatic abstracting method and system based on user comment text feature Pending CN101667194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910093409A CN101667194A (en) 2009-09-29 2009-09-29 Automatic abstracting method and system based on user comment text feature

Publications (1)

Publication Number Publication Date
CN101667194A true CN101667194A (en) 2010-03-10

Family

ID=41803810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910093409A Pending CN101667194A (en) 2009-09-29 2009-09-29 Automatic abstracting method and system based on user comment text feature

Country Status (1)

Country Link
CN (1) CN101667194A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20100310