CN102254038A - System and method for analyzing network comment relevance - Google Patents


Info

Publication number
CN102254038A
Authority
CN
China
Legal status
Granted
Application number
CN 201110229617
Other languages
Chinese (zh)
Other versions
CN102254038B
Inventor
王君泽
黄本雄
王超
胡广
温杰
Current Assignee
WUHAN ANWEN TECHNOLOGY DEVELOPMENT CO LTD
Original Assignee
WUHAN ANWEN TECHNOLOGY DEVELOPMENT CO LTD
Priority date
2011-08-11
Filing date
2011-08-11
Publication date
2011-11-23
Application filed by WUHAN ANWEN TECHNOLOGY DEVELOPMENT CO LTD
Priority to CN 201110229617 (CN102254038B)
Publication of CN102254038A
Application granted
Publication of CN102254038B
Legal status: Active

Abstract

The invention relates to a method for analyzing the relevance between a topic article and the comments posted on it. The method can qualitatively determine whether a comment is spam, and can also quantify a comment's relevance as a specific value between 0 and 1, where a larger value indicates stronger relevance; the relationship between each comment and the article can then be analyzed from this value. One advantage of the invention is that it jointly considers two aspects, the similarity between each comment and the topic article and the relevance among the comments themselves, so the relevance analysis is more accurate. The system of the invention is browser-based, convenient to use, and has a friendly interface.

Description

System for analyzing network comment relevance and analysis method thereof
Technical field
The invention belongs to the field of Internet text processing and data mining. It relates to using data mining techniques to analyze the relevance between topic articles on the web and the many comments posted on those articles. Specifically, it uses the vector space model, probability models and language models to analyze the similarity between a topic article and its comments, and also analyzes the correlation among the comments themselves.
Background art
The Web 2.0 era is an era of explosive information growth: netizens can freely comment on all kinds of online news and blog posts, and in recent years this comment data has reached a massive scale. Much data mining research has been done on it, such as opinion extraction and sentiment analysis of user comments, and integration and summarization of user comments. Among these research areas, one current focus is identifying whether a comment is related to its topic, that is, whether it is spam; such identification helps people make better use of comment resources. In current research this identification is generally only qualitative: an unrelated comment is spam, otherwise it is non-spam. In practice, however, there is no clear boundary between spam and non-spam comments, so qualitative identification is often fuzzy. Moreover, even among non-spam comments, the value of individual comments varies considerably. So far, most researchers have not considered these shortcomings.
Current research on comment information mainly focuses on using natural language processing and data mining techniques to extract and summarize the opinions in user comment data, i.e., opinion mining (positive or negative). This includes summarizing, from user comments on a product, the product's features and users' opinions on those features; identifying the words and phrases in each comment that reflect the user's opinion; and determining whether the sentiment of each comment is positive or negative. Research on the personality and behavior of comment authors has also begun and produced some results, as has research on the reliability of comment content.
Spam comment identification, however, remains largely unexplored; the small amount of existing work only describes the various problems posed by spam comments and analyzes and classifies known types of spam comments.
Summary of the invention
In view of the current lack of tools on the Internet for analyzing the relevance of comment information, the present invention provides a system for analyzing the relevance between network comments and their topic.
To solve the above technical problem, the system of the present invention for analyzing the relevance between network comments and their topic is characterized by comprising a web crawler module, a relevance analysis module and a web page display module, wherein:
the web crawler module extracts the text content of a page and generates a data set consisting of a topic article and its related comments, which is passed to the relevance analysis module for analysis;
the relevance analysis module quantitatively analyzes the relevance between each network comment and the topic article;
the web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page.
The relevance analysis module comprises:
a first device, which takes all network comments as nodes and generates an undirected graph;
a second device, which calculates the similarity between a given network comment and the topic article;
a third device, which calculates the similarity between the network comment processed by the second device and the network comments represented by its adjacent nodes;
a fourth device, which calculates the relevance between the network comment and the topic article from the comment-article similarity calculated by the second device and the similarities to adjacent comments calculated by the third device.
The relevance analysis module further comprises:
a step device, which selects the next network comment whose relevance has not yet been calculated, and returns a null value if no such comment remains;
a call control device, which takes the network comment selected by the step device as input and determines whether the input is null:
if it is not null, the call control device invokes the second, third and fourth devices to calculate the relevance between the current network comment and the topic article, and then returns to the step device;
if it is null, processing stops.
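The step device and call control device amount to a simple work loop. The Python sketch below is illustrative only: the function names `next_uncomputed` and `run_call_control` are hypothetical, the `score_one` callback is an assumed stand-in for the second, third and fourth devices (whose internals are not modeled here), and the patent's null value is represented by `None`.

```python
from typing import Callable, Dict, List, Optional


def next_uncomputed(comments: List[str], scores: Dict[int, float]) -> Optional[int]:
    """Step device (hypothetical): index of the next unscored comment, or None."""
    for idx in range(len(comments)):
        if idx not in scores:
            return idx
    return None  # null value: no comment is left without a relevance score


def run_call_control(comments: List[str],
                     score_one: Callable[[int], float]) -> Dict[int, float]:
    """Call control device (hypothetical): repeat until the step device returns None."""
    scores: Dict[int, float] = {}
    while True:
        idx = next_uncomputed(comments, scores)
        if idx is None:
            break  # null input: stop
        # Non-null input: invoke the scoring devices, then return to the step device.
        scores[idx] = score_one(idx)
    return scores


if __name__ == "__main__":
    demo = run_call_control(["great article", "buy cheap watches"], lambda i: 1.0 - i)
    print(demo)  # {0: 1.0, 1: 0.0}
```

A linear scan keeps the sketch close to the claim wording; a real implementation would more likely iterate over the comments directly.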
The system further comprises a relevance judgment module, which compares the relevance between a network comment and the topic article, as calculated by the relevance analysis module, with a preset threshold; when the relevance is below the preset threshold, the web page display module shows the network comment as a comment unrelated to the topic article.
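The relevance judgment module reduces to a per-comment comparison against a preset threshold. This sketch assumes the relevance scores are already available in a dictionary keyed by a comment identifier; the function name `judge_irrelevant` and the example threshold of 0.1 are illustrative assumptions, since the patent leaves the threshold to actual needs.

```python
from typing import Dict


def judge_irrelevant(relevance: Dict[str, float], threshold: float) -> Dict[str, bool]:
    """Flag each comment whose relevance is strictly below the preset threshold."""
    return {cid: score < threshold for cid, score in relevance.items()}


if __name__ == "__main__":
    flags = judge_irrelevant({"c1": 0.82, "c2": 0.04, "c3": 0.10}, threshold=0.1)
    print(flags)  # {'c1': False, 'c2': True, 'c3': False}
```

A comment whose relevance equals the threshold is treated as relevant, matching the "below the threshold" wording.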
The present invention also provides an analysis method for the above system for analyzing network comment relevance, characterized by comprising the following steps:
the web crawler module extracts the text content of a page, generates a data set of a topic article and its related comments, and sends the data set to the relevance analysis module;
the relevance analysis module quantitatively analyzes the relevance between each network comment and the topic article;
the web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page.
The quantitative analysis of the relevance between a network comment and the topic article by the relevance analysis module comprises the following steps:
Step 5-1: calculate the similarity between a given network comment and the topic article;
Step 5-2: take all network comments as nodes and generate an undirected graph;
Step 5-3: calculate the similarity between the network comment and the network comments represented by its adjacent nodes;
Step 5-4: calculate the relevance between the network comment and the topic article from the comment-article similarity and the similarities to adjacent comments.
The quantitative analysis of the relevance between a network comment and the topic article by the relevance analysis module further comprises the following steps:
Step 6-1: select the next network comment whose relevance has not yet been calculated; if no such comment remains, return a null value;
Step 6-2: take the network comment selected in step 6-1 as input and determine whether the input is null:
if it is not null, return to steps 5-2, 5-3 and 5-4, and then return to step 6-1;
if it is null, stop.
The method further comprises the following step:
when the relevance between a network comment and the topic article, as calculated by the relevance analysis module, is below a preset threshold, the web page display module shows that comment as unrelated to the topic article.
By quantitatively analyzing comment relevance, the present invention yields a specific relevance value between 0 and 1; the larger the value, the greater the relevance, so the closeness of the relationship between a comment and the article can be analyzed from this value. A notable advantage of the invention is that its core analysis jointly considers two aspects, the similarity between a comment and the topic article and the correlation among comments, so the relevance analysis is more accurate. The system of the invention is browser-based, easy to use, and has a friendly interface.
Description of the drawings
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a structural block diagram of the system for analyzing network comment relevance according to the present invention.
Fig. 2 is a diagram of the comment network nodes according to the present invention.
Embodiment
As shown in the structural block diagram of Fig. 1, the system for analyzing network comment relevance of the present invention comprises a web crawler module, a relevance analysis module and a web page display module, wherein:
the web crawler module extracts the text content of a page and generates a data set consisting of a topic article and its related comments, which is passed to the relevance analysis module for analysis;
the relevance analysis module quantitatively analyzes the relevance between each network comment and the topic article. The invention analyzes the relevance between a comment and the article quantitatively: relevance here is a continuous variable that can take any value in the range 0 to 1. Different comments generally yield different relevance values: the higher the value, the stronger the association between the comment and the topic article, and the more clearly the comment expresses the reviewer's opinion on the article's topic; conversely, the lower the value, the less practical value the comment has. Depending on actual needs, a comment whose relevance falls below a set threshold can be regarded as spam. The relevance between comments and the topic article is analyzed as follows: extract topic article A and its corresponding comments (comment 1, comment 2, comment 3, ..., comment n); calculate the similarity K between A and each of comments 1 to n; calculate the relevance L between comments; and combine K and L with certain weights to obtain the relevance $P_i$ between A and each comment $i$ (where $i$ is any integer from 1 to n). $P_i$ is the final relevance value. A relevance threshold can be set according to actual needs: any comment whose relevance is below this threshold can be judged useless, and a higher $P_i$ indicates that the comment is more closely related to the topic. This relevance analysis module differs from existing techniques in that it uses not only the textual similarity between a comment and the topic article as a factor affecting relevance, but also takes into account the internal relationships among the many comments. Its core idea is: if a comment is highly similar to other comments that have already been analyzed and found highly relevant to the topic article, then this comment should also be highly relevant to the topic article, even if its direct similarity to the topic article is not high.
The web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page. It provides a user-facing interface, mainly used to display results as the user requests, including showing the relevance of every comment and displaying comments sorted by relevance in ascending or descending order. The module organizes the output of the relevance analysis module into a data structure users can understand and presents it in the user interface as a web page.
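Displaying results in ascending or descending order of relevance can be sketched as a sort followed by HTML rendering. This is a minimal illustration, not the patent's display module: the function names `rank_comments` and `to_html_table` and the table layout are assumptions. The standard-library `html.escape` is used so that comment text cannot inject markup into the page.

```python
import html
from typing import Dict, List, Tuple


def rank_comments(relevance: Dict[str, float],
                  descending: bool = True) -> List[Tuple[str, float]]:
    """Sort (comment, relevance) pairs by relevance."""
    return sorted(relevance.items(), key=lambda item: item[1], reverse=descending)


def to_html_table(rows: List[Tuple[str, float]]) -> str:
    """Render ranked rows as a minimal HTML table, escaping the comment text."""
    body = "".join(
        f"<tr><td>{html.escape(text)}</td><td>{score:.3f}</td></tr>" for text, score in rows
    )
    return f"<table><tr><th>Comment</th><th>Relevance</th></tr>{body}</table>"


if __name__ == "__main__":
    scores = {"Nice analysis": 0.91, "Cheap <b>pills</b>": 0.03, "I disagree": 0.55}
    print(to_html_table(rank_comments(scores)))
```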
The overall method of the present invention is as follows: as required, the web crawler module collects a topic and comment data set for specific web page content; this data set is then submitted to the relevance analysis module for relevance analysis; finally, the analysis result is passed to the web page display module, which displays it in a pop-up frame in the web browser according to actual functional requirements.
The web crawler module is built mainly on general web crawler technology and includes, but is not limited to, website selection, text content selection, data crawling and back-end data management. The web crawler module is an independent and necessary front-end module. It is mainly used to extract the text content of web pages browsed by the user, extract the topic body text and the comment content from that text in a 1:N form (one topic text corresponds to many comments), and organize them into a data set for subsequent analysis, for example in the form: topic: XXXXX --- comment 1: XXX, comment 2: XXX, comment 3: XXX.
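The 1:N organization of the crawled data can be represented by a small record type. In this sketch the class name `TopicDataset` and the exact text serialization are assumptions modeled on the example format in the paragraph above; the website selection, crawling and back-end storage stages are not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TopicDataset:
    """One topic text paired with the N comments posted on it (1:N)."""
    topic: str
    comments: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Serialize as 'Topic: ... --- Comment 1: ..., Comment 2: ...'."""
        numbered = ", ".join(
            f"Comment {i}: {text}" for i, text in enumerate(self.comments, start=1)
        )
        return f"Topic: {self.topic} --- {numbered}"


if __name__ == "__main__":
    ds = TopicDataset("Phone review", ["Battery lasts two days", "Visit my shop"])
    print(ds.to_text())
    # Topic: Phone review --- Comment 1: Battery lasts two days, Comment 2: Visit my shop
```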
The relevance analysis module is the core module of the present invention. It automatically analyzes the relevance between topic articles and comments in online information: the higher the relevance value, the better the comment fits the topic text; a low value indicates an irrelevant comment, such as advertising or placeholder content. The module has two main parts: topic-comment analysis and comment-comment analysis. The topic-comment part considers the relevance between the topic text and each comment, with criteria mainly including lexical similarity and text overlap rate. The comment-comment part mainly computes relevance from the similarities among all comments. Combining the results of the two parts yields the final relevance result.
The core analysis model of the present invention considers two factors: the similarity between a comment and the topic article, and the internal relationships among comments. The computation for each is given below in order.
(1) Calculating the similarity between a comment and the topic being commented on
The present invention uses a probabilistic language model to calculate the similarity between a comment and the topic article. For any comment $R$ and the topic article $A$ being commented on, define $\mathrm{Sim}(R, A)$ as the similarity between $R$ and $A$, given by:
$$\mathrm{Sim}(R,A) \approx P(R \mid A) = \prod_{i=1}^{n} P(q_i \mid A) = \prod_{w \in R} P(w \mid A)^{c(w,R)} \qquad \text{Formula (1)}$$
where $P(R \mid A)$ is the probability of generating $R$ from the language model of $A$, $q_1, \dots, q_n$ are the $n$ words of $R$ in order, $w$ ranges over the distinct words occurring in $R$, $c(w, R)$ is the number of times $w$ occurs in $R$, and $P(w \mid A)$ is the probability of $w$ occurring in $A$.
$P(w \mid A)$ can be calculated by maximum likelihood estimation (MLE):
$$P(w \mid A) = P_{ML}(w \mid A) = \frac{c(w,A)}{|A|}$$
where $|A|$ is the total number of words in $A$. This method has a drawback: if word $w$ does not appear in $A$, $P(w \mid A)$ is zero. Because Formula (1) is a product, a single such word makes $\mathrm{Sim}(R, A)$ zero; in particular, when $R$ and $A$ share no words, the similarity of $R$ and $A$ is judged to be zero.
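Formula (1) with maximum likelihood estimates can be sketched directly, which also makes the zero-probability defect visible. The sketch assumes comments and articles are already segmented into word tokens (Chinese text would first need word segmentation, which is not shown); the function name `sim_ml` is hypothetical.

```python
from collections import Counter
from typing import List


def sim_ml(review: List[str], article: List[str]) -> float:
    """Formula (1) with P(w|A) = c(w, A) / |A| (unsmoothed maximum likelihood)."""
    article_counts = Counter(article)
    article_len = len(article)
    sim = 1.0
    for word, count in Counter(review).items():  # count = c(w, R)
        p_w = article_counts[word] / article_len  # Counter returns 0 for unseen words
        sim *= p_w ** count
    return sim


if __name__ == "__main__":
    article = ["battery", "life", "is", "good", "battery"]
    print(sim_ml(["battery", "good"], article))   # (2/5) * (1/5), about 0.08
    print(sim_ml(["battery", "cheap"], article))  # 0.0: "cheap" never occurs in A
```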
To solve the zero-probability problem, the present invention adopts an improved method, Jelinek-Mercer smoothing, a typical linear interpolation smoothing method, computed as follows:
$$P(w \mid A) = \lambda P_{ML}(w \mid A) + (1-\lambda) P(w \mid C)$$
where $P(w \mid C)$ is the probability of word $w$ occurring in corpus $C$ and $\lambda$ is the smoothing factor. Preferably, the present invention sets $\lambda = 0.2$.
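Jelinek-Mercer smoothing changes only the per-word probability. The sketch below uses the preferred λ = 0.2 as the default `lam`. It assumes the background corpus C is a token list that includes the comments being scored (for example, all crawled articles and comments), because a word absent from both A and C would still receive probability zero. The names `jm_prob` and `sim_jm` are hypothetical.

```python
from collections import Counter
from typing import List


def jm_prob(word: str, article: Counter, article_len: int,
            corpus: Counter, corpus_len: int, lam: float = 0.2) -> float:
    """P(w|A) = lam * P_ML(w|A) + (1 - lam) * P(w|C)."""
    p_ml = article[word] / article_len
    p_corpus = corpus[word] / corpus_len
    return lam * p_ml + (1 - lam) * p_corpus


def sim_jm(review: List[str], article: List[str], corpus: List[str],
           lam: float = 0.2) -> float:
    """Formula (1) evaluated with Jelinek-Mercer smoothed word probabilities."""
    a_counts, c_counts = Counter(article), Counter(corpus)
    sim = 1.0
    for word, count in Counter(review).items():
        sim *= jm_prob(word, a_counts, len(article), c_counts, len(corpus), lam) ** count
    return sim


if __name__ == "__main__":
    article = ["battery", "life", "good"]
    corpus = article + ["buy", "cheap", "battery"]  # assumed background corpus
    print(sim_jm(["buy", "cheap"], article, corpus))     # > 0 despite no overlap with A
    print(sim_jm(["battery", "good"], article, corpus))  # larger: words shared with A
```

With `lam=1.0` the function reduces to the unsmoothed maximum likelihood estimate and the zero problem returns.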
To avoid potential bias caused by comment length, the present invention also introduces a length normalization to standardize the raw probability:
$$P_{norm}(\mathrm{Sim}(R,A)) \propto \exp\left(\frac{\log \mathrm{Sim}(R,A)}{\mathrm{len}(R)}\right)$$
where $\mathrm{len}(R)$ is the length of $R$, i.e., the total number of words in $R$.
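Because Formula (1) is a product of per-word probabilities, longer comments get smaller raw scores. Dividing the log-similarity by len(R) takes the geometric mean of the per-word probabilities instead. The sketch works in log space, which also avoids floating-point underflow for long comments; it assumes the per-word probabilities are already smoothed and therefore strictly positive. The function name `normalized_sim` is hypothetical.

```python
import math
from typing import Sequence


def normalized_sim(word_probs: Sequence[float]) -> float:
    """exp(log Sim(R, A) / len(R)); word_probs holds P(q_i|A) for each token q_i of R."""
    if not word_probs:
        raise ValueError("a comment must contain at least one word")
    log_sim = sum(math.log(p) for p in word_probs)  # log of the product in Formula (1)
    return math.exp(log_sim / len(word_probs))


if __name__ == "__main__":
    short_comment = [0.5, 0.5]
    long_comment = [0.5] * 40
    print(math.prod(short_comment), math.prod(long_comment))  # raw scores differ hugely
    print(normalized_sim(short_comment), normalized_sim(long_comment))  # both about 0.5
```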
(2) Overall calculation of comment-article relevance
The present invention treats the set of all comments on a given topic article as a network node graph, i.e., the set of all comments is modeled as a graph structure, as shown in Fig. 2:
An undirected graph over the comments is obtained by calculating the cosine similarity between comments: each node in the graph represents a comment, and the weight of the edge between two nodes represents their cosine similarity. The nodes are labeled in order as $R_1, R_2, R_3, R_4, R_5, \dots, R_n$.
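The comment graph can be built by computing pairwise cosine similarity. The patent does not specify the term weighting, so this sketch assumes raw term-frequency vectors (TF-IDF would be a common alternative) and adds an edge only when the similarity is positive; the names `cosine` and `build_graph` are hypothetical. The graph is stored as an adjacency dictionary in which `graph[i][j]` is the edge weight between comments i and j.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List

Graph = Dict[int, Dict[int, float]]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(comments: List[List[str]]) -> Graph:
    """Undirected weighted graph: one node per comment, edge weight = cosine similarity."""
    vectors = [Counter(tokens) for tokens in comments]
    graph: Graph = {i: {} for i in range(len(comments))}
    for i, j in combinations(range(len(comments)), 2):
        weight = cosine(vectors[i], vectors[j])
        if weight > 0.0:  # assumption: omit zero-weight edges
            graph[i][j] = weight
            graph[j][i] = weight  # undirected: store both directions
    return graph


if __name__ == "__main__":
    comments = [["good", "battery"], ["battery", "good", "life"], ["buy", "cheap"]]
    print(build_graph(comments))
```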
On this basis, each node can be regarded as holding a relevance value, which influences the relevance values of the nodes around it. For any node $R_i \in \{R_1, \dots, R_n\}$ in the graph, the following holds:
$$\mathrm{Pertinence}(R_i) = \sum_{R_j \in \mathrm{adj}[R_i]} \frac{w(R_j,R_i)}{\sum_{R_k \in \mathrm{adj}[R_j]} w(R_i,R_k)} \, \mathrm{Pertinence}(R_j) \qquad \text{Formula (2)}$$
where $\mathrm{Pertinence}(R_i)$ is the relevance between comment $R_i$ and topic article $A$, $\mathrm{adj}[R_i]$ is the set of comment nodes adjacent to $R_i$, $R_j$ is a comment in $\mathrm{adj}[R_i]$, and $w(R_j, R_i)$ is the similarity between comments $R_j$ and $R_i$.
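One application of Formula (2) redistributes each comment's current relevance to its neighbors in proportion to edge weight. In this sketch the inner sum is read as the total edge weight of R_j, i.e. the sum of w(R_j, R_k) over R_k in adj[R_j], which is the usual random-walk normalization. That reading is an assumption, since the formula as printed writes w(R_i, R_k). Under this reading the total relevance mass is preserved on a graph without isolated nodes. The function name `propagate_once` is hypothetical, and the graph is assumed symmetric with positive weights.

```python
from typing import Dict, List

Graph = Dict[int, Dict[int, float]]


def propagate_once(graph: Graph, scores: List[float]) -> List[float]:
    """Apply Formula (2) once: each R_j passes on relevance in proportion to w(R_j, R_i)."""
    # Total edge weight of each node (assumed reading of the denominator).
    degree = {j: sum(neighbours.values()) for j, neighbours in graph.items()}
    updated = []
    for i in range(len(scores)):
        total = 0.0
        for j, w_ji in graph[i].items():  # symmetric graph: graph[i][j] == w(R_j, R_i)
            total += w_ji / degree[j] * scores[j]
        updated.append(total)
    return updated


if __name__ == "__main__":
    # Path graph R_0 - R_1 - R_2 with edge weights 1.0 and 0.5.
    path = {0: {1: 1.0}, 1: {0: 1.0, 2: 0.5}, 2: {1: 0.5}}
    print(propagate_once(path, [1.0, 0.0, 0.0]))  # [0.0, 1.0, 0.0]
```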
Formula (2) considers only the influence of the relationships among comments. Combining Formula (1) from part (1) with Formula (2) from part (2), the final formula for the overall relevance between a comment and the topic article is:
$$\mathrm{Pertinence}(R_i) = d \times \frac{\mathrm{Sim}(R_i,A)}{\sum_{R} \mathrm{Sim}(R,A)} + (1-d) \left[ \sum_{R_j \in \mathrm{adj}[R_i]} \frac{w(R_j,R_i)}{\sum_{R_k \in \mathrm{adj}[R_j]} w(R_i,R_k)} \, \mathrm{Pertinence}(R_j) \right]$$
The left-hand term of this formula depends on Formula (1) and the right-hand term on Formula (2), where $\sum_R$ runs over all comments on article $A$. A comment's relevance thus depends partly on its similarity to the topic article and partly on its associations with other comments. The parameter $d$ is the trade-off between the two and can take any value between 0 and 1 according to actual conditions; the system defaults to $d = 0.7$.
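Because Pertinence appears on both sides of the combined formula, it can be solved by fixed-point (power) iteration, in the same way as PageRank-style ranking scores. The sketch below is a minimal version under stated assumptions: `article_sims` holds Sim(R_i, A) for each comment (for example the smoothed, length-normalized values), `graph` is a symmetric adjacency dictionary of positive cosine weights, the inner sum is read as the total edge weight of R_j, and the tolerance and iteration cap are illustrative. With d = 0.7 the update is a contraction, so the iteration converges. The function name `pertinence` is hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, Dict[int, float]]


def pertinence(article_sims: List[float], graph: Graph, d: float = 0.7,
               tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate the combined formula until successive scores change by less than tol."""
    total_sim = sum(article_sims)
    if total_sim <= 0:
        raise ValueError("at least one comment needs a positive article similarity")
    prior = [s / total_sim for s in article_sims]  # Sim(R_i, A) / sum_R Sim(R, A)
    degree = {j: sum(nbrs.values()) for j, nbrs in graph.items()}
    scores = list(prior)  # starting point for the iteration
    for _ in range(max_iter):
        updated = []
        for i in range(len(prior)):
            spread = sum(w_ji / degree[j] * scores[j] for j, w_ji in graph.get(i, {}).items())
            updated.append(d * prior[i] + (1 - d) * spread)
        converged = max(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


if __name__ == "__main__":
    sims = [0.4, 0.4, 0.2]                     # Sim(R_i, A), already computed
    graph = {0: {1: 1.0}, 1: {0: 1.0}, 2: {}}  # R_2 shares no words with the others
    print(pertinence(sims, graph))             # about [0.4, 0.4, 0.14]
```

The returned scores are nonnegative and sum to at most 1, so each lies in [0, 1]. Because of the normalization by the sum of Sim(R, A), their magnitude also shrinks as the number of comments grows, which matters when choosing a threshold.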
Finally, the processing flow of the comment analysis system is given, comprising the following steps:
the web crawler module extracts the text content of a page, generates a data set of a topic article and its related comments, and sends the data set to the relevance analysis module;
calculate the similarity between a given network comment and the topic article;
Step A: take all network comments as nodes and generate an undirected graph;
calculate the similarity between the network comment and the network comments represented by its adjacent nodes;
calculate the relevance between the network comment and the topic article from the comment-article similarity and the similarities to adjacent comments;
select the next network comment whose relevance has not yet been calculated; if no such comment remains, return a null value;
take the selected network comment as input and determine whether the input is null:
if it is not null, return to Step A; if it is null, stop;
the web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page;
when the relevance between a network comment and the topic article, as calculated by the relevance analysis module, is below a preset threshold, the web page display module shows that comment as unrelated to the topic article.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions shall fall within the scope of the claims of the present invention.

Claims (8)

1. A system for analyzing network comment relevance, characterized by comprising a web crawler module, a relevance analysis module and a web page display module, wherein:
the web crawler module extracts the text content of a page and generates a data set consisting of a topic article and its related comments, which is passed to the relevance analysis module for analysis;
the relevance analysis module quantitatively analyzes the relevance between each network comment and the topic article;
the web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page.
2. The system for analyzing network comment relevance according to claim 1, characterized in that the relevance analysis module comprises:
a first device, which takes all network comments as nodes and generates an undirected graph;
a second device, which calculates the similarity between a given network comment and the topic article;
a third device, which calculates the similarity between the network comment processed by the second device and the network comments represented by its adjacent nodes;
a fourth device, which calculates the relevance between the network comment and the topic article from the comment-article similarity calculated by the second device and the similarities to adjacent comments calculated by the third device.
3. The system for analyzing network comment relevance according to claim 2, characterized in that the relevance analysis module further comprises:
a step device, which selects the next network comment whose relevance has not yet been calculated, and returns a null value if no such comment remains;
a call control device, which takes the network comment selected by the step device as input and determines whether the input is null:
if it is not null, the call control device invokes the second, third and fourth devices to calculate the relevance between the current network comment and the topic article, and then returns to the step device;
if it is null, processing stops.
4. The system for analyzing network comment relevance according to claim 1, 2 or 3, characterized by further comprising a relevance judgment module, which compares the relevance between a network comment and the topic article, as calculated by the relevance analysis module, with a preset threshold; when the relevance is below the preset threshold, the web page display module shows the network comment as a comment unrelated to the topic article.
5. An analysis method for the system for analyzing network comment relevance according to claim 1, characterized by comprising the following steps:
the web crawler module extracts the text content of a page, generates a data set of a topic article and its related comments, and sends the data set to the relevance analysis module;
the relevance analysis module quantitatively analyzes the relevance between each network comment and the topic article;
the web page display module outputs the network comment relevance results computed by the relevance analysis module as a web page.
6. The analysis method for the system for analyzing network comment relevance according to claim 5, characterized in that the quantitative analysis of the relevance between a network comment and the topic article by the relevance analysis module comprises the following steps:
Step 5-1: calculate the similarity between a given network comment and the topic article;
Step 5-2: take all network comments as nodes and generate an undirected graph;
Step 5-3: calculate the similarity between the network comment and the network comments represented by its adjacent nodes;
Step 5-4: calculate the relevance between the network comment and the topic article from the comment-article similarity and the similarities to adjacent comments.
7. The analysis method for the system for analyzing network comment relevance according to claim 6, characterized in that the quantitative analysis of the relevance between a network comment and the topic article by the relevance analysis module further comprises the following steps:
Step 6-1: select the next network comment whose relevance has not yet been calculated; if no such comment remains, return a null value;
Step 6-2: take the network comment selected in step 6-1 as input and determine whether the input is null:
if it is not null, return to steps 5-2, 5-3 and 5-4, and then return to step 6-1;
if it is null, stop.
8. The analysis method for the system for analyzing network comment relevance according to claim 5, 6 or 7, characterized by further comprising the following step:
when the relevance between a network comment and the topic article, as calculated by the relevance analysis module, is below a preset threshold, the web page display module shows that comment as unrelated to the topic article.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN 201110229617 (CN102254038B) | 2011-08-11 | 2011-08-11 | System and method for analyzing network comment relevance

Publications (2)

Publication Number | Publication Date
CN102254038A | 2011-11-23
CN102254038B | 2013-01-23

Family

ID=44981302

Country Status (1)

Country | Link
CN (1) | CN102254038B

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682120A (en) * 2012-05-15 2012-09-19 合一网络技术(北京)有限公司 Method,device and system for acquiring essential article commented on network
CN102915501A (en) * 2012-10-29 2013-02-06 江苏乐买到网络科技有限公司 Method for optimizing online shopping evaluating information
CN103020482A (en) * 2013-01-05 2013-04-03 南京邮电大学 Relation-based spam comment detection method
CN103577542A (en) * 2013-10-10 2014-02-12 北京智谷睿拓技术服务有限公司 Ranking fraud detection method and ranking fraud detection system of application program
CN103745001A (en) * 2014-01-24 2014-04-23 福州大学 System for detecting reviewers of negative comments on products
CN105279146A (en) * 2014-06-30 2016-01-27 邻客音公司 Context-aware approach to detection of short irrelevant texts
CN105975487A (en) * 2016-04-26 2016-09-28 昆明理工大学 Method for judging correlativity of user comments of APP software
CN106055664A (en) * 2016-06-03 2016-10-26 腾讯科技(深圳)有限公司 Method and system for filtering UGC (User Generated Content) spam based on user comments
CN106485507A (en) * 2015-09-01 2017-03-08 阿里巴巴集团控股有限公司 Method, apparatus and system for detecting cheating in software promotion
US9779074B2 (en) 2013-12-20 2017-10-03 International Business Machines Corporation Relevancy of communications about unstructured information
CN107491491A (en) * 2017-07-20 2017-12-19 西南财经大学 Media article recommendation method adapting to changes in user interest
CN107704941A (en) * 2016-08-08 2018-02-16 华为软件技术有限公司 Method and device for displaying product reviews
CN109618236A (en) * 2018-12-13 2019-04-12 连尚(新昌)网络科技有限公司 Video comment processing method and apparatus
CN109857838A (en) * 2019-02-12 2019-06-07 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109885676A (en) * 2019-02-26 2019-06-14 苏州华盖信息科技有限公司 Method for generating household device reports, big data system and storage medium
CN111382563A (en) * 2020-03-20 2020-07-07 腾讯科技(深圳)有限公司 Text relevance determining method and device
US11120218B2 (en) 2019-06-13 2021-09-14 International Business Machines Corporation Matching bias and relevancy in reviews with artificial intelligence
CN110287977B (en) * 2018-03-19 2021-09-21 阿里巴巴(中国)有限公司 Content clustering method and device
CN113656580A (en) * 2021-08-12 2021-11-16 北京锐安科技有限公司 Method, device, equipment and medium for identifying spam comments
CN114385902A (en) * 2020-10-22 2022-04-22 腾讯科技(深圳)有限公司 Content recommendation method and device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20080215561A1 (en) * 2007-03-01 2008-09-04 Microsoft Corporation Scoring relevance of a document based on image text
CN101639856A (en) * 2009-09-11 2010-02-03 清华大学 Webpage correlation evaluation device for detecting internet information spreading
JP2010067005A (en) * 2008-09-10 2010-03-25 Yahoo Japan Corp Retrieval device, and method of controlling the same
CN101694658A (en) * 2009-10-20 2010-04-14 浙江大学 Method for constructing webpage crawler based on repeated removal of news

Cited By (31)

Publication number Priority date Publication date Assignee Title
CN102682120A (en) * 2012-05-15 2012-09-19 合一网络技术(北京)有限公司 Method, device and system for acquiring essential article commented on network
CN102915501A (en) * 2012-10-29 2013-02-06 江苏乐买到网络科技有限公司 Method for optimizing online shopping review information
CN103020482A (en) * 2013-01-05 2013-04-03 南京邮电大学 Relation-based spam comment detection method
CN103577542A (en) * 2013-10-10 2014-02-12 北京智谷睿拓技术服务有限公司 Ranking fraud detection method and ranking fraud detection system of application program
CN103577542B (en) * 2013-10-10 2018-09-25 北京智谷睿拓技术服务有限公司 Ranking fraud detection method and ranking fraud detection system of application program
US9779074B2 (en) 2013-12-20 2017-10-03 International Business Machines Corporation Relevancy of communications about unstructured information
US9779075B2 (en) 2013-12-20 2017-10-03 International Business Machines Corporation Relevancy of communications about unstructured information
CN103745001A (en) * 2014-01-24 2014-04-23 福州大学 System for detecting reviewers of negative comments on products
CN103745001B (en) * 2014-01-24 2016-10-05 福州大学 Product comment spammer detection system
CN105279146B (en) * 2014-06-30 2018-06-05 微软技术许可有限责任公司 Context-aware approach to detection of short irrelevant texts
US10037320B2 (en) 2014-06-30 2018-07-31 Microsoft Technology Licensing, Llc Context-aware approach to detection of short irrelevant texts
CN105279146A (en) * 2014-06-30 2016-01-27 邻客音公司 Context-aware approach to detection of short irrelevant texts
CN106485507A (en) * 2015-09-01 2017-03-08 阿里巴巴集团控股有限公司 Method, apparatus and system for detecting cheating in software promotion
CN106485507B (en) * 2015-09-01 2019-10-18 阿里巴巴集团控股有限公司 Method, apparatus and system for detecting cheating in software promotion
CN105975487B (en) * 2016-04-26 2019-07-16 昆明理工大学 Method for judging the relevance of user comments on APP software
CN105975487A (en) * 2016-04-26 2016-09-28 昆明理工大学 Method for judging correlativity of user comments of APP software
CN106055664A (en) * 2016-06-03 2016-10-26 腾讯科技(深圳)有限公司 Method and system for filtering UGC (User Generated Content) spam based on user comments
CN106055664B (en) * 2016-06-03 2019-03-08 腾讯科技(深圳)有限公司 Method and system for filtering UGC spam content based on user comments
CN107704941A (en) * 2016-08-08 2018-02-16 华为软件技术有限公司 Method and device for displaying product reviews
CN107491491A (en) * 2017-07-20 2017-12-19 西南财经大学 Media article recommendation method adapting to changes in user interest
CN110287977B (en) * 2018-03-19 2021-09-21 阿里巴巴(中国)有限公司 Content clustering method and device
CN109618236A (en) * 2018-12-13 2019-04-12 连尚(新昌)网络科技有限公司 Video comment processing method and apparatus
CN109857838A (en) * 2019-02-12 2019-06-07 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109857838B (en) * 2019-02-12 2021-01-26 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109885676A (en) * 2019-02-26 2019-06-14 苏州华盖信息科技有限公司 Method for generating household device reports, big data system and storage medium
US11120218B2 (en) 2019-06-13 2021-09-14 International Business Machines Corporation Matching bias and relevancy in reviews with artificial intelligence
CN111382563A (en) * 2020-03-20 2020-07-07 腾讯科技(深圳)有限公司 Text relevance determining method and device
CN111382563B (en) * 2020-03-20 2023-09-08 腾讯科技(深圳)有限公司 Text relevance determining method and device
CN114385902A (en) * 2020-10-22 2022-04-22 腾讯科技(深圳)有限公司 Content recommendation method and device and storage medium
CN114385902B (en) * 2020-10-22 2024-01-30 腾讯科技(深圳)有限公司 Content recommendation method, device and storage medium
CN113656580A (en) * 2021-08-12 2021-11-16 北京锐安科技有限公司 Method, device, equipment and medium for identifying spam comments

Also Published As

Publication number Publication date
CN102254038B (en) 2013-01-23

Similar Documents

Publication Publication Date Title
CN102254038B (en) System and method for analyzing network comment relevance
US10867256B2 (en) Method and system to provide related data
CN104933027B (en) Open Chinese entity relation extraction method using dependency analysis
CN103678564B (en) Internet product research system based on data mining
CN103023714B (en) System and method for analyzing the activity and cluster topology of network topics
CN101127042A (en) Sentiment classification method based on language model
CN107832229A (en) NLP-based method for automatically generating system test cases
CN103870973A (en) Information push and search method and apparatus based on electronic information keyword extraction
CN106776881A (en) Microblog-based domain information recommendation system and method
US20130031113A1 (en) Query Parser Derivation Computing Device and Method for Making a Query Parser for Parsing Unstructured Search Queries
CN104268200A (en) Unsupervised named entity semantic disambiguation method based on deep learning
US20190171713A1 (en) Semantic parsing method and apparatus
CN105844424A (en) Product quality problem discovery and risk assessment method based on network comments
CN105389389B (en) Media control analysis method for the propagation of online public opinion
Lloret et al. A novel concept-level approach for ultra-concise opinion summarization
CN103544255A (en) Text semantic relativity based network public opinion information analysis method
CN102141868B (en) Method for quickly operating information interaction page, input method system and browser plug-in
CN101661513A (en) Detection method of network focus and public sentiment
CN102789449B (en) Method and apparatus for evaluating comment text
CN104881402A (en) Method and device for analyzing semantic orientation of Chinese network topic comment text
CN105677857B (en) Method and device for accurately matching keywords with marketing landing pages
CN103049470A (en) Opinion retrieval method based on emotional relevancy
CN104133855A (en) Smart association method and device for input method
CN103853834A (en) Text structure analysis-based Web document abstract generation method
CN101872350A (en) Web page text extracting method and device thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant