CN102254038A - System and method for analyzing network comment relevance - Google Patents
- Publication number: CN102254038A
- Application number: CN 201110229617
- Authority: CN (China)
- Prior art keywords: degree, comment, correlation, network comment, network
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention relates to a method for analyzing the relevance between a subject article and the comments made on it. The method can be used qualitatively, to decide whether a comment is spam, and quantitatively, to give each comment a relevance value between 0 and 1; the larger the value, the more relevant the comment is to the article. One advantage of the invention is that it considers both the similarity between each comment and the subject article and the relevance among the comments themselves, so the relevance is analyzed more accurately. The system of the invention is browser-based, convenient to use for analysis, and has a friendly interface.
Description
Technical field
The invention belongs to the fields of text processing, data mining and the Internet. It relates to using data mining techniques to analyze the relevance between a subject article on the network and the many comments posted on that article. Specifically, it covers similarity analysis between the subject article and the comment content using a vector space model, a probability model and a language model, as well as analysis of the correlation among the comments themselves.
Background art
The Web 2.0 era is an era of rapidly growing information. Internet users can freely comment on all kinds of online news and blogs, and in recent years this comment data has reached a massive scale. Much data mining research already targets this data, such as extraction and sentiment analysis of user comments, and the integration and summarization of user comments. One current research focus is identifying whether a comment is relevant to its subject, that is, whether the comment is spam, which helps people make better use of comment resources. In existing work this identification is generally only qualitative: an irrelevant comment is spam, and anything else is non-spam. In practice there is no clear boundary between spam and non-spam comments, so such qualitative identification is often fuzzy. Moreover, even among non-spam comments, the value of individual comments often differs. So far these shortcomings have not been addressed by most researchers.
At present, research on comment information concentrates mainly on using natural language processing and data mining techniques to extract and summarize the user opinions in comment data, that is, opinion mining (positive or negative). This includes summarizing, from user comments on a product, certain features of the product and the users' views on those features; identifying the words and sentences in each comment that reflect the user's opinion; and identifying whether the sentiment of each comment is positive or negative. Research on the personality and behavior of comment authors has also begun and has produced some results, and there is similar research on the credibility of comment content.
Spam comment identification, however, is still largely unexplored. The small amount of existing work on it goes no further than describing the kinds of problems spam comments cause and analyzing and defining the types of spam comment.
Summary of the invention
To address the current lack of tools on the Internet for analyzing the relevance of comment information, the invention provides a system for analyzing the relevance between network comments and their subject.
To solve the above technical problem, the relevance analysis system of the invention is characterized in that it comprises a web crawler module, a relevance analysis module and a web page display module.
The web crawler module captures the text content of a web page and generates a data set consisting of the subject article and its related comments; this data set is passed to the relevance analysis module for analysis.
The relevance analysis module quantitatively calculates the relevance between each network comment and the subject article.
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page.
The relevance analysis module comprises:
A first device, for generating an undirected graph with all network comments as nodes;
A second device, for calculating the similarity between a given network comment and the subject article;
A third device, for calculating the similarity between that network comment and the network comments at its adjacent nodes;
A fourth device, for calculating the relevance between the network comment and the subject article from the similarity calculated by the second device and the similarities calculated by the third device.
The relevance analysis module further comprises:
A stepping device, for selecting the next network comment whose relevance has not yet been calculated, and returning a null value if no such comment remains;
A call control device, which takes the network comment selected by the stepping device as input and checks whether the input is null:
If it is not null, the call control device calls the second, third and fourth devices to calculate the relevance between the current network comment and the subject article, and then returns to the stepping device;
If it is null, processing stops.
The system further comprises a relevance judging module, which compares the relevance calculated by the relevance analysis module for a given network comment with a preset threshold. When the relevance is below the threshold, the web page display module shows the comment as irrelevant to the subject article.
The invention also provides an analysis method using the above system for analyzing network comment relevance, characterized in that it comprises the following steps:
The web crawler module captures the text content of a web page and generates a data set of the subject article and its related comments, which is sent to the relevance analysis module;
The relevance analysis module quantitatively calculates the relevance between each network comment and the subject article;
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page.
The quantitative calculation of the relevance between a network comment and the subject article by the relevance analysis module comprises the following steps:
Step 5-1: calculate the similarity between a given network comment and the subject article;
Step 5-2: generate an undirected graph with all network comments as nodes;
Step 5-3: calculate the similarity between the network comment and the network comments at its adjacent nodes;
Step 5-4: calculate the relevance between the network comment and the subject article from its similarity to the subject article and its similarity to the network comments at its adjacent nodes.
The quantitative calculation of the relevance by the relevance analysis module further comprises the following steps:
Step 6-1: select the next network comment whose relevance has not yet been calculated; if no such comment remains, return a null value;
Step 6-2: take the network comment selected in step 6-1 as input and check whether the input is null:
If it is not null, carry out steps 5-2, 5-3 and 5-4, then return to step 6-1;
If it is null, stop.
The method further comprises the following step:
The relevance between a given network comment and the subject article, as calculated by the relevance analysis module, is compared with a preset threshold; when the relevance is below the threshold, the web page display module shows the comment as irrelevant to the subject article.
By quantitatively analyzing the relevance of comment content, the invention produces a concrete relevance value between 0 and 1; the larger the value, the higher the relevance, so the value indicates how closely a comment relates to the article. A notable advantage of the invention is that the core analysis considers both the similarity between a comment and the subject article and the correlation among comments, which makes the relevance analysis more accurate. The system is browser-based, easy to use for analysis, and has a friendly interface.
Description of drawings
The technical solution of the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a structural block diagram of the system for analyzing network comment relevance according to the invention.
Fig. 2 is a graph of the comment network nodes according to the invention.
Embodiment
As shown in the structural block diagram of Fig. 1, the system for analyzing network comment relevance according to the invention comprises a web crawler module, a relevance analysis module and a web page display module.
The web crawler module captures the text content of a web page and generates a data set consisting of the subject article and its related comments; this data set is passed to the relevance analysis module for analysis.
The relevance analysis module quantitatively calculates the relevance between each network comment and the subject article. The invention analyzes the relevance between a comment and the article quantitatively: the relevance is a continuous variable that can take any value in the range from 0 to 1. Different comments usually receive different relevance values. The higher the value, the more strongly the comment relates to the subject article and the more clearly it expresses the commenter's views on the article's subject; the lower the value, the smaller the practical value of the comment. Depending on actual needs, a comment whose relevance falls below a set threshold can be regarded as spam.
The process of analyzing the relevance between comment content and the subject article is as follows: extract the subject article A and its corresponding comments (comment 1, comment 2, comment 3, ..., comment n); calculate the similarity K between A and each of comments 1 to n; calculate the relevance L between comments; then combine K and L with certain weights to obtain the relevance P_i between A and each comment (i is any integer from 1 to n). P_i is the final relevance value. A relevance threshold can be set according to actual needs, and any comment whose relevance is below the threshold can be judged useless; the higher P_i is, the more relevant the comment is to the subject.
This relevance analysis module differs from existing techniques: it treats not only the textual similarity between a comment and the subject article as a factor in the relevance, but also takes into account the internal relations among the many comment texts. Its core idea is: if a comment is highly similar to comments that have already been analyzed and found highly relevant to the subject article, then this comment should also have a fairly high relevance to the subject article, even if its own similarity to the subject article is not high.
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page. It provides a user-facing interface that presents results as the user requires, including showing the relevance of every comment and listing comments in ascending or descending order of relevance. The module organizes the output of the relevance analysis module into a data structure the user can understand and shows it in the user interface as a web page.
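The ordering function of the display module can be illustrated with a minimal sketch. The function name `order_for_display` and the `(comment, relevance)` tuple layout are assumptions made for illustration; the patent does not specify a data structure.

```python
def order_for_display(results, descending=True):
    """Sort (comment, relevance) pairs by relevance for display.

    `results` is a list of (comment_text, relevance) tuples; this layout
    is a hypothetical choice, not something the patent prescribes.
    """
    return sorted(results, key=lambda item: item[1], reverse=descending)


results = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
print(order_for_display(results))                    # most relevant first
print(order_for_display(results, descending=False))  # least relevant first
```

Returning a new sorted list keeps the analysis results untouched, so the same data can be shown in either order on request.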
The overall analysis method of the invention is as follows: the web crawler module collects, on demand, the subject and comment data set for the content of a particular web page; the data set is then submitted to the relevance analysis module for relevance analysis; finally the analysis result is passed to the web page display module and shown in a pop-up frame in the web browser according to the actual functional requirements.
The web crawler module is built mainly on general web crawler technology and includes, but is not limited to, website selection, text content selection, data capture and back-end data management. It is an independent and necessary front-end module. It captures the text content of the web page the user is browsing, extracts the subject text and the comment content in 1:N form (one subject text corresponding to many comments), and organizes them into a data set for subsequent analysis, for example: subject: XXXXX --- comment 1: XXX, comment 2: XXX, comment 3: XXX.
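The 1:N data set produced by the crawler can be sketched as a small container type. The class name `CommentDataset` and its `render` method are hypothetical; they only mirror the "subject --- comment 1, comment 2, ..." layout described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommentDataset:
    """One subject text paired with its N comments (the 1:N form)."""
    subject: str
    comments: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the 'Subject: ... --- Comment 1: ...' layout as one string."""
        parts = [f"Comment {i}: {text}"
                 for i, text in enumerate(self.comments, start=1)]
        return f"Subject: {self.subject} --- " + ", ".join(parts)


dataset = CommentDataset("XXXXX", ["XXX", "XXX", "XXX"])
print(dataset.render())
```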
The relevance analysis module is the core module of the invention. It automatically analyzes the relevance between the subject article and the comment information: the higher the relevance value, the better the comment fits the subject text; otherwise the comment is irrelevant, such as advertising or placeholder posts. The module is divided into two parts: a subject-comment analysis part and a comment-comment analysis part. The subject-comment part considers the relevance between the subject text and each comment, judged mainly by vocabulary similarity, text repetition rate and similar measures. The comment-comment part analyzes and calculates the similarity among all the comments. Combining the results of the two parts gives the final relevance analysis result.
The core analysis model of the invention considers two factors together: the similarity between a comment and the subject article, and the internal relations among the comments. The calculation is described below in that order.
(1) Calculating the similarity between a comment and the commented subject article
The invention uses a probabilistic language model to calculate the similarity between a comment and the subject article. For any comment R and the subject article A it comments on, Sim(R|A) is defined as the similarity between R and A and is obtained by formula (1) from P(R|A), the probability of R given A; here w is a word occurring in R, c(w, R) is the number of times w occurs in R, and P(w|A) is the probability of w occurring in A.
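Formula (1) itself is not reproduced in this text. One form consistent with the symbols defined above, assuming the standard unigram query-likelihood language model, would be the following; it is a reconstruction under that assumption and may differ from the formula in the original filing.

```latex
\mathrm{Sim}(R \mid A) = P(R \mid A) = \prod_{w \in R} P(w \mid A)^{\,c(w, R)}
```

Under this form each distinct word w of R contributes its probability in A once per occurrence in R, which is why a single word with zero probability drives the whole product to zero.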
P(w|A) can be calculated by maximum likelihood estimation (MLE), in which |A| is the total number of words occurring in A. This method has a defect: if the word w does not explicitly occur in A, P(w|A) is zero. When R and A share no words at all, the similarity between R and A is judged to be zero.
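The zero-probability defect can be seen in a short sketch. It assumes the usual MLE form, the count of w in A divided by |A|, and a simple lowercase whitespace tokenizer; the names `tokenize` and `p_ml` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. A real system for
    # Chinese text would need a word segmenter instead.
    return text.lower().split()


def p_ml(word, article_tokens):
    """Maximum likelihood estimate: count of `word` in A divided by |A|."""
    if not article_tokens:
        return 0.0
    return Counter(article_tokens)[word] / len(article_tokens)


article = tokenize("the match ended with a late goal")
print(p_ml("goal", article))    # one occurrence out of seven words
print(p_ml("advert", article))  # unseen word, so the estimate is zero
```

Any comment word that never appears in the article gets probability zero here, which motivates the smoothing step described next.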
To solve the zero-probability problem, the invention adopts an improved method, Jelinek-Mercer smoothing, a typical linear interpolation smoothing method, calculated as follows:
P(w|A) = λ·P_ML(w|A) + (1 - λ)·P(w|C)
where P_ML(w|A) is the maximum likelihood estimate, P(w|C) is the probability of the word w occurring in the corpus C, and λ is the smoothing coefficient. Preferably, the invention sets λ to 0.2.
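A minimal sketch of the Jelinek-Mercer interpolation follows. It assumes that both P_ML(w|A) and P(w|C) are plain relative frequencies over token lists; the function name `p_smoothed` and its parameters are hypothetical, while the default λ = 0.2 is the value preferred in the text.

```python
from collections import Counter


def p_smoothed(word, article_tokens, corpus_tokens, lam=0.2):
    """Jelinek-Mercer smoothing: lam * P_ML(w|A) + (1 - lam) * P(w|C)."""
    p_ml = (Counter(article_tokens)[word] / len(article_tokens)
            if article_tokens else 0.0)
    p_corpus = (Counter(corpus_tokens)[word] / len(corpus_tokens)
                if corpus_tokens else 0.0)
    return lam * p_ml + (1 - lam) * p_corpus


article = ["late", "goal"]
corpus = ["late", "goal", "advert", "advert"]
print(p_smoothed("goal", article, corpus))    # 0.2 * 0.5 + 0.8 * 0.25
print(p_smoothed("advert", article, corpus))  # 0.2 * 0.0 + 0.8 * 0.5
```

A word missing from the article now keeps a non-zero probability as long as it occurs somewhere in the corpus C.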
To avoid potential error caused by comment length, the invention also introduces a length normalization that normalizes the original probability by len(R), the length of R, that is, the total number of words in R.
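The normalization formula is not reproduced in this text. The sketch below assumes one common choice, the per-word geometric mean P(R|A) ** (1 / len(R)), computed in log space to avoid underflow on long comments; the original may normalize differently. The names `normalized_similarity` and `word_prob` are hypothetical.

```python
import math
from collections import Counter


def normalized_similarity(comment_tokens, word_prob):
    """Length-normalized similarity, assumed here to be P(R|A) ** (1 / len(R)).

    `word_prob` is any callable returning P(w|A) for a word, for example a
    smoothed estimate. Returns 0.0 for an empty comment or a zero probability.
    """
    if not comment_tokens:
        return 0.0
    log_p = 0.0
    for word, count in Counter(comment_tokens).items():
        p = word_prob(word)
        if p <= 0.0:
            return 0.0
        log_p += count * math.log(p)
    return math.exp(log_p / len(comment_tokens))


constant = lambda word: 0.25  # toy model: every word has probability 0.25
print(normalized_similarity(["a", "b"], constant))
print(normalized_similarity(["a", "b", "c", "d"], constant))
```

With the toy model both calls give about 0.25, showing that a longer comment is no longer penalized merely for containing more words.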
(2) Overall calculation of the relevance between a comment and the article
The invention treats the set of all comments on a given subject article as a network node graph; that is, the set of all comments is modeled as a graph structure, as shown in Fig. 2.
An undirected graph over the comments is obtained by calculating the cosine similarity between comments. Each node in the graph represents one comment, and the weight of the edge between two nodes is the cosine similarity between the comments they represent. The nodes are labeled in order with subscripts: R_1, R_2, R_3, R_4, R_5, ..., R_n.
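Building the comment graph can be sketched as a symmetric weight matrix. The sketch assumes raw term-frequency vectors and whitespace tokenization, which the text does not specify; `cosine_similarity` and `build_comment_graph` are hypothetical names.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_comment_graph(comments):
    """Return a symmetric matrix; weights[i][j] is the edge weight R_i - R_j."""
    tokens = [c.lower().split() for c in comments]
    n = len(tokens)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = cosine_similarity(tokens[i], tokens[j])
            weights[i][j] = weights[j][i] = w  # undirected: same both ways
    return weights


graph = build_comment_graph(["great goal", "great goal", "buy cheap watches"])
print(graph)
```

Two nodes whose weight is zero share no words and are treated as not adjacent in the sketches that follow.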
Based on this setup, each node can be regarded as carrying a relevance value, and this value influences the relevance values of the nodes around it. For any node R_i in the graph, R_i ∈ {R_1, ..., R_n}, formula (2) expresses its relevance value in terms of its neighbors, where Pertinence(R_i) is the relevance between comment R_i and the subject article A, adj[R_i] is the set of all comment nodes adjacent to R_i, R_j is a comment in the set adj[R_i], and w(R_j, R_i) is the similarity between comment R_j and comment R_i.
The formula above considers only the influence of the relations between comments. Combining formula (1) from step (1) with formula (2) from step (2) gives the final overall formula for the relevance between a comment and the subject article.
The left part of this formula depends on formula (1) and the right part on formula (2): the relevance value of a comment depends partly on the similarity between the comment and the subject article and partly on the associations between comments. The parameter d is the trade-off between the two and can take any value between 0 and 1 according to the actual situation; the system uses d = 0.7 by default.
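The combined calculation can be sketched as a fixed-point iteration. Two points are assumptions, since the combined formula is not reproduced in this text: that d weights the article-similarity term and (1 - d) the neighbor term, and that the neighbor term is a similarity-weighted average of neighbor scores. The function name `relevance_scores` and the iteration count are hypothetical.

```python
def relevance_scores(sim_to_article, weights, d=0.7, iterations=50):
    """Iterate score_i = d * sim_i + (1 - d) * weighted mean of neighbor scores.

    sim_to_article[i] is the similarity of comment i to the article (in [0, 1]);
    weights is a symmetric matrix of comment-comment similarities.
    Assumption: d weights the article term; the source only says d is the
    trade-off between the two parts, with default 0.7.
    """
    n = len(sim_to_article)
    scores = list(sim_to_article)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[j][i] for j in range(n) if j != i)
            if total > 0:
                neighbor = sum(weights[j][i] * scores[j]
                               for j in range(n) if j != i) / total
            else:
                neighbor = 0.0  # isolated node: no neighbor influence
            updated.append(d * sim_to_article[i] + (1 - d) * neighbor)
        scores = updated
    return scores


sims = [0.8, 0.6, 0.0]
weights = [[0.0, 0.5, 0.0],
           [0.5, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
print(relevance_scores(sims, weights))
```

Because the neighbor term is an average of values in [0, 1] and d lies between 0 and 1, every score stays in [0, 1], matching the stated range of the relevance value.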
Finally, the processing flow of the comment analysis system comprises the following steps:
The web crawler module captures the text content of the web page and generates a data set of the subject article and its related comments, which is sent to the relevance analysis module;
The similarity between a given network comment and the subject article is calculated;
Step A: an undirected graph is generated with all network comments as nodes;
The similarity between the network comment and the network comments at its adjacent nodes is calculated;
The relevance between the network comment and the subject article is calculated from its similarity to the subject article and its similarity to the network comments at its adjacent nodes;
The next network comment whose relevance has not yet been calculated is selected; if no such comment remains, a null value is returned;
The selected network comment is taken as input, and the input is checked for null;
If it is not null, the flow returns to step A; if it is null, the flow stops;
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page;
The relevance between a given network comment and the subject article, as calculated by the relevance analysis module, is compared with a preset threshold; when the relevance is below the threshold, the web page display module shows the comment as irrelevant to the subject article.
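The control flow above (select the next unscored comment, stop on a null value, then flag comments below the threshold) can be sketched as follows. The helper names, the `score_comment` callback and the threshold value 0.3 are hypothetical; the text leaves the threshold to actual needs.

```python
def next_unscored(comments, scores):
    """Stepping device: index of the next comment without a score, or None."""
    for index in range(len(comments)):
        if index not in scores:
            return index
    return None


def analyze(comments, score_comment, threshold=0.3):
    """Score every comment, then mark those below the threshold as irrelevant.

    `score_comment(index)` stands in for the relevance calculation; the
    threshold default is an illustrative value, not one given in the text.
    """
    scores = {}
    while True:
        index = next_unscored(comments, scores)
        if index is None:  # null value: nothing left to score, so stop
            break
        scores[index] = score_comment(index)
    return [(comments[i], scores[i], scores[i] < threshold)
            for i in range(len(comments))]


comments = ["nice analysis", "buy watches"]
fake_scores = {0: 0.8, 1: 0.1}
report = analyze(comments, lambda i: fake_scores[i])
print(report)
```

Each tuple in the report carries the comment, its relevance and an irrelevance flag, which is the information the display module needs.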
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications and replacements fall within the scope of the claims of the invention.
Claims (8)
1. A system for analyzing network comment relevance, characterized in that it comprises a web crawler module, a relevance analysis module and a web page display module, wherein:
The web crawler module captures the text content of a web page and generates a data set of the subject article and its related comments, the data set being passed to the relevance analysis module for analysis;
The relevance analysis module quantitatively calculates the relevance between each network comment and the subject article;
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page.
2. The system for analyzing network comment relevance according to claim 1, characterized in that the relevance analysis module comprises:
A first device, for generating an undirected graph with all network comments as nodes;
A second device, for calculating the similarity between a given network comment and the subject article;
A third device, for calculating the similarity between that network comment and the network comments at its adjacent nodes;
A fourth device, for calculating the relevance between the network comment and the subject article from the similarity calculated by the second device and the similarities calculated by the third device.
3. The system for analyzing network comment relevance according to claim 2, characterized in that the relevance analysis module further comprises:
A stepping device, for selecting the next network comment whose relevance has not yet been calculated, and returning a null value if no such comment remains;
A call control device, which takes the network comment selected by the stepping device as input and checks whether the input is null:
If it is not null, the call control device calls the second, third and fourth devices to calculate the relevance between the current network comment and the subject article, and then returns to the stepping device;
If it is null, processing stops.
4. The system for analyzing network comment relevance according to claim 1, 2 or 3, characterized in that it further comprises a relevance judging module for comparing the relevance between a given network comment and the subject article, as calculated by the relevance analysis module, with a preset threshold; when the relevance is below the preset threshold, the web page display module shows the network comment as irrelevant to the subject article.
5. An analysis method using the system for analyzing network comment relevance according to claim 1, characterized in that it comprises the following steps:
The web crawler module captures the text content of a web page and generates a data set of the subject article and its related comments, which is sent to the relevance analysis module;
The relevance analysis module quantitatively calculates the relevance between each network comment and the subject article;
The web page display module outputs the comment relevance results calculated by the relevance analysis module in the form of a web page.
6. The analysis method according to claim 5, characterized in that the quantitative calculation of the relevance between a network comment and the subject article by the relevance analysis module comprises the following steps:
Step 5-1: calculate the similarity between a given network comment and the subject article;
Step 5-2: generate an undirected graph with all network comments as nodes;
Step 5-3: calculate the similarity between the network comment and the network comments at its adjacent nodes;
Step 5-4: calculate the relevance between the network comment and the subject article from its similarity to the subject article and its similarity to the network comments at its adjacent nodes.
7. The analysis method according to claim 6, characterized in that the quantitative calculation of the relevance between a network comment and the subject article by the relevance analysis module further comprises the following steps:
Step 6-1: select the next network comment whose relevance has not yet been calculated; if no such comment remains, return a null value;
Step 6-2: take the network comment selected in step 6-1 as input and check whether the input is null:
If it is not null, carry out steps 5-2, 5-3 and 5-4, then return to step 6-1;
If it is null, stop.
8. The analysis method according to claim 5, 6 or 7, characterized in that it further comprises the following step:
The relevance between a given network comment and the subject article, as calculated by the relevance analysis module, is compared with a preset threshold; when the relevance is below the preset threshold, the web page display module shows the network comment as irrelevant to the subject article.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110229617 CN102254038B (en) | 2011-08-11 | 2011-08-11 | System and method for analyzing network comment relevance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110229617 CN102254038B (en) | 2011-08-11 | 2011-08-11 | System and method for analyzing network comment relevance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102254038A (en) | 2011-11-23 |
CN102254038B (en) | 2013-01-23 |
Family
ID=44981302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110229617 Active CN102254038B (en) | 2011-08-11 | 2011-08-11 | System and method for analyzing network comment relevance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102254038B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682120A (en) * | 2012-05-15 | 2012-09-19 | 合一网络技术(北京)有限公司 | Method,device and system for acquiring essential article commented on network |
CN102915501A (en) * | 2012-10-29 | 2013-02-06 | 江苏乐买到网络科技有限公司 | Method for optimizing online shopping evaluating information |
CN103020482A (en) * | 2013-01-05 | 2013-04-03 | 南京邮电大学 | Relation-based spam comment detection method |
CN103577542A (en) * | 2013-10-10 | 2014-02-12 | 北京智谷睿拓技术服务有限公司 | Ranking fraud detection method and ranking fraud detection system of application program |
CN103745001A (en) * | 2014-01-24 | 2014-04-23 | 福州大学 | System for detecting reviewers of negative comments on products |
CN105279146A (en) * | 2014-06-30 | 2016-01-27 | 邻客音公司 | Context-aware approach to detection of short irrelevant texts |
CN105975487A (en) * | 2016-04-26 | 2016-09-28 | 昆明理工大学 | Method for judging correlativity of user comments of APP software |
CN106055664A (en) * | 2016-06-03 | 2016-10-26 | 腾讯科技(深圳)有限公司 | Method and system for filtering UGC (User Generated Content) spam based on user comments |
CN106485507A (en) * | 2015-09-01 | 2017-03-08 | 阿里巴巴集团控股有限公司 | A kind of software promotes the detection method of cheating, apparatus and system |
US9779074B2 (en) | 2013-12-20 | 2017-10-03 | International Business Machines Corporation | Relevancy of communications about unstructured information |
CN107491491A (en) * | 2017-07-20 | 2017-12-19 | 西南财经大学 | A kind of media article for adapting to user interest change recommends method |
CN107704941A (en) * | 2016-08-08 | 2018-02-16 | 华为软件技术有限公司 | A kind of method and device for showing goods review |
CN109618236A (en) * | 2018-12-13 | 2019-04-12 | 连尚(新昌)网络科技有限公司 | Video comments treating method and apparatus |
CN109857838A (en) * | 2019-02-12 | 2019-06-07 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109885676A (en) * | 2019-02-26 | 2019-06-14 | 苏州华盖信息科技有限公司 | Generation method, big data system and the storage medium of housed device report |
CN111382563A (en) * | 2020-03-20 | 2020-07-07 | 腾讯科技(深圳)有限公司 | Text relevance determining method and device |
US11120218B2 (en) | 2019-06-13 | 2021-09-14 | International Business Machines Corporation | Matching bias and relevancy in reviews with artificial intelligence |
CN110287977B (en) * | 2018-03-19 | 2021-09-21 | 阿里巴巴(中国)有限公司 | Content clustering method and device |
CN113656580A (en) * | 2021-08-12 | 2021-11-16 | 北京锐安科技有限公司 | Method, device, equipment and medium for identifying spam comments |
CN114385902A (en) * | 2020-10-22 | 2022-04-22 | 腾讯科技(深圳)有限公司 | Content recommendation method and device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080215561A1 (en) * | 2007-03-01 | 2008-09-04 | Microsoft Corporation | Scoring relevance of a document based on image text |
CN101639856A (en) * | 2009-09-11 | 2010-02-03 | 清华大学 | Webpage correlation evaluation device for detecting internet information spreading |
JP2010067005A (en) * | 2008-09-10 | 2010-03-25 | Yahoo Japan Corp | Retrieval device, and method of controlling the same |
CN101694658A (en) * | 2009-10-20 | 2010-04-14 | 浙江大学 | Method for constructing webpage crawler based on repeated removal of news |
2011-08-11: CN application CN 201110229617, patent CN102254038B (en), status: Active
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682120A (en) * | 2012-05-15 | 2012-09-19 | 合一网络技术(北京)有限公司 | Method, device and system for acquiring essential article commented on network |
CN102915501A (en) * | 2012-10-29 | 2013-02-06 | 江苏乐买到网络科技有限公司 | Method for optimizing online shopping evaluating information |
CN103020482A (en) * | 2013-01-05 | 2013-04-03 | 南京邮电大学 | Relation-based spam comment detection method |
CN103577542A (en) * | 2013-10-10 | 2014-02-12 | 北京智谷睿拓技术服务有限公司 | Ranking fraud detection method and ranking fraud detection system of application program |
CN103577542B (en) * | 2013-10-10 | 2018-09-25 | 北京智谷睿拓技术服务有限公司 | The ranking fraud detection method and ranking fraud detection system of application program |
US9779074B2 (en) | 2013-12-20 | 2017-10-03 | International Business Machines Corporation | Relevancy of communications about unstructured information |
US9779075B2 (en) | 2013-12-20 | 2017-10-03 | International Business Machines Corporation | Relevancy of communications about unstructured information |
CN103745001A (en) * | 2014-01-24 | 2014-04-23 | 福州大学 | System for detecting reviewers of negative comments on products |
CN103745001B (en) * | 2014-01-24 | 2016-10-05 | 福州大学 | A kind of product comment spam person's detecting system |
CN105279146B (en) * | 2014-06-30 | 2018-06-05 | 微软技术许可有限责任公司 | For the context perception method of the detection of short uncorrelated text |
US10037320B2 (en) | 2014-06-30 | 2018-07-31 | Microsoft Technology Licensing, Llc | Context-aware approach to detection of short irrelevant texts |
CN105279146A (en) * | 2014-06-30 | 2016-01-27 | 邻客音公司 | Context-aware approach to detection of short irrelevant texts |
CN106485507A (en) * | 2015-09-01 | 2017-03-08 | 阿里巴巴集团控股有限公司 | A kind of software promotes the detection method of cheating, apparatus and system |
CN106485507B (en) * | 2015-09-01 | 2019-10-18 | 阿里巴巴集团控股有限公司 | A kind of software promotes the detection method of cheating, apparatus and system |
CN105975487B (en) * | 2016-04-26 | 2019-07-16 | 昆明理工大学 | A kind of APP software users comment pertinence judgment method |
CN105975487A (en) * | 2016-04-26 | 2016-09-28 | 昆明理工大学 | Method for judging correlativity of user comments of APP software |
CN106055664A (en) * | 2016-06-03 | 2016-10-26 | 腾讯科技(深圳)有限公司 | Method and system for filtering UGC (User Generated Content) spam based on user comments |
CN106055664B (en) * | 2016-06-03 | 2019-03-08 | 腾讯科技(深圳)有限公司 | A kind of UGC filtering rubbish contents method and system based on user comment |
CN107704941A (en) * | 2016-08-08 | 2018-02-16 | 华为软件技术有限公司 | A kind of method and device for showing goods review |
CN107491491A (en) * | 2017-07-20 | 2017-12-19 | 西南财经大学 | A kind of media article for adapting to user interest change recommends method |
CN110287977B (en) * | 2018-03-19 | 2021-09-21 | 阿里巴巴(中国)有限公司 | Content clustering method and device |
CN109618236A (en) * | 2018-12-13 | 2019-04-12 | 连尚(新昌)网络科技有限公司 | Video comments treating method and apparatus |
CN109857838A (en) * | 2019-02-12 | 2019-06-07 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109857838B (en) * | 2019-02-12 | 2021-01-26 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109885676A (en) * | 2019-02-26 | 2019-06-14 | 苏州华盖信息科技有限公司 | Generation method, big data system and the storage medium of housed device report |
US11120218B2 (en) | 2019-06-13 | 2021-09-14 | International Business Machines Corporation | Matching bias and relevancy in reviews with artificial intelligence |
CN111382563A (en) * | 2020-03-20 | 2020-07-07 | 腾讯科技(深圳)有限公司 | Text relevance determining method and device |
CN111382563B (en) * | 2020-03-20 | 2023-09-08 | 腾讯科技(深圳)有限公司 | Text relevance determining method and device |
CN114385902A (en) * | 2020-10-22 | 2022-04-22 | 腾讯科技(深圳)有限公司 | Content recommendation method and device and storage medium |
CN114385902B (en) * | 2020-10-22 | 2024-01-30 | 腾讯科技(深圳)有限公司 | Content recommendation method, device and storage medium |
CN113656580A (en) * | 2021-08-12 | 2021-11-16 | 北京锐安科技有限公司 | Method, device, equipment and medium for identifying spam comments |
Also Published As
Publication number | Publication date |
---|---|
CN102254038B (en) | 2013-01-23 |
Similar Documents
Publication | Title |
---|---|
CN102254038B (en) | System and method for analyzing network comment relevance |
US10867256B2 (en) | Method and system to provide related data |
CN104933027B (en) | A kind of open Chinese entity relation extraction method of utilization dependency analysis |
CN103678564B (en) | Internet product research system based on data mining |
CN103023714B (en) | The liveness of topic Network Based and cluster topology analytical system and method |
CN101127042A (en) | Sensibility classification method based on language model |
CN107832229A (en) | A kind of system testing case automatic generating method based on NLP |
CN103870973A (en) | Information push and search method and apparatus based on electronic information keyword extraction |
CN106776881A (en) | A kind of realm information commending system and method based on microblog |
US20130031113A1 (en) | Query Parser Derivation Computing Device and Method for Making a Query Parser for Parsing Unstructured Search Queries |
CN104268200A (en) | Unsupervised named entity semantic disambiguation method based on deep learning |
US20190171713A1 (en) | Semantic parsing method and apparatus |
CN105844424A (en) | Product quality problem discovery and risk assessment method based on network comments |
CN105389389B (en) | A kind of network public-opinion propagation situation medium control analysis method |
Lloret et al. | A novel concept-level approach for ultra-concise opinion summarization |
CN103544255A (en) | Text semantic relativity based network public opinion information analysis method |
CN102141868B (en) | Method for quickly operating information interaction page, input method system and browser plug-in |
CN101661513A (en) | Detection method of network focus and public sentiment |
CN102789449B (en) | The method and apparatus that comment text is evaluated |
CN104881402A (en) | Method and device for analyzing semantic orientation of Chinese network topic comment text |
CN105677857B (en) | method and device for accurately matching keywords with marketing landing pages |
CN103049470A (en) | Opinion retrieval method based on emotional relevancy |
CN104133855A (en) | Smart association method and device for input method |
CN103853834A (en) | Text structure analysis-based Web document abstract generation method |
CN101872350A (en) | Web page text extracting method and device thereof |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |