CN109948138A - A kind of comment processing method and system - Google Patents

A comment processing method and system

Info

Publication number
CN109948138A
Authority
CN
China
Prior art keywords
comment
content
point
word
individual character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711373565.4A
Other languages
Chinese (zh)
Other versions
CN109948138B (en
Inventor
杨华涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Youku Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Youku Network Technology Beijing Co Ltd
Priority to CN201711373565.4A
Publication of CN109948138A
Application granted
Publication of CN109948138B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a comment processing method and system. The method includes: processing the content of each comment on a comment subject to obtain a content score for each comment; using the interactive behavior on each comment to obtain an interaction score for each comment; using the time information of each comment to obtain a time score for each comment; and obtaining a total quality score for each comment from its content score, interaction score, and time score. The technical solution addresses the technical problem of high-quality comments being buried.

Description

A comment processing method and system
Technical field
This application relates to the field of Internet technology, and in particular to a comment processing method and system.
Background
With the rapid development of Internet technology, users carry out all kinds of interaction over the Internet. For example, a user can post a comment in the comment section below a video, and other users can interact with the comments in that section.
A comment is information that expresses certain characteristics of the comment subject and the user's personal feelings toward it. Users can learn about the comment subject from the comment content and can exchange information with other users about the same subject. At present, comment sections contain a large number of comments, and most of them display comments as a sorted list.
Traditional sorting methods mostly sort by a single attribute of the comment, for example by the chronological order of posting, by the interaction volume of the comment, or by the user level of the commenter. Because the sorting rule is one-dimensional, the first few pages of the comment section contain many low-quality filler comments while many high-quality comments are buried. As a result, users cannot efficiently obtain useful information from existing comments, and interaction between users suffers.
Summary of the invention
The purpose of the embodiments of the present application is to provide a comment processing method and system that solve the technical problem of high-quality comments being buried.
To achieve the above purpose, an embodiment of the present application provides a comment processing method, the method comprising:
processing the content of each comment on a comment subject to obtain the content score of each comment;
using the interactive behavior on each comment on the comment subject to obtain the interaction score of each comment;
using the time information of each comment on the comment subject to obtain the time score of each comment;
obtaining the total quality score of each comment from the content score, interaction score, and time score of that comment.
To achieve the above purpose, an embodiment of the present application also provides a comment processing system, the system comprising a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, implements the following functions:
processing the content of each comment on a comment subject to obtain the content score of each comment;
using the interactive behavior on each comment on the comment subject to obtain the interaction score of each comment;
using the time information of each comment on the comment subject to obtain the time score of each comment;
obtaining the total quality score of each comment from the content score, interaction score, and time score of that comment.
It can be seen that, compared with the prior art, the technical solution provided by the present application can use the total quality score to pick out comments with high-quality content from a massive number of comments. Specifically, the total quality score is a multi-dimensional score based on the content score, time score, and interaction score. This avoids the manipulation that a one-dimensional sorting rule invites, and the multiple dimensions adapt better to different scenarios, which improves the accuracy of identifying high-quality comments and lets them surface at the head of the comment list more often. Other users can then gain a deeper understanding of the comment subject from the content of high-quality comments, which raises their interest in the subject and increases the interaction volume between users.
In addition, the technical solution can effectively prevent the comment list from being flooded with advertisements.
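The four steps of the method can be sketched as follows. This is only an illustrative sketch: the text above says the total quality score is obtained from the three component scores but does not state how they are combined, so the plain sum in `total_quality_score` is an assumption, and the names `ScoredComment` and `rank_comments` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredComment:
    comment_id: str
    content_score: float
    interaction_score: float
    time_score: float


def total_quality_score(c: ScoredComment) -> float:
    # Assumption: a plain sum. The source only says the total is derived
    # from the content, interaction, and time scores.
    return c.content_score + c.interaction_score + c.time_score


def rank_comments(comments):
    # Highest total quality score first, as in the sorted comment list.
    return sorted(comments, key=total_quality_score, reverse=True)
```

Any other combination rule (for example a weighted sum) could be substituted for `total_quality_score` without changing the ranking step.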
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments described in the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a comment processing method proposed by an embodiment of the present application;
Fig. 2 is the curve of the content score in this embodiment;
Fig. 3 is the curve of the interaction score in this embodiment;
Fig. 4 is the curve of the time score in this embodiment;
Fig. 5 is a schematic diagram of a comment processing system proposed by an embodiment of the present application.
Detailed description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Under current sorting schemes, a good comment that is not posted right when the comment subject is updated has little chance of reaching a forward position in the comment list, so a large amount of high-quality comment content is buried. How to pick high-quality comments out of a massive number of comments is the technical problem that currently needs to be solved.
To solve this problem, a sorting rule is defined and comments are sorted so that high-quality comments appear on the first few pages of the comment list. Users can then easily obtain useful information from the high-quality comments in the comment section, which improves the reading experience on the comment subject's page and the readability of the comment content as a whole.
At present, the most common comment sorting scheme in the industry divides comments into two regions, hot comments and latest comments. Hot comments are simply sorted by total interaction volume, and latest comments are listed in reverse chronological order of posting. In this conventional scheme, the hot section sorts by interaction volume, so early comments have a large advantage and occupy the head of the list for a long time without changing. In the latest section, which sorts by time, earlier comments quickly sink to the bottom and have no chance to surface again, and the section is easily flooded with advertisements, so the share of spam comments becomes too high and lowers the overall quality of the head of the comment area. To address these limitations, two further sorting rules are commonly used to make up for the shortcomings of the conventional scheme. The first sorts by the time of the latest interaction on a comment. This solves the problem that sorting by posting time or interaction volume becomes static, but it still cannot prevent advertisement flooding, and spam comments can be pushed to the top artificially. The second sorts by an interaction-volume counting method. Its sorting dimension is one-dimensional: it relies only on upvotes and downvotes, and it is still vulnerable to advertisement flooding.
In view of these shortcomings of the prior art, the present application provides a comment processing method. Taking one comment subject as the scope of calculation, the total quality score of each comment is calculated according to the steps shown in Fig. 1, and the comments under that comment subject are sorted in the list from high to low by total quality score. The method can be applied in a terminal device with data processing capability, for example a desktop computer, laptop, tablet computer, or workstation. The method may include the following steps:
S11: Process the content of each comment on the comment subject to obtain the content score of each comment.
In this embodiment, each comment on the comment subject is processed by a word segmenter to obtain the word sequence of the comment content. In practice, an open-source segmenter can be used, such as the word segmenter or the IK segmenter.
In this embodiment, some comment content contains comment emoticons. In practice, comment emoticons are special identifiers maintained by operations staff to express mood. They occur with high frequency and affect the word segmentation result of the comment content, so they need to be removed. Some comments quote or forward other people's content; when calculating the content score, the quoted or forwarded part must be stripped so that only the user's own comment content remains. Therefore, before word segmentation, the technical solution preprocesses the content of each comment.
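A minimal sketch of this preprocessing step is shown below. The text does not specify how emoticon identifiers or quoted content are encoded, so the bracketed tokens in `EMOTICON_TOKENS` and the `//@` quote marker are purely hypothetical placeholders.

```python
# Hypothetical emoticon identifiers; in the described system the list is
# maintained by operations staff.
EMOTICON_TOKENS = ("[smile]", "[cry]", "[angry]")

# Hypothetical marker that separates the user's own text from quoted or
# forwarded content.
QUOTE_MARKER = "//@"


def preprocess(text: str) -> str:
    # Keep only the user's own words: everything before the first quote marker.
    own_part = text.split(QUOTE_MARKER, 1)[0]
    # Remove emoticon identifiers so they do not disturb word segmentation.
    for token in EMOTICON_TOKENS:
        own_part = own_part.replace(token, "")
    return own_part.strip()
```

The cleaned string would then be passed to the word segmenter.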
The word sequence of each comment is filtered to obtain the distinctive words of that comment, and the content score of the comment is obtained from the number of its distinctive words. Distinctive words are the words that remain after the comment content has been filtered, and they characterize the comment content.
In practice, the comment section of a comment subject contains many randomly typed, meaningless comments, for example: "buy ratio The more good-looking close and numerous U.S. of Real Madrid and care and basic only tubercle bacillus several years substantially not to be gone home to plan carefully carefully A lot of starlets and avoid department according to domestic version.Can in a hurry,.?.'". Such meaningless input is usually filtered with a Bayesian algorithm. Analysis shows, however, that it is rarely possible to extract characteristic token strings from this kind of random, meaningless content, which makes it hard to build a sample library of such phrases and ultimately makes Bayesian filtering ineffective.
Research shows that introducing a neural network to calculate the relevance between words is very likely to solve the problem that the Bayesian algorithm cannot. In practice, a neural network is trained on a large number of text samples to build a recognition model. The model can estimate the probability that two words appear in the same context, and this probability is the relevance between the two words. The highest relevance is 1, meaning the two words are identical; the lowest is 0, meaning the two words never appeared together in any training context. The probability that a complete sentence is meaningful, derived from the relevance between all its words, is therefore a value between 0 and 1. The higher the value, the more likely the sentence is meaningful; the lower the value, the more likely it is meaningless. Setting a threshold then makes it possible to filter out meaningless sentences.
In the present application, the word sequence of each comment to be filtered is input into the recognition model. If words unrelated to the comment subject appear in a sentence, their contexts differ from those of the other words, so the relevance between the words decreases. Take the video "Legend of Zhen Huan" played on Youku as an example: when words such as "Real Madrid" and "tubercle bacillus" appear in a comment on "Legend of Zhen Huan", their contexts differ from those of the words normally used in comments on that video. The relevance between these words and the other words in such comments is therefore low, and the probability that the sentence is meaningful, derived from the relevance between all the words of the sentence, is reduced. In practice, such a comment is very likely deliberate random input by the user, so the technical solution filters it out.
Based on the above, in this embodiment all comments on the comment subject are segmented to obtain the word sequence of each comment. The words in the sequence are then converted into word vectors, which are used as the input of the recognition model. The model first obtains the relevance between words and then determines from it the probability that the sentence is meaningful; comments whose probability is less than or equal to the threshold are filtered out. For example, for "I am a man and you are a woman", the word vectors of the words obtained after segmentation are input into the recognition model, which yields a probability of 0.71428573 that the sentence is meaningful. This is greater than the set threshold. The higher this probability, the better the sentence conforms to Chinese word order and the more likely it has practical meaning rather than being random, meaningless input. In actual operation, the threshold is set according to the actual situation.
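The threshold filter can be illustrated with a toy sketch. The text does not say how the recognition model turns word-to-word relevance into a sentence probability, so the aggregation used here (cosine similarity between adjacent word vectors, clipped to [0, 1] and averaged) is an assumption standing in for the trained model, as are the default threshold of 0.5 and the choice to score a one-word comment as 0.

```python
import math


def cosine(u, v):
    # Cosine similarity of two word vectors; 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_probability(vectors):
    # Assumption: mean relevance of adjacent word pairs, negatives clipped to 0.
    # A comment with fewer than two words gets 0.0 (also an assumption).
    if len(vectors) < 2:
        return 0.0
    sims = [max(0.0, cosine(a, b)) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)


def is_meaningful(vectors, threshold=0.5):
    # Comments with probability <= threshold are filtered out, as in the text.
    return sentence_probability(vectors) > threshold
```

In a real deployment the vectors would come from the trained model rather than being supplied by hand.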
In this embodiment, the word sequences of the comments remaining after filtering are matched against a stop-word lexicon; if a comment contains stop words, they are filtered out of the comment.
Stop words can be regarded as a special kind of high-frequency word. They are removed for content quality calculation and do not take part in the calculation of the comment content score. The stop-word lexicon includes digits, letters, punctuation marks, emoji, function words, and so on. Stop words can be user-defined or obtained from an open-source lexicon; most open-source segmenters currently ship with their own stop-word lexicon.
In this embodiment, the word sequence from which stop words have been removed is matched against a high-frequency lexicon. If the sequence contains words from the high-frequency lexicon, these high-frequency words are filtered out of the comment. In this embodiment, the high-frequency lexicon can be obtained by segmenting a massive sample of comments on different videos and then counting and screening the words. Specifically, the lexicon can be built by randomly drawing more than one million comments from, for example, the site-wide Youku comment database, computing word frequencies after segmenting the comment texts, and setting a word-frequency threshold to obtain the high-frequency words, which carry no practical meaning with respect to the comment subject. The high-frequency threshold can be adjusted dynamically according to the segmentation results. For example, words such as "like", "video", "sofa", "advertisement", "rubbish", "come on", and "thanks" belong to the high-frequency lexicon. High-frequency words are noise in the comment content and can be removed when calculating the content score.
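The two lexicon filters can be sketched as follows, assuming comments are already segmented into word lists. The function names and the `min_count` parameter are hypothetical; the text only states that a word-frequency threshold is applied to a large comment sample.

```python
from collections import Counter


def build_high_frequency_lexicon(tokenized_comments, min_count):
    # Count every word across the sample and keep those at or above the
    # word-frequency threshold.
    counts = Counter(w for words in tokenized_comments for w in words)
    return {w for w, n in counts.items() if n >= min_count}


def filter_words(words, stop_words, high_frequency_words):
    # Drop stop words and high-frequency words; what remains are the
    # candidate distinctive words, in their original order.
    return [w for w in words
            if w not in stop_words and w not in high_frequency_words]
```

For example, with a sample in which "like" occurs in every comment, `build_high_frequency_lexicon(sample, 3)` would place "like" in the lexicon and `filter_words` would then remove it from each comment.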
In this embodiment, repeated distinctive words within a single comment are deleted so that no distinctive word appears twice in a distinctive-word set. For example, for the comment "The plot of 'Hua Qian Gu' is tight; it is a TV series worth appreciating, it is a TV series worth appreciating", the words obtained after segmentation include "TV series" and "appreciate", each of which occurs twice. In this case the repeated words are deleted and only one instance of each is kept in the distinctive-word set. Deleting repeated distinctive words within each comment further reduces homogeneous content and helps identify high-quality comment content accurately.
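As a small illustration of this step, repeated words can be dropped while keeping the first occurrence of each in its original position. The function name is hypothetical.

```python
def deduplicate_words(words):
    # dict preserves insertion order, so the first occurrence of each word
    # is kept and later repeats are discarded.
    return list(dict.fromkeys(words))
```

Preserving order matters here because later duplicate detection between comments compares both the words and their positions.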
In this embodiment, each segmented comment corresponds to one distinctive-word set. The sets are sorted by the posting time of their comments and compared with one another. If the elements of one set are identical to those of another and occupy the same positions, a duplicate distinctive-word set has occurred in the comment list. Duplicate sets indicate that the comments are essentially identical or similar, that is, homogeneous content has appeared. Further measures are needed to ensure that high-quality comments do not repeat one another.
In this embodiment, among duplicate distinctive-word sets, the content score of the comment with the earliest posting time is determined from the number of distinctive words in its set, and the content scores of the comments corresponding to the other duplicate sets are set to 0. If the comparison shows that a set is not duplicated, the content score of its comment is determined from the number of distinctive words in the set. This ensures that high-quality comments do not repeat one another, enriches the comments at the front of the list, and increases the interaction volume between users.
In practice, long comments contain the most distinctive words, but a long comment is not necessarily a high-quality comment. To keep lengthy comments from dominating on content score and to improve the accuracy of screening high-quality comments, the content score is capped. That is, the actual content score of each comment is compared with a content-score threshold: when the actual content score is less than or equal to the threshold, the actual content score is the comment's content score; when it is greater than the threshold, the threshold is the comment's content score.
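The cap amounts to taking the smaller of the two values. A one-line sketch, with a hypothetical function name:

```python
def cap_content_score(actual_score, score_threshold):
    # At or below the threshold the actual score is kept; above it, the
    # threshold itself becomes the content score.
    return min(actual_score, score_threshold)
```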
In this technical solution, to obtain a more accurate content score, the distinctive words of a comment are matched against a premium lexicon, an inferior lexicon, and a blocked-word lexicon. If a distinctive word is a premium word, points are added when determining the actual content score of the comment; if it is an inferior word, points are deducted; if it is a blocked word, the actual content score of the comment is reset to zero.
In practice, the purpose of defining premium, inferior, and blocked words is to improve the atmosphere of user discussion. Different premium, inferior, and blocked words are defined for different comment subjects. When a comment contains premium words, treating them as bonus items in the content score can steer the hot topics of the comments and raise user participation and reply rates. In addition, the premium words include "anchor words" of the comment subject: if a comment contains an anchor word, clicking it opens a link about the anchor word. This also counts as an interactive behavior on the comment. If the high-quality comments in the comment list contain anchor words, the comment content is extended and becomes more readable.
Taking Youku video comments as an example, an offline calculation task segments the comment samples under series/videos of different genres along the series/video dimension and computes word frequencies (again with stop words and high-frequency words removed). Based on the frequencies, some distinctive words are selected as system-recommended hot words and output to the manual operations back end.
Based on the classification of the series/video and on negative or excessively homogeneous content among the system-recommended hot words, some words are defined as "inferior words" and form the inferior lexicon. Along the series/video dimension, words that fit the series/video are defined from its actors, roles, plot, and so on, for example "Hua Qian Gu" and "Zhao Liying" for "Hua Qian Gu", or "Sima Yi" and "Wu Xiubo" for "The Advisors Alliance". Together with objective, positive content among the system-recommended hot words, these words are defined as "premium words" and form the premium lexicon. The distinctive words of a comment are matched against the premium and inferior lexicons: premium words in the comment are bonus items in the content score, and inferior words are deduction items.
For video content involving politics, current affairs, or public-opinion focal points, relevant "blocked words" can be defined as zero-score items for the content score. The distinctive words of a comment are matched against the blocked-word lexicon so that undesirable comments move to the bottom and do not appear at the front of the comment list, which keeps the network environment clean.
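The three lexicon rules can be sketched as one adjustment function. The bonus and penalty sizes are not given in the text, so the `bonus` and `penalty` defaults are hypothetical, as is the function name.

```python
def adjust_content_score(base_score, words, premium, inferior, blocked,
                         bonus=2.0, penalty=1.0):
    # A single blocked word resets the content score to zero.
    if any(w in blocked for w in words):
        return 0.0
    score = float(base_score)
    for w in words:
        if w in premium:
            score += bonus    # premium words are bonus items
        elif w in inferior:
            score -= penalty  # inferior words are deduction items
    return score
```

For instance, a comment with base score 9 that contains one premium word and one inferior word ends at 9 + 2 - 1 = 10 under these assumed weights.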
In this technical solution, to obtain a more accurate content score, appropriate bonus points can be given for content that contains pictures, video, or voice. In the calculation, pictures, videos, and voice clips in a comment are treated as distinctive words; each type is assigned a different weight and adds points when the content score is calculated. To ensure that pictures, videos, and voice are appropriate and compliant, a review function needs to be added to the operations back end; for comments containing inappropriate or non-compliant pictures, video, or voice, the total quality score of the comment is reset to zero. This consumes back-end server resources and is configured according to the actual situation.
Statistics gathered by technical staff over a large amount of comment content show that comments of fewer than 20 characters account for 72%, comments of 20 to 140 characters for 26%, and comments of more than 140 characters for 2%. In this embodiment, as an additional refinement and noise-reduction policy, points are deducted for short comments of fewer than 20 characters when determining the content score. In the calculation, the number of characters in the comment is counted and subtracted from a character-count threshold to obtain the character-count difference, and the content score of the comment is recalculated as "content score / character-count difference". For example, for the comment "I feel Cao Cao is truly cruel and ruthless, with countless killings!", segmentation yields the distinctive-word set (Cao Cao, cruel, ruthless, killing, countless). At 3 points per distinctive word, the comment currently scores 15 points. In this embodiment the threshold is 20. Since the comment has 15 Chinese characters, the difference from 20 is 5, and the final content score is 15 / 5 = 3. With this algorithm the content score of the short comment drops from 15 to 3. The character-count threshold for short comments in this step can be defined according to the business scenario, and other deduction logic can be used instead, for example the emotional semantics of the comment content or how well the content relates to the plot context of the video.
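The short-comment deduction and its worked example can be reproduced with a few lines. The function name is hypothetical; the threshold of 20 and the "content score / character-count difference" rule come from the text.

```python
def short_comment_score(content_score, char_count, char_threshold=20):
    # Comments at or above the threshold are not penalized.
    if char_count >= char_threshold:
        return content_score
    # Below the threshold, divide by the difference to the threshold.
    return content_score / (char_threshold - char_count)


# Worked example from the text: 5 distinctive words at 3 points each,
# 15 characters, threshold 20.
example = short_comment_score(5 * 3, 15)
```

Note that under this rule a 19-character comment is divided by 1 and so keeps its score, while shorter comments are reduced more strongly.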
To obtain a more accurate content score, the calculation can be extended in other ways and is not limited to what is listed in this technical solution. The purpose of these extensions is to let comments without practical meaning sink to the tail of the comment list and to pick out premium content about the comment subject.
Fig. 2 shows the curve of the content score in this embodiment. Based on the content-score algorithm designed in this technical solution, the content-score curve in Fig. 2 is obtained by collecting statistics over a large amount of comment data and fitting the results. In the two-dimensional coordinate system of the curve, the abscissa is the number of distinctive words and the ordinate is the content score obtained with the content-score algorithm proposed in this technical solution.
The content-score curve shows that once the number of distinctive words reaches a certain amount, the content score no longer increases. If the content score were calculated from the number of distinctive words alone, a lengthy comment that has little to do with the actual comment subject would receive a very high content score. To keep lengthy comments from dominating on content score and to improve the accuracy of screening high-quality comments, this embodiment caps the content score calculated from the distinctive words. Depending on whether a distinctive word is a premium word, inferior word, blocked word, picture, video, voice clip, and so on, the corresponding weight coefficient is applied and points are added, deducted, or reset accordingly. Within a certain range of distinctive-word counts, the content score of a comment with premium content therefore increases linearly with the number of distinctive words; in other words, within that range the content score is proportional to the number of distinctive words. The technical solution thus safeguards the precision of the content-score calculation and accurately picks out comments with premium content, while spam or contentious comments sink to the bottom of the comment list. This ensures fair sorting and improves the quality of the comments at the front of the list.
S12: using the mutual-action behavior of each comment of the comment main body, the interaction point of each comment is obtained.
In the present embodiment, mutual-action behavior includes: browsing comment content, shares comment content, beats reward and comment on content, to commenting By the return operation of content, the top of comment content is operated and operation is stepped on to comment content.In addition, including comment in high-quality word " the anchor point word " of main body, if a comment content includes anchor point word, link when mouse clicks anchor point word, about anchor point word It is opened.Such behavior also belongs to the mutual-action behavior of comment, if high-quality comment content includes anchor point word in comment area's list, Comment content can be extended, and the readability of comment content of having extended.
Different mutual-action behaviors can obtain interaction accordingly point, to the return operation of comment content, to comment content Top operation and to comment content step on operation three kinds of mutual-action behaviors for, elaborate interaction point computational algorithm.Content is such as Under:
The reply score of a comment is calculated according to formula (1). The precondition for using this formula is that the comment has received at least one reply from another user, i.e. REPLY_COUNT > 0.
REPLY_SCORE = REPLY_FACTOR * (Math.log(REPLY_COUNT) / Math.log(2))    (1)
In formula (1), REPLY_COUNT is the number of replies to the comment, REPLY_SCORE is the reply score of the comment, and REPLY_FACTOR is the reply score weight.
The upvote score of a comment is calculated according to formula (2). The precondition for using this formula is that the comment has received at least one upvote from another user, i.e. UP_COUNT > 0.
UP_SCORE = UP_FACTOR * (Math.log(UP_COUNT) / Math.log(2))    (2)
In formula (2), UP_COUNT is the number of upvotes on the comment, UP_SCORE is the upvote score of the comment, and UP_FACTOR is the upvote score weight.
The downvote score of a comment is calculated according to formula (3). The precondition for using this formula is that the comment has received at least one downvote from another user, i.e. DOWN_COUNT > 0.
DOWN_SCORE = DOWN_FACTOR * (Math.log(DOWN_COUNT) / Math.log(2))    (3)
In formula (3), DOWN_COUNT is the number of downvotes on the comment, DOWN_SCORE is the downvote score of the comment, and DOWN_FACTOR is the downvote score weight.
The interaction score of a comment is determined from its reply score, upvote score and downvote score using formula (4).
INTERACT_SCORE = REPLY_SCORE + UP_SCORE + DOWN_SCORE    (4)
Here, INTERACT_SCORE is the interaction score of the comment.
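Formulas (1) to (4) can be sketched in a few lines. The parameter names mirror the identifiers in the formulas; the default weights (3.0, 1.0, 0.5) are hypothetical values, picked only so that the reply weight is the largest, the upvote weight is at least 1 and the downvote weight lies between 0 and 1.

```python
import math

def log2_score(count, factor):
    """factor * log2(count); a count of 0 violates the precondition and scores 0."""
    return factor * math.log2(count) if count > 0 else 0.0

def interaction_score(reply_count, up_count, down_count,
                      reply_factor=3.0, up_factor=1.0, down_factor=0.5):
    """Formula (4): sum of the reply, upvote and downvote scores."""
    return (log2_score(reply_count, reply_factor)
            + log2_score(up_count, up_factor)
            + log2_score(down_count, down_factor))
```

Because log2(1) = 0, a single reply, upvote or downvote contributes nothing under these formulas; a term only becomes positive from the second interaction of its kind.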
In practice, the interaction score calculation can be extended with score terms for other interaction behaviors, such as browsing, sharing and tipping comments. As shown in Fig. 3, the interaction score curve is obtained by applying the interaction score algorithm designed in this technical solution to the statistics of a large amount of comment data and fitting the results. In the two-dimensional coordinate system of the curve, the abscissa is the interaction amount, which is calculated by combining the counts of the various interaction behaviors with their weight coefficients, and the ordinate is the interaction score obtained with the proposed algorithm. The interaction score curve of the present embodiment is a logarithmic curve located in the first quadrant of the coordinate system. A comment that has not triggered any interaction behavior has an interaction score of 0, so the curve passes through the origin.
The interaction score of the present embodiment is calculated from the numbers of replies, upvotes and downvotes. When the weights are set, the reply score weight REPLY_FACTOR is greater than both the upvote score weight UP_FACTOR and the downvote score weight DOWN_FACTOR. UP_FACTOR is greater than or equal to 1, which indicates that upvoting is a bonus item in the interaction score. DOWN_FACTOR is greater than 0 and less than 1: downvoting is still an effective interaction behavior, so DOWN_FACTOR must be greater than 0, but it is set below 1 to distinguish downvoting from upvoting. In practice, the interaction score is calculated logarithmically so that it stays in reasonable proportion to the content score, which ensures the accuracy of the total quality score of a comment.
S13: Obtain the time score of each comment by using the time information of each comment of the comment subject.
In the present embodiment, the time information includes: the reset period, the decay period, the comment posting time and the comment posting interval. The reset period is the period after which the time score has decayed to 1; once the time score is 1, it no longer needs to be considered when calculating the total quality score. In practice, the reset period can be set to 1 year, 1 month and so on, and adjusted as needed. The decay period is the interval at which the time score decays; the shorter the interval, the more pronounced the adjusting effect of the time score. The decay period can be set to 1 minute, 5 minutes and so on and, like the reset period, adjusted as needed. The comment posting interval is the difference between the current time and the comment posting time.
Based on the above time information, the algorithm for calculating the time score is described below.
In the present embodiment, the time score of a comment is calculated using formula (5). The precondition for using formula (5) is that the comment posting interval is less than the reset period.
TIME_SCORE = TIME_FACTOR * Math.log(ZERO_TIME - INTERVAL) / Math.log(INTERVAL / ATTENUATION_TIME + 1) + 1    (5)
In formula (5), ZERO_TIME is the reset period, ATTENUATION_TIME is the decay period, INTERVAL is the comment posting interval, TIME_FACTOR is the time score weight, and TIME_SCORE is the time score of the comment.
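A minimal sketch of formula (5) follows, assuming all three durations are expressed in the same unit (minutes here). Two guards are assumptions added for the sketch and are not part of the specification: the interval is clamped to at least one unit, because at an interval of 0 the denominator log(1) is 0, and an interval at or beyond the reset period returns the neutral value 1.

```python
import math

def time_score(interval, zero_time, attenuation_time, time_factor=1.0):
    """Formula (5) with two assumed guards; all durations share one unit."""
    interval = max(interval, 1.0)     # assumption: avoid a zero denominator at interval 0
    if interval >= zero_time:         # precondition of formula (5) not met
        return 1.0                    # neutral factor once the reset period has passed
    decay = math.log(interval / attenuation_time + 1)
    return time_factor * math.log(zero_time - interval) / decay + 1

# Example: reset period of one year, decay period of 5 minutes.
YEAR_MINUTES = 365 * 24 * 60
fresh = time_score(10, YEAR_MINUTES, 5)
stale = time_score(10_000, YEAR_MINUTES, 5)
```

With these inputs the score falls as the posting interval grows and reaches exactly 1 one unit before the reset period ends, since log(1) = 0 in the numerator.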
As shown in Fig. 4, the time score curve is obtained by applying the time score algorithm designed in this technical solution to the statistics of a large amount of comment data and fitting the results. In the two-dimensional coordinate system of the curve, the abscissa is the time interval, and one unit on the curve in Fig. 4 corresponds to 10 minutes. The ordinate is the time score obtained with the proposed algorithm. The time score curve of the present embodiment is a logarithmic curve located in the first quadrant of the coordinate system. The curve in Fig. 4 shows that the larger the comment posting interval, the lower the time score. This guarantees that a newly posted comment has a high time score. As time passes and the interaction score and content score are taken into account, low-quality comments quickly sink to the bottom, while high-quality comments have more chances of being surfaced.
S14: Obtain the total quality score of each comment according to its content score, interaction score and time score.
In practice, technicians who analyzed a large number of comments found that comments with a large interaction amount share a common feature: they contain a large amount of information related to the comment subject. The interaction amount is also related to the time at which a comment was posted: comments posted earlier tend to have a higher interaction amount than comments posted later. However, if a comment has no real meaning or is unrelated to the comment subject, its interaction amount does not grow even if it was posted early. It is therefore particularly important to weigh the relationship among the content of a comment, its interactions and its posting time.
Based on the time, interaction and content dimensions, the technicians combine the content score, interaction score and time score in order to refine and denoise the massive number of comments in the comment area. The total quality score can be calculated in various forms, for example (content score + interaction score + time score) or (content score * interaction score * time score). Comparison in practice found that the model "(content score + interaction score) * time score" greatly improves the adaptability of high-quality comment screening across a variety of application scenarios. On top of the content score and interaction score, this model adds the adjusting effect of the time score, which effectively solves the problem that a comment sinks quickly after being posted and can hardly float back up.
In practice, further dimensions can be added to the content, time and interaction dimensions to obtain a more accurate total quality score. One example is a user score dimension: comments posted by users with different identities have different user scores, and the influence of user identity is taken into account when the total quality score is calculated.
In the present embodiment, all comments on a comment subject are re-sorted and displayed in the list from the highest to the lowest total quality score, so that the first few pages of the comment area contain high-quality content. Meaningless comments, low-interaction comments and duplicate comments sink to the tail of the comment list and do not appear on the first few pages. This improves the user's reading experience and the readability of the comment content as a whole, and also increases users' interaction with the comment subject.
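The "(content score + interaction score) * time score" model and the resulting ordering can be sketched as below. The `ScoredComment` class, its field names and the sample numbers are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class ScoredComment:
    comment_id: str
    content: float        # content score
    interaction: float    # interaction score
    time: float           # time score (1.0 once fully decayed)

    @property
    def total(self) -> float:
        """Total quality score: (content + interaction) * time."""
        return (self.content + self.interaction) * self.time

def rank(comments):
    """Order comments from the highest to the lowest total quality score."""
    return sorted(comments, key=lambda c: c.total, reverse=True)

ranked = rank([
    ScoredComment("old_good", content=10, interaction=12, time=1.0),
    ScoredComment("new_ok", content=5, interaction=0, time=6.0),
    ScoredComment("new_spam", content=0, interaction=0, time=8.0),
])
```

In this sample the fresh comment with modest content leads for now (total 30), the older well-received comment follows (22), and the spam comment stays at the bottom because a high time score multiplies a zero sum.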
Referring to Fig. 5, the present application also provides a comment processing system. The system comprises a memory a and a processor b; the memory a stores a computer program which, when executed by the processor b, implements the following functions:
Processing the content of each comment of the comment subject to obtain the content score of each comment;
Obtaining the interaction score of each comment by using the interaction behaviors on each comment of the comment subject;
Obtaining the time score of each comment by using the time information of each comment of the comment subject;
Obtaining the total quality score of each comment according to its content score, interaction score and time score.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following functions:
Performing word segmentation on each comment of the comment subject to obtain the word sequence of each comment;
Filtering the word sequence of each comment to obtain the characteristic words of each comment, wherein the characteristic words are the words remaining after the comment content has been filtered and are used to characterize the comment content;
Obtaining the content score of the comment according to the number of characteristic words of each comment.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Deleting the characteristic words that are repeated within each comment.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Comparing the characteristic word sets of the comments and judging whether any characteristic word sets are duplicates, wherein a characteristic word set is obtained after the repeated characteristic words in a comment have been deleted.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Comparing the content score of the comment with a content score threshold; when the content score of the comment is less than or equal to the threshold, the content score of the comment is taken as the final content score; when the content score of the comment is greater than the threshold, the threshold is taken as the final content score.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
If the comparison result is that characteristic word sets are duplicated, the content score of the earliest-posted comment among the comments sharing the duplicated characteristic word set is determined from the number of characteristic words in that set, and the content scores of the other comments sharing that set are set to 0; if the comparison result is that the characteristic word set is not duplicated, the content score of the comment is determined from the number of characteristic words in its set.
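The duplicate handling just described can be sketched as follows. For simplicity the sketch takes the content score to be the plain count of distinct characteristic words, which is an assumption; the tuple layout `(comment_id, posted_at, words)` is likewise hypothetical.

```python
def score_with_dedup(comments):
    """comments: iterable of (comment_id, posted_at, characteristic_words)."""
    comments = list(comments)
    earliest = {}                                  # word set -> (posted_at, comment_id)
    for comment_id, posted_at, words in comments:
        key = frozenset(words)                     # repeated words within a comment collapse
        if key not in earliest or posted_at < earliest[key][0]:
            earliest[key] = (posted_at, comment_id)

    scores = {}
    for comment_id, _, words in comments:
        key = frozenset(words)
        is_original = earliest[key][1] == comment_id
        # only the earliest comment with a given word set keeps its score
        scores[comment_id] = float(len(key)) if is_original else 0.0
    return scores

scores = score_with_dedup([
    ("a", 10, ["good", "plot"]),
    ("b", 5, ["plot", "good", "good"]),
    ("c", 7, ["music"]),
])
```

Comment "b" was posted first, so it keeps the score for the shared word set, while the later copy "a" is set to 0.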
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Matching the characteristic word set of the comment against a high-quality lexicon, a low-quality lexicon and a blocked-word lexicon respectively; if a characteristic word is a high-quality word, a bonus is applied when determining the content score of the comment; if a characteristic word is a low-quality word, a deduction is applied when determining the content score of the comment; if a characteristic word is a blocked word, the content score of the comment is set to zero.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
When a characteristic word is a picture, a video or a voice clip, applying a bonus when determining the content score of the comment.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Applying a corresponding bonus or deduction when determining the content score of the comment according to the word count of the comment and the actual content of the comment, wherein the actual content of the comment includes its sentiment semantics and its relevance to the contextual plot.
In the present embodiment, to obtain the content score of each comment, the computer program, when executed by the processor, implements the following function:
Preprocessing the content of each comment.
In the present embodiment, to preprocess the content of each comment, the computer program, when executed by the processor, implements the following functions:
Identifying whether the comment quotes or forwards content from other users; if the content of a comment includes quoted or forwarded content, removing the quoted or forwarded content from the comment;
Removing emoticons from the comment content.
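The two preprocessing steps can be sketched with regular expressions. The markup is an assumption made for the sketch: quoted or forwarded content is taken to be wrapped in `[quote]...[/quote]`, and emoticons are taken to be lowercase shortcodes such as `[smile]`. A real comment system would substitute its own formats.

```python
import re

# Hypothetical markup: [quote]...[/quote] for quoted text, [name] for emoticons.
QUOTE = re.compile(r"\[quote\].*?\[/quote\]", re.DOTALL)
EMOTICON = re.compile(r"\[[a-z_]+\]")

def preprocess(text):
    """Strip quoted or forwarded content, then emoticon shortcodes."""
    text = QUOTE.sub("", text)       # must run first: [quote] also looks like a shortcode
    text = EMOTICON.sub("", text)
    return " ".join(text.split())    # collapse the whitespace left behind

cleaned = preprocess("[quote]great show[/quote] I agree [smile] totally")
```

Only the commenter's own words are left for word segmentation and scoring.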
In the present embodiment, the step of filtering the word sequence of each comment comprises:
Determining the relatedness between any two words in the comment from the word vectors corresponding to the words in the word sequence of the comment; determining the probability that the comment content is meaningful from the relatedness between all words in the comment; and filtering out the comments whose probability is less than or equal to a threshold;
Matching the word sequences of the comments remaining after this filtering against a high-frequency lexicon, and filtering the high-frequency words out of the word sequences according to the matching result, wherein the high-frequency lexicon consists of words that, in word frequency statistics computed over segmented comment sample data, exceed a word frequency threshold and have no practical meaning with respect to the comment subject;
Matching the word sequences from which the high-frequency words have been filtered against a stop-word lexicon, and filtering the stop words out of the comment content according to the matching result, wherein the stop-word lexicon is obtained from an open-source lexicon or is user-defined.
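The three filtering steps can be sketched as below. The specification does not say how the "probability that the comment content is meaningful" is derived from the pairwise relatedness, so the sketch substitutes the mean pairwise cosine similarity of the word vectors as a stand-in; the `threshold` value and the toy vectors are hypothetical, and every word is assumed to have a vector.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def characteristic_words(words, vectors, high_freq, stop_words, threshold=0.2):
    """Return the characteristic words of one comment, or [] if it is filtered out."""
    pairs = list(combinations(words, 2))
    if pairs:
        # stand-in for the meaningfulness probability: mean pairwise relatedness
        coherence = sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
        if coherence <= threshold:
            return []                               # comment judged meaningless
    without_high_freq = [w for w in words if w not in high_freq]
    return [w for w in without_high_freq if w not in stop_words]

vectors = {"the": (1.0, 1.0), "plot": (1.0, 0.0), "twist": (0.9, 0.1), "lol": (0.5, 0.5)}
kept = characteristic_words(["the", "plot", "twist", "lol"], vectors,
                            high_freq={"lol"}, stop_words={"the"})
```

The words that survive all three steps are the characteristic words whose count feeds the content score.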
In the present embodiment, the computer program, when executed by the processor, further implements the following functions:
Obtaining the user score of each comment by using the identity information of the user who posted the comment on the comment subject;
Obtaining the total quality score of each comment according to its content score, interaction score, time score and user score.
In the present embodiment, to obtain the total quality score of each comment, the computer program, when executed by the processor, implements the following function:
Multiplying the sum of the content score and the interaction score of each comment by the time score of that comment to obtain the total quality score of each comment.
In the present embodiment, the memory includes but is not limited to random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), caching (Cache), hard disk (Hard Disk Drive, HDD) or storage card (Memory Card).
In the present embodiment, the processor can be implemented in any suitable manner. For example, the processor can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller and so on.
The specific functions implemented by the memory and the processor of the comment processing system provided in the embodiments of this specification can be explained by cross-reference to the foregoing embodiments in this specification and can achieve the technical effects of those embodiments, so they are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of fabricating integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. Those skilled in the art will also appreciate that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
Those skilled in the art also know that, in addition to implementing the client and the server purely as computer-readable program code, the method steps can be logically programmed so that the client and the server implement the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a client or server can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. The same or similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the client can be explained with reference to the description of the foregoing method embodiments.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will appreciate that the application has many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the application.

Claims (28)

1. A comment processing method, characterized in that the method comprises:
Processing the content of each comment of a comment subject to obtain the content score of each comment;
Obtaining the interaction score of each comment by using the interaction behaviors on each comment of the comment subject;
Obtaining the time score of each comment by using the time information of each comment of the comment subject;
Obtaining the total quality score of each comment according to its content score, interaction score and time score.
2. The method according to claim 1, characterized in that the step of obtaining the content score of each comment comprises:
Performing word segmentation on each comment of the comment subject to obtain the word sequence of each comment;
Filtering the word sequence of each comment to obtain the characteristic words of each comment, wherein the characteristic words are the words remaining after the comment content has been filtered and are used to characterize the comment content;
Obtaining the content score of the comment according to the number of characteristic words of each comment.
3. The method according to claim 2, characterized in that the step of obtaining the content score of each comment further comprises:
Deleting the characteristic words that are repeated within each comment.
4. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
Comparing the characteristic word sets of the comments and judging whether any characteristic word sets are duplicates, wherein a characteristic word set is obtained after the repeated characteristic words in a comment have been deleted.
5. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
Comparing the content score of the comment with a content score threshold; when the content score of the comment is less than or equal to the threshold, the content score of the comment is taken as the final content score; when the content score of the comment is greater than the threshold, the threshold is taken as the final content score.
6. The method according to claim 4, characterized in that the step of obtaining the content score of each comment further comprises:
If the comparison result is that characteristic word sets are duplicated, the content score of the earliest-posted comment among the comments sharing the duplicated characteristic word set is determined from the number of characteristic words in that set, and the content scores of the other comments sharing that set are set to 0; if the comparison result is that the characteristic word set is not duplicated, the content score of the comment is determined from the number of characteristic words in its set.
7. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
Matching the characteristic word set of the comment against a high-quality lexicon, a low-quality lexicon and a blocked-word lexicon respectively; if a characteristic word is a high-quality word, a bonus is applied when determining the content score of the comment; if a characteristic word is a low-quality word, a deduction is applied when determining the content score of the comment; if a characteristic word is a blocked word, the content score of the comment is set to zero.
8. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
When a characteristic word is a picture, a video or a voice clip, applying a bonus when determining the content score of the comment.
9. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
Applying a corresponding bonus or deduction when determining the content score of the comment according to the word count of the comment and the actual content of the comment, wherein the actual content of the comment includes its sentiment semantics and its relevance to the contextual plot.
10. The method according to claim 2 or 3, characterized in that the step of obtaining the content score of each comment further comprises:
Preprocessing the content of each comment.
11. The method according to claim 10, characterized in that the step of preprocessing the content of each comment comprises:
Identifying whether the comment quotes or forwards content from other users; if the content of a comment includes quoted or forwarded content, removing the quoted or forwarded content from the comment;
Removing emoticons from the comment content.
12. The method according to claim 2 or 3, characterized in that the step of filtering the word sequence of each comment comprises:
Determining the relatedness between any two words in the comment from the word vectors corresponding to the words in the word sequence of the comment; determining the probability that the comment content is meaningful from the relatedness between all words in the comment; and filtering out the comments whose probability is less than or equal to a threshold;
Matching the word sequences of the comments remaining after this filtering against a high-frequency lexicon, and filtering the high-frequency words out of the word sequences according to the matching result, wherein the high-frequency lexicon consists of words that, in word frequency statistics computed over segmented comment sample data, exceed a word frequency threshold and have no practical meaning with respect to the comment subject;
Matching the word sequences from which the high-frequency words have been filtered against a stop-word lexicon, and filtering the stop words out of the comment content according to the matching result, wherein the stop-word lexicon is obtained from an open-source lexicon or is user-defined.
13. The method according to claim 1, characterized in that the interaction behaviors include: browsing a comment, sharing a comment, tipping a comment, replying to a comment, upvoting a comment and downvoting a comment.
14. The method according to claim 1, characterized in that the time information includes: the reset period, the decay period, the comment posting time and the comment posting interval.
15. The method according to claim 1, characterized in that the sum of the content score and the interaction score of each comment is multiplied by the time score of that comment to obtain the total quality score of each comment.
16. a kind of comment processing system, which is characterized in that the system comprises memory and processor, deposited in the memory Computer program is stored up, when the computer program is executed by the processor, realizes following functions:
The content of each comment of comment main body is handled, the content point of each comment is obtained;
Using the mutual-action behavior of each comment of the comment main body, the interaction point of each comment is obtained;
Using the temporal information of each comment of the comment main body, the time point of each comment is obtained;
According to the content point of each comment, interaction point and time point, the gross mass point of each comment is obtained.
17. The system of claim 16, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
Word segmentation is performed on each comment of the comment main body to obtain the sequence of terms of each comment;
The sequence of terms of each comment is filtered to obtain the individual character words of each comment; wherein the individual character words are the words remaining after the comment content is filtered and are used to characterize the comment content;
The content point of the comment is obtained according to the number of individual character words of each comment.
18. The system of claim 17, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
The individual character words that are repeated within each comment are deleted.
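A minimal sketch of deleting repeated individual character words and deriving a content point from the remaining count. Awarding exactly one point per distinct word is an assumption, since the claims only say the content point is obtained "according to" the number of words; the function names are hypothetical.

```python
def unique_words(words):
    """Drop repeated words while keeping first-occurrence order."""
    return list(dict.fromkeys(words))


def content_point(words):
    """One point per distinct individual character word (assumed scale)."""
    return len(unique_words(words))
```

With this scheme a comment that repeats the same word many times scores no higher than one that uses it once.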
19. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
The individual character word sets of the comments are compared with one another to judge whether any individual character word set is duplicated; wherein an individual character word set is obtained by deleting the individual character words that are repeated within a comment.
20. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
The content point of the comment is compared with a content point threshold; when the content point of the comment is less than or equal to the content point threshold, the content point of the comment is taken as the final content point; when the content point of the comment is greater than the content point threshold, the content point threshold is taken as the final content point.
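The threshold comparison above amounts to capping the content point at the threshold, which a single `min` expresses. The function name is an assumption.

```python
def final_content_point(content_point, threshold):
    """Cap the content point at the threshold."""
    return min(content_point, threshold)
```

Capping keeps a very long comment from dominating purely through its word count.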
21. The system of claim 19, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
If the comparison result is that individual character word sets are duplicated, then among the comments with duplicated individual character word sets, the content point of the comment with the earliest publication time is determined according to the number of individual character words in its individual character word set, and the content points of the other comments with duplicated individual character word sets are set to 0; if the comparison result is that the individual character word set is not duplicated, the content point of the comment is determined according to the number of individual character words in its individual character word set.
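The duplicate handling above can be sketched by visiting comments in publication order and remembering which word sets have already been seen. The tuple layout, the function name, and one point per word are assumptions made for illustration.

```python
def score_with_duplicates(comments):
    """comments: list of (comment_id, publish_time, words).

    Returns {comment_id: content_point}. The earliest comment with a given
    word set is scored by its word count; later duplicates get 0.
    """
    scores = {}
    seen = set()
    for comment_id, _, words in sorted(comments, key=lambda c: c[1]):
        key = frozenset(words)  # order and repetition do not matter
        if key in seen:
            scores[comment_id] = 0
        else:
            seen.add(key)
            scores[comment_id] = len(key)
    return scores
```

Using `frozenset` as the lookup key treats two comments as duplicates whenever they contain the same words, regardless of word order.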
22. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
The individual character word set of the comment is matched against a high-quality dictionary, an inferior dictionary, and a shielding dictionary respectively; if an individual character word is a high-quality word, points are added when determining the content point of the comment; if an individual character word is an inferior word, points are deducted when determining the content point of the comment; if an individual character word is a shielding word, the content point of the comment is reset to zero.
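A sketch of the three-dictionary adjustment described above. The bonus and penalty sizes are hypothetical (the claim gives no values), and the function and parameter names are assumptions.

```python
def adjust_content_point(words, base, high_quality, inferior, shielding,
                         bonus=1, penalty=1):
    """Adjust a base content point using three word dictionaries (sets)."""
    # Any shielding word clears the content point outright.
    if any(w in shielding for w in words):
        return 0
    score = base
    for w in words:
        if w in high_quality:
            score += bonus      # add points for a high-quality word
        elif w in inferior:
            score -= penalty    # deduct points for an inferior word
    return score
```

The shielding check runs first so that no amount of bonus from high-quality words can offset a shielding word.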
23. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
When an individual character word is a picture, video, or voice item, points are added when determining the content point of the comment.
24. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
Points are added or deducted accordingly when determining the content point of the comment, based on the word count of the comment content and the actual content of the comment; wherein the actual content of the comment includes: emotional semantics and relevance to the contextual plot.
25. The system of claim 17 or 18, characterized in that, to obtain the content point of each comment, the computer program, when executed by the processor, implements the following functions:
The content of each comment is pre-processed.
26. The system of claim 25, characterized in that, to pre-process the content of each comment, the computer program, when executed by the processor, implements the following functions:
Whether the comment quotes or forwards other people's content is identified; if the content of a comment includes quoted or forwarded content from other people, that quoted or forwarded content is removed from the comment content;
Emoticons are removed from the comment content.
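The two pre-processing steps can be sketched with regular expressions. The claim does not say how quoted or forwarded content or emoticons are encoded, so both patterns here are assumptions: forwarded text is taken to follow a `//@user:` marker, and emoticons are taken to be short bracketed codes such as `[smile]` or Unicode emoji.

```python
import re

# Assumed forward marker: everything from "//@" to the end of the comment.
FORWARD = re.compile(r"//@.*$", re.S)
# Assumed emoticon forms: short bracketed codes and common emoji code points.
EMOTICON = re.compile(r"\[[^\[\]]{1,12}\]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess(text):
    """Strip forwarded content, then emoticons, then surrounding whitespace."""
    text = FORWARD.sub("", text)
    text = EMOTICON.sub("", text)
    return text.strip()
```

Removing forwarded content first means emoticons inside the forwarded part never need to be matched.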
27. The system of claim 17 or 18, characterized in that the step of filtering the sequence of terms of each comment includes:
The degree of correlation between any two words in the comment is determined from the term vector of each word in the sequence of terms of each comment; the probability that the comment content is meaningful is determined from the degrees of correlation between all words in the comment; and comments whose probability is less than or equal to a threshold are filtered out;
After this filtering, the sequences of terms of the remaining comments are matched against a high-frequency dictionary, and high-frequency words are filtered out of the sequences of terms according to the matching result; wherein the high-frequency dictionary consists of words that, in word-frequency statistics computed after word segmentation of comment sample data, exceed a word-frequency threshold and have no practical meaning with respect to the comment main body;
The sequences of terms from which high-frequency words have been filtered are matched against a stop-word dictionary, and stop words are filtered out of the comment content according to the matching result; wherein the stop-word dictionary is obtained from an open-source dictionary or is custom-defined.
28. The system of claim 16, characterized in that, to obtain the gross mass point of each comment, the computer program, when executed by the processor, implements the following functions:
The sum of the content point and the interaction point of each comment is multiplied by the time point of that comment to obtain the gross mass point of each comment.
CN201711373565.4A 2017-12-19 2017-12-19 Comment processing method and comment processing system Active CN109948138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711373565.4A CN109948138B (en) 2017-12-19 2017-12-19 Comment processing method and comment processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711373565.4A CN109948138B (en) 2017-12-19 2017-12-19 Comment processing method and comment processing system

Publications (2)

Publication Number Publication Date
CN109948138A true CN109948138A (en) 2019-06-28
CN109948138B CN109948138B (en) 2023-06-20

Family

ID=67005093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711373565.4A Active CN109948138B (en) 2017-12-19 2017-12-19 Comment processing method and comment processing system

Country Status (1)

Country Link
CN (1) CN109948138B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414543A (en) * 2020-03-25 2020-07-14 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating comment information sequence
CN111522940A (en) * 2020-04-08 2020-08-11 百度在线网络技术(北京)有限公司 Method and device for processing comment information
CN112492381A (en) * 2019-09-11 2021-03-12 北京字节跳动网络技术有限公司 Information display method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226576A (en) * 2013-04-01 2013-07-31 杭州电子科技大学 Comment spam filtering method based on semantic similarity
CN103389971A (en) * 2013-07-04 2013-11-13 北京卓易讯畅科技有限公司 Method and equipment for determining high-quality grade of comment content corresponding to application
CN104281606A (en) * 2013-07-08 2015-01-14 腾讯科技(北京)有限公司 Method and device for displaying microblog comments
US20170154077A1 (en) * 2015-12-01 2017-06-01 Le Holdings (Beijing) Co., Ltd. Method for comment tag extraction and electronic device
CN107229608A (en) * 2016-03-23 2017-10-03 阿里巴巴集团控股有限公司 Comment spam recognition methods and device
CN107391729A (en) * 2017-08-02 2017-11-24 掌阅科技股份有限公司 Sort method, electronic equipment and the computer-readable storage medium of user comment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492381A (en) * 2019-09-11 2021-03-12 北京字节跳动网络技术有限公司 Information display method and device and electronic equipment
CN112492381B (en) * 2019-09-11 2023-05-30 北京字节跳动网络技术有限公司 Information display method and device and electronic equipment
CN111414543A (en) * 2020-03-25 2020-07-14 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating comment information sequence
CN111414543B (en) * 2020-03-25 2023-03-21 抖音视界有限公司 Method, device, electronic equipment and medium for generating comment information sequence
CN111522940A (en) * 2020-04-08 2020-08-11 百度在线网络技术(北京)有限公司 Method and device for processing comment information

Also Published As

Publication number Publication date
CN109948138B (en) 2023-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200514

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Applicant before: Youku network technology (Beijing) Co.,Ltd.

GR01 Patent grant