CN110633351A - Method, apparatus, device and computer-readable storage medium for processing comments - Google Patents

Publication number
CN110633351A
Authority
CN
China
Prior art keywords
comment
processing
signature
information
review
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810538882.5A
Other languages
Chinese (zh)
Other versions
CN110633351B (en)
Inventor
施茜
陈思姣
刁世亮
罗雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810538882.5A
Publication of CN110633351A
Application granted
Publication of CN110633351B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Abstract

According to an exemplary implementation of the present disclosure, a method for processing comments is provided. In the method, in response to receiving a comment entered by a user for information in a first information source, a valid portion of the comment is extracted. A signature of the valid portion is obtained based on a predetermined signature rule. A frequency of occurrence of the signature is determined in a comment database that includes signatures of historical comments for a plurality of pieces of information in the first information source and in a second information source different from the first information source. The comment is then processed based on the frequency of occurrence. According to example implementations of the present disclosure, an apparatus, a device, and a computer storage medium for processing comments are also provided.

Description

Method, apparatus, device and computer-readable storage medium for processing comments
Technical Field
Implementations of the present disclosure relate generally to comment processing and, more particularly, to methods, apparatuses, devices, and computer storage media for processing comments on information in information sources.
Background
With the rapid development of information technology and the internet, online information has become increasingly popular and is now a main way for people to obtain information in daily life. People can obtain information from a variety of sources (e.g., news websites or news applications). At present, hundreds of millions of pieces of information may be displayed on the internet at any moment. When a piece of information relates to news, its read count can reach millions or more.
Users of information sources often post comments on the information, and these comments are displayed together with the information itself. However, comments may contain advertisements, profanity, or other objectionable content. When a piece of information becomes popular, its comments are also read extremely widely, so the objectionable content spreads widely as well. How to process comments and filter out those containing objectionable content has therefore become a research focus, and it is desirable to provide a solution that handles comments in a more convenient and efficient manner.
Disclosure of Invention
According to an example implementation of the present disclosure, a scheme for processing comments is provided.
In a first aspect of the present disclosure, a method for processing comments is provided. In the method, in response to receiving a comment entered by a user for information in a first information source, a valid portion of the comment is extracted. A signature of the valid portion is obtained based on a predetermined signature rule. A frequency of occurrence of the signature is determined in a comment database that includes signatures of historical comments for a plurality of pieces of information in the first information source and in a second information source different from the first information source. The comment is then processed based on the frequency of occurrence.
In a second aspect of the present disclosure, an apparatus for processing comments is provided. The apparatus includes: an extraction module configured to extract, in response to receiving a comment entered by a user for information in a first information source, a valid portion of the comment; an obtaining module configured to obtain a signature of the valid portion based on a predetermined signature rule; a determination module configured to determine a frequency of occurrence of the signature in a comment database comprising signatures of historical comments for a plurality of pieces of information in the first information source and in a second information source different from the first information source; and a processing module configured to process the comment based on the frequency of occurrence.
In a third aspect of the present disclosure, a device is provided. The device includes one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, having stored thereon a computer program which, when executed by a processor, implements the method according to the first aspect of the present disclosure.
It should be understood that what is described in this summary section is not intended to identify key or critical features of implementations of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various implementations of the present disclosure will become more apparent with reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 schematically shows a diagram of an application environment in which technical solutions according to exemplary implementations of the present disclosure may be employed;
FIG. 2 schematically shows a block diagram of a technical solution according to an exemplary implementation of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method according to an exemplary implementation of the present disclosure;
FIG. 4 schematically illustrates a flow chart of another method according to an exemplary implementation of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a method for extracting the valid portion from a comment, according to an exemplary implementation of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method for performing further processing on unfiltered comments, according to an exemplary implementation of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for processing comments, according to an exemplary implementation of the present disclosure; and
FIG. 8 illustrates a block diagram of a computing device capable of implementing various implementations of the present disclosure.
Detailed Description
Implementations of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain implementations of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the implementations set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and implementations of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing implementations of the present disclosure, the term "include" and its variants are to be construed as open-ended inclusion, i.e., "including, but not limited to". The term "based on" should be understood as "based at least in part on". The term "one implementation" or "the implementation" should be understood as "at least one implementation". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be given below.
For convenience of description, the meanings of terms referred to in the present disclosure are first introduced. In the context of this disclosure, an information source represents a source that can provide information, e.g., one news site can be used as one information source and another news site can be used as another information source. Applications running on the electronic device may also be used as sources of information. For example, assuming that a news website provider provides a web site and an application for installation on a mobile device, respectively, the web site and the application may be considered different sources of information at this time.
It will be appreciated that although news websites and news applications are specific examples of information sources in this disclosure, in other implementations, the information sources may be, for example, websites, applications provided by other companies, organizations, or individuals, or other forms of sources that may provide information, such as social platforms, forums, and the like. In the context of the present disclosure, information may include, for example, text, images, audio, video, or other multimedia information.
FIG. 1 schematically shows a diagram of an application environment 100 in which technical solutions according to exemplary implementations of the present disclosure may be employed. In FIG. 1, a piece of information 110 in an information source is schematically shown. After the information 110 is published, users of the information source may enter comments on the information 110. For example, users may enter comments 120, 122, and 124, and so on. It will be appreciated that a user may post one or more comments, depending on the implementation of the information source.
A variety of tools for processing comments have been developed. However, existing tools are typically developed to filter comments on information in a particular website or application; they are limited in functionality, and developing a separate tool for each website or application involves a great deal of repeated work. Moreover, existing tools mainly filter sensitive information in comments by means of their own keyword databases, so the filtering effect depends heavily on how well the keyword databases are kept up to date, and many kinds of sensitive information that should be filtered cannot be effectively identified.
In view of the above shortcomings of the prior art, it is desirable to provide a technical solution that can process comments in a more convenient and faster manner. Further, it is desirable that this solution can be combined with existing solutions and implemented with as little change as possible to their hardware architecture.
According to an exemplary implementation of the present disclosure, the concept of a comment database is presented. The comment database may include information related to historical comments for information in a plurality of information sources (e.g., news websites, social websites, and corresponding client applications), and objectionable comments may be identified more accurately by comparing a user's comment with the comment database and determining how frequently the comment occurs in it. Recognition efficiency can be greatly improved especially for advertisements, inflammatory remarks, paid-poster ("water army") posts, and other content from which keywords are difficult to extract.
FIG. 2 schematically shows a block diagram 200 of a technical solution according to an exemplary implementation of the present disclosure. As shown in FIG. 2, in response to receiving a comment 210 entered by a user for information in a first information source, a valid portion 220 of the comment 210 may be extracted, as indicated by arrow 212. Next, as indicated by arrow 214, a signature 230 of the valid portion 220 may be obtained based on a predetermined signature rule. It will be understood that the signature 230 of the comment 210 here refers to a hash value determined from the content of the valid portion 220 of the comment 210 using a hash function, and the hash function used to determine the signature 230 may be referred to as the signature rule. In different implementations of the present disclosure, different hash functions may be selected as the predetermined signature rule as needed for a particular application environment, as long as the signatures 230 obtained for different comments 210 are distinct.
The frequency of occurrence 250 of the signature 230 in the comment database 240 may then be determined by comparing the signature 230 with the comment database 240. Here, the comment database 240 may include signatures of historical comments for a plurality of pieces of information in a first information source and in a second information source different from the first information source. Finally, the comment 210 may be processed based on the frequency of occurrence 250. It will be appreciated that although the comment database 240 is described here as involving comments from two different information sources, in other examples the comment database 240 may also involve comments from more information sources.
More details about processing the comment 210 will be described below with reference to FIG. 3. FIG. 3 schematically shows a flow chart of a method 300 according to an exemplary implementation of the present disclosure. As shown in FIG. 3, at block 310, a determination is made as to whether a comment entered by a user for information in a first information source has been received. If the determination is YES, operation proceeds to block 320 to extract the valid portion 220 of the comment 210. In this implementation, the valid portion 220 may be extracted from the comment 210 in a variety of ways. For example, the valid portion 220 may be extracted based on one or more of the following processes: punctuation processing, emoticon processing, traditional character processing, duplicate content processing, garbled text processing, and content keyword processing.
At block 330, the signature 230 of the valid portion 220 is obtained based on a predetermined signature rule. In this implementation, the predetermined signature rule may be selected from a plurality of candidate signature rules. For example, the Message-Digest Algorithm 5 (MD5) rule, the SHA-1 rule, or another candidate rule may be selected.
At block 340, a frequency of occurrence 250 of the signature 230 in the comment database 240 is determined, the comment database 240 including signatures of historical comments for a plurality of pieces of information in a first information source and in a second information source different from the first information source. It will be appreciated that the signatures stored in the comment database 240 should be generated with the same rule that is used to determine the signature 230 from the valid portion 220, so that the signature 230 can be compared against the signatures in the comment database 240.
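The signature step at block 330 can be sketched as follows. This is a minimal illustration that assumes the valid portion is UTF-8 text and that the signature is the hexadecimal digest; the function name `compute_signature` and the `rule` parameter are hypothetical and do not come from the disclosure.

```python
import hashlib


def compute_signature(valid_portion: str, rule: str = "md5") -> str:
    """Return the hex digest of the valid portion under the chosen signature rule."""
    # Only the two candidate rules named in the text are accepted in this sketch.
    if rule not in ("md5", "sha1"):
        raise ValueError(f"unsupported signature rule: {rule}")
    digest = hashlib.new(rule)
    digest.update(valid_portion.encode("utf-8"))
    return digest.hexdigest()
```

The same rule has to be used for every signature stored in the comment database, otherwise identical comments would never match.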
At block 350, the comment 210 is processed based on the frequency of occurrence 250. It will be appreciated that the comment 210 may be handled in a variety of ways. If the comment 210 is found to be an objectionable comment, it may be deleted directly or subjected to further processing. How the comment 210 is processed will be described below in connection with specific examples of the comment database 240.
According to one exemplary implementation of the present disclosure, a comment may be deleted if the frequency of occurrence 250 is determined to be above a predefined threshold. By comparing the user's comment 210 with the comment database 240 and determining the frequency of occurrence 250 of the comment 210 in the comment database 240, objectionable comments may be identified more accurately. This is particularly useful for advertisements, inflammatory remarks, paid-poster ("water army") posts, and other content from which keywords are difficult to extract and which is therefore hard to identify by keyword matching.
According to an exemplary implementation of the present disclosure, the comment database 240 may be updated based on the signature 230. In this implementation, even when the comment 210 is deleted and not displayed in the information source, updating the comment database 240 with the signature 230 records the history of comments entered by users for later use in processing other comments.
According to one exemplary implementation of the present disclosure, the frequency of occurrence 250 may represent the number of times the signature 230 occurs in the review database 240. In the following, how to determine the frequency of occurrence 250 will be described based on different implementations of the comment database 240. According to one exemplary implementation of the present disclosure, the comment database 240 may be stored using a data structure as shown in table 1 below.
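Table 1 example of a comment database
[Table 1 is provided as an image in the original publication. Per the description that follows, its columns are: Serial number, Signature, Number of occurrences.]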
As shown in Table 1, the first column indicates the serial number of each signature, the second column indicates the specific content of the signature, and the third column indicates the number of occurrences of each signature in the entire comment database 240. Assuming that the signature 230 of the received comment 210 is "signature 1", the frequency of occurrence 250 may be determined to be 500 based on the number in the third column. In this way, individual comments from a user may be filtered across multiple information sources. If the number of occurrences of the comment 210 across the multiple information sources exceeds a predetermined value (e.g., 3000), the comment may be deleted; otherwise the comment 210 may be retained and displayed at the relevant location of the information.
According to one exemplary implementation of the present disclosure, the frequency of occurrence 250 may represent the number of times the signature 230 occurs in the comment database 240 in association with comments on a given piece of information. According to one exemplary implementation of the present disclosure, the comment database 240 may be stored using a data structure as shown in Table 2 below.
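The lookup against a Table 1 style database can be sketched with a simple in-memory counter. This is only an illustration under stated assumptions: the name `process_by_frequency`, the in-memory `Counter`, and the second sample row are hypothetical, while the threshold of 3000 and the count of 500 for "signature 1" follow the example in the text.

```python
from collections import Counter

# Example threshold taken from the text; a real deployment would tune it.
DELETE_THRESHOLD = 3000


def process_by_frequency(signature: str, database: Counter,
                         threshold: int = DELETE_THRESHOLD) -> str:
    """Return "delete" or "keep" for a signature and record this occurrence."""
    frequency = database[signature]  # a Counter yields 0 for unseen signatures
    database[signature] += 1         # record the comment even if it gets deleted
    return "delete" if frequency > threshold else "keep"


# "signature 1" -> 500 follows the text; the second row is made up for the demo.
comment_database = Counter({"signature 1": 500, "signature 2": 4000})
```

Recording the occurrence before returning mirrors the idea that the database is updated even when a comment ends up deleted.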
Table 2 example of a comment database
Serial number   Signature     Information ID   Number of occurrences
1               Signature 1   ID1              500
2               Signature 2   ID2              4000
……              ……            ……               ……
As shown in Table 2, the first column represents the serial number of each signature, the second column represents the specific content of the signature, the third column represents the ID of the information to which the comment associated with the signature is directed, and the fourth column represents the number of occurrences of each signature in the comment database 240 in association with comments on that information. Assuming that the signature 230 of the received comment 210 is "signature 1", it can be determined from the third and fourth columns that the comment 210 appears 500 times in comments on the information "ID1". In this way, individual comments from users may be filtered at a finer granularity across multiple information sources. If the number of occurrences of a comment 210 among the comments on a given piece of information exceeds a predetermined value (e.g., 300), the comment may be deleted; otherwise the comment 210 may be retained and displayed at the relevant location of the information.
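The finer-grained lookup of Table 2 can be sketched by keying the counts on the pair (signature, information ID). The dictionary layout and the names `occurrences_for_info` and `should_delete` are assumptions made for illustration; the rows and the threshold of 300 are the ones given in the text.

```python
from typing import Dict, Tuple

# Example threshold from the text for per-information filtering.
PER_INFO_THRESHOLD = 300

# Rows of Table 2, keyed by (signature, information ID).
table_2: Dict[Tuple[str, str], int] = {
    ("signature 1", "ID1"): 500,
    ("signature 2", "ID2"): 4000,
}


def occurrences_for_info(database: Dict[Tuple[str, str], int],
                         signature: str, info_id: str) -> int:
    """Number of times the signature occurred in comments on the given information."""
    return database.get((signature, info_id), 0)


def should_delete(database: Dict[Tuple[str, str], int], signature: str,
                  info_id: str, threshold: int = PER_INFO_THRESHOLD) -> bool:
    """True when the per-information count exceeds the threshold."""
    return occurrences_for_info(database, signature, info_id) > threshold
```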
According to an exemplary implementation of the present disclosure, a dictionary defining the sensitive content to be filtered may also be updated based on the comment 210. The dictionary of sensitive content may include some or all of the sensitive content. Once text corresponding to sensitive content is found in the comment 210, the comment 210 can be deleted immediately. The dictionary of sensitive content may comprise, for example, a plurality of dictionaries that record different categories of content obtained from a plurality of information sources. For example, an advertisement dictionary may include "query QQ", "wholesale retail", "welcome query", and the like. An inflammatory language dictionary may include, for example, "help forwarding", "forward flooding", "solicit everybody", and so on. It will be understood that the manner in which the dictionary is stored is not limited here. For example, the dictionary may store the text of comments directly, or may store the signatures of comments.
While multiple comments are being processed, if a comment is found to occur very frequently, it may be added to the dictionary of sensitive content. For example, assume that the existing advertisement dictionary includes only "detail query QQ", "wholesale retail", "reduced price offer", and "welcome query". If the frequency of occurrence of the comment "discount offers" is found to increase gradually over a period of time and to rise above a predetermined threshold, "discount offers" may then be added to the advertisement dictionary.
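The dictionary update can be sketched as below. The sketch assumes the dictionary is a set of phrases and that observed frequencies are available as a phrase-to-count mapping; it checks only the threshold and leaves out the "gradually increasing over a period of time" condition. The function name and the sample counts are hypothetical.

```python
from typing import Dict, Set


def update_sensitive_dictionary(dictionary: Set[str],
                                phrase_frequency: Dict[str, int],
                                threshold: int) -> Set[str]:
    """Return a new dictionary extended with phrases whose count exceeds the threshold."""
    frequent = {phrase for phrase, count in phrase_frequency.items() if count > threshold}
    return dictionary | frequent


advertisement_dictionary = {"wholesale retail", "welcome query"}
# Made-up counts for the demonstration.
observed = {"discount offers": 1200, "nice article": 15}
updated_dictionary = update_sensitive_dictionary(advertisement_dictionary, observed, 1000)
```

Returning a new set rather than mutating the input keeps the previous dictionary available for comparison or rollback.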
According to one exemplary implementation of the present disclosure, the valid portion 220 may be compared to a dictionary defining sensitive content to be filtered to determine whether the valid portion 220 includes sensitive content to be filtered. Then, if it is determined that the valid portion 220 does not include sensitive content to be filtered, the signature 230 of the valid portion 220 is obtained based on a predetermined signature rule. In this manner, pre-processing may be performed first before performing the step of determining a signature, and if the content of the valid portion 220 may already be sufficient to determine that the comment 210 belongs to a bad comment, the subsequent process of determining a signature and comparing to the comment database may not be performed. The comment 210 may be deleted directly here to improve processing efficiency.
FIG. 4 schematically illustrates a flow chart of another method 400 according to an exemplary implementation of the present disclosure. As shown in FIG. 4, the processing at blocks 320 to 350 is the same as the processing at the corresponding blocks in FIG. 3 and is therefore not described again. FIG. 4 differs from FIG. 3 in that a block 410 is included between blocks 320 and 330 to determine whether the valid portion 220 includes sensitive content to be filtered. As shown, the processing at block 330 is performed only if it is determined that no sensitive content is included. Otherwise, the comment 210 may be deleted directly.
According to one exemplary implementation of the present disclosure, if it is determined that the valid portion 220 includes sensitive content to be filtered, the comment is deleted. Referring again to FIG. 4, at block 410, if it is determined that the valid portion 220 includes sensitive content to be filtered, operational flow proceeds to block 420 to delete the comment 210.
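The flow of method 400 (dictionary check first, signature lookup only if nothing sensitive is found) can be sketched as follows. The sketch assumes MD5 as the signature rule, substring matching against the dictionary, and a plain dict as the comment database; the function name `handle_comment` is hypothetical.

```python
import hashlib
from typing import Dict, Set


def handle_comment(valid_portion: str, sensitive_dictionary: Set[str],
                   database: Dict[str, int], threshold: int) -> str:
    """Return "delete" or "keep" following blocks 410, 420, 330, 340 and 350."""
    # Block 410: check the valid portion against the sensitive-content dictionary.
    if any(term in valid_portion for term in sensitive_dictionary):
        return "delete"  # block 420: no signature is computed in this case
    # Block 330: compute the signature (MD5 is assumed here).
    signature = hashlib.md5(valid_portion.encode("utf-8")).hexdigest()
    # Block 340: look up the frequency, then record this occurrence.
    frequency = database.get(signature, 0)
    database[signature] = frequency + 1
    # Block 350: process the comment based on the frequency.
    return "delete" if frequency > threshold else "keep"
```

Skipping the hash and the database lookup for dictionary hits is what yields the efficiency gain described for the pre-processing step.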
According to one exemplary implementation of the present disclosure, extracting the valid portion 220 of the comment 210 may include various kinds of processing, such as at least one of: punctuation processing, emoticon processing, traditional character processing, duplicate content processing, garbled text processing, and content keyword processing. FIG. 5 schematically illustrates a flow diagram of a method 500 for extracting the valid portion from a comment, according to an exemplary implementation of the present disclosure. More details are described below with reference to FIG. 5.
As shown in FIG. 5, at block 510, punctuation processing may be performed. The rule here is that if the same punctuation mark occurs multiple times in succession, at most three occurrences are kept. Suppose that comment 210 is:
"I particularly like Alice!!!!!!!!!!!!!!!!!!!!"
Since the comment 210 includes a large number of exclamation marks, the excess exclamation marks can be deleted, and the processed valid portion 220 is:
"I particularly like Alice!!!"
As shown in FIG. 5, at block 520, emoticon processing may be performed. The rule here is that if the same emoticon appears multiple times, at most three occurrences are kept. Suppose that comment 210 is:
"I particularly like Alice [love][love][love][love][love][love]"
Since the comment 210 includes a large number of emoticons, the excess emoticons can be deleted, and the processed valid portion 220 is:
"I particularly like Alice [love][love][love]".
As shown in FIG. 5, at block 530, traditional character processing may be performed. The rule here is that if traditional Chinese characters appear in the comment, they are converted into simplified Chinese characters. Suppose that comment 210 is written in traditional characters:
"我特別喜歡Alice" ("I particularly like Alice").
Since the comment 210 includes traditional characters, they can be converted into simplified characters, and the processed valid portion 220 is:
"我特别喜欢Alice" ("I particularly like Alice").
As shown in FIG. 5, at block 540, duplicate content processing may be performed. The rule here is that if the comment contains a large amount of duplicate content, the duplicates can be deleted. Overly long comments are often intended to convey emphasis but include a large amount of redundant information, so removing the repeated portion does not lose the original meaning of the comment. Duplicate content covers two cases: (1) a single word repeated multiple times, e.g., "hahahahahaha!"; (2) a phrase repeated multiple times, e.g., "I like Alice! I like Alice! I like Alice! I like Alice!". In these cases, the processed valid portion 220 may be expressed as "haha" and "I like Alice!", respectively.
It will be appreciated that the valid portion may become too short at this point, since removing duplicates can greatly shorten the comment. For overly short content such as "haha", which carries little semantic meaning, the comment associated with the overly short valid portion may also be deleted directly according to one exemplary implementation of the present disclosure.
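The processing at blocks 510 to 540 can be sketched with a few regular expressions. Everything here is an assumption made for illustration: the punctuation set, the bracketed form of emoticon codes such as "[love]", the two-entry traditional-to-simplified table (a real system would need a full conversion table, for example from a library such as OpenCC), and the choice to keep two copies of a repeated character and one copy of a repeated sentence.

```python
import re

MAX_REPEATS = 3  # "at most three" per the rules for blocks 510 and 520

# Hypothetical punctuation set (ASCII and full-width marks).
_PUNCT_RUN = re.compile(r"([!?.,;:~！？。，；：～…])\1{%d,}" % MAX_REPEATS)
# Assumes emoticons appear as bracketed codes such as "[love]".
_EMOTICON_RUN = re.compile(r"(\[[^\[\]]+\])\1{%d,}" % MAX_REPEATS)
# Tiny illustrative mapping, not a complete conversion table.
_TRAD_TO_SIMP = str.maketrans({"別": "别", "歡": "欢"})
# A letter (not a digit or underscore) repeated three or more times.
_CHAR_RUN = re.compile(r"([^\W\d_])\1{2,}")
# A sentence is text up to and including its closing punctuation.
_SENTENCE = re.compile(r"[^!?.。！？]*[!?.。！？]+|[^!?.。！？]+")


def limit_punctuation(text: str) -> str:
    """Block 510: keep at most three consecutive copies of a punctuation mark."""
    return _PUNCT_RUN.sub(lambda m: m.group(1) * MAX_REPEATS, text)


def limit_emoticons(text: str) -> str:
    """Block 520: keep at most three consecutive copies of an emoticon code."""
    return _EMOTICON_RUN.sub(lambda m: m.group(1) * MAX_REPEATS, text)


def to_simplified(text: str) -> str:
    """Block 530: map traditional characters to simplified ones."""
    return text.translate(_TRAD_TO_SIMP)


def remove_duplicates(text: str) -> str:
    """Block 540: collapse repeated characters and consecutive repeated sentences."""
    text = _CHAR_RUN.sub(r"\1\1", text)
    kept = []
    for sentence in _SENTENCE.findall(text):
        sentence = sentence.strip()
        if sentence and (not kept or kept[-1] != sentence):
            kept.append(sentence)
    return " ".join(kept)
```

Note that `remove_duplicates` rejoins sentences with a single space, which suits space-delimited text; for Chinese text an empty separator may be preferable.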
As shown in FIG. 5, at block 550, garbled text processing may be performed. The rule here is that if garbled characters occur in the comment, the garbled portion is deleted. If the comment 210 consists entirely of garbled characters, the comment 210 may be deleted directly. As another example, only part of a comment may be garbled, such as a comment consisting of "I like Alice!" followed by a run of garbled characters; the latter half of the comment may then be regarded as erroneous input by the user, and only the first half is retained.
As shown in FIG. 5, at block 560, content keyword processing may be performed. The rule here is that if a keyword defined in the keyword dictionary appears in the comment, the keyword is masked. The keywords may be of multiple types; for example, profane keywords may include "his mom" and the like, and obscene keywords may include "nude" and the like. Variant characters, pinyin spellings, pinyin abbreviations, and the like of the keywords may also be included in the keyword dictionary so that the keywords to be masked are identified more accurately.
For comments containing a small number of keywords, the parts that hit the dictionary may be replaced with special symbols. Assuming the comment 210 is a profanity followed by "this kind of corruption must be severely punished!", the processed comment can be expressed as "***, this kind of corruption must be severely punished!". If a comment contains a large number of keywords from the keyword dictionary, or the proportion of special symbols after masking is too large, the whole comment may be deleted.
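Keyword masking with a "too much masked" cut-off can be sketched as follows. The mask symbol `*`, the ratio of 0.5, the function name, and the placeholder keyword "badword" are all assumptions for illustration; the disclosure does not specify them.

```python
import re
from typing import Iterable, Optional


def mask_keywords(comment: str, keywords: Iterable[str],
                  max_masked_ratio: float = 0.5) -> Optional[str]:
    """Mask dictionary hits with "*"; return None if too much of the comment is masked."""
    terms = sorted((k for k in keywords if k), key=len, reverse=True)
    if not comment or not terms:
        return comment
    # Longest keywords first so that overlapping entries prefer the longer match.
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    masked_chars = 0

    def _mask(match):
        nonlocal masked_chars
        masked_chars += len(match.group(0))
        return "*" * len(match.group(0))

    masked = pattern.sub(_mask, comment)
    if masked_chars / len(comment) > max_masked_ratio:
        return None  # the caller deletes the whole comment
    return masked
```

Counting the masked characters directly, rather than counting `*` in the result, avoids misjudging comments that already contained asterisks.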
It will be appreciated that although the various processes are depicted sequentially in FIG. 5, in a particular application environment they may be performed in a different order, in parallel, or repeatedly. For example, assume that comment 210 is as follows, where the first sentence is written in traditional characters and the comment ends with garbled characters:
"Hahahaha!!!! I like Alice! I like Alice! I like Alice! I like Alice! [love] I like Alice! (garbled characters)".
The determined valid portion 220 may be expressed as:
"Haha! I like Alice! [love]".
According to one exemplary implementation of the present disclosure, further processing may also be performed on unfiltered comments. FIG. 6 schematically shows a flow diagram of a method 600 for performing further processing on unfiltered comments, according to an exemplary implementation of the present disclosure. At block 610, a sentiment score for the comment 210 may be determined. The sentiment score indicates the degree to which the user supports the information content. For example, sentiment scores may be represented in the interval [-1, 1], where "-1" indicates that the comment expresses strong opposition to the information and "1" indicates that the comment expresses strong support for the information. At block 620, a score indicating whether the comment is a high-quality comment may be determined. For example, the quality score may be represented in the interval [0, 1], where "0" represents a low-quality comment and "1" represents a high-quality comment. At block 630, a sentence backbone may also be extracted from the comment 210 to determine the main viewpoints of comments from the various users.
According to one exemplary implementation of the present disclosure, a predefined sentiment keyword database may be obtained, the sentiment keyword database defining keywords representing supportive, objectionable and neutral sentiments, respectively. Then, one or more keywords extracted from the comments may be compared with the emotional keyword number database. Next, sentiment of the commentary expression may be determined based on the comparison. In this way, the support/opposition attitude of the respective user to the information may be determined, and further processing may be performed for the comments based on the determined sentiment scores. For example, an article may be written about the supporting information based on the supported comments.
In this implementation, the sentiment keyword database may include separate sets of supportive, objecting, and neutral keywords. For example, the supportive set may include keywords such as "support" and "approval"; the objecting set may include keywords such as "objection", "rejection", and "bad comment"; and the neutral set may include keywords such as "general", "don't care", and "about the same". According to one exemplary implementation of the present disclosure, sentiment scores may be determined based on the number of times the comment 210 hits keywords in the respective sets. For example, if the comment 210 includes keywords such as "support" and "approval", the sentiment score of the comment 210 may be set to 1. As another example, if the comment 210 includes keywords such as "don't care" and "general", the sentiment score of the comment 210 may be set to 0.
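The keyword-hit scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the keyword sets contain just the example words named in the text, and the formula (supportive hits minus objecting hits, divided by all hits) is a hypothetical choice that happens to map onto the interval [-1, 1]; the disclosure does not specify a formula.

```python
# Hypothetical keyword sets built from the example keywords in the text.
SUPPORT = {"support", "approval"}
OPPOSE = {"objection", "rejection", "bad comment"}
NEUTRAL = {"general", "don't care"}


def sentiment_score(comment: str) -> float:
    """Return a score in [-1, 1] from keyword hits (assumed formula)."""
    text = comment.lower()
    # Count substring hits for every keyword in each set.
    support_hits = sum(text.count(k) for k in SUPPORT)
    oppose_hits = sum(text.count(k) for k in OPPOSE)
    neutral_hits = sum(text.count(k) for k in NEUTRAL)
    total = support_hits + oppose_hits + neutral_hits
    if total == 0:
        return 0.0  # no keyword hit: treat the comment as neutral
    return (support_hits - oppose_hits) / total
```

Under this assumed formula, a comment that hits only supportive keywords scores 1, one that hits only neutral keywords scores 0, and one that hits only objecting keywords scores -1, which matches the two examples given in the text.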
According to one exemplary implementation of the present disclosure, a sentence backbone may be extracted from the comments, and then the perspective of the comments may be extracted based on the sentence backbone. In this implementation, a subject + predicate-form short sentence may be extracted to express the viewpoint of the comment, or an adjective + noun-form phrase may also be extracted to express the viewpoint of the comment. For example, assume that comment 210 is expressed as "Alice is sweet and enjoys a long life, and that the skill in that movie is particularly good". The extracted viewpoint may be "Alice is very sweet and performing well".
According to one exemplary implementation of the present disclosure, a high-quality comment model trained based on historical comments may be obtained, and comments may be evaluated based on the high-quality comment model. In this implementation, a target comment set can be selected from the comment data according to a preset high-quality comment model. A high-quality comment is a comment whose viewpoint is representative, or novel and unique. Specifically, a large amount of comment data can be labeled in advance to obtain high-quality comment data, and a multinomial Bayesian model can then be used to train the high-quality comment model on the original comment data and the labeled high-quality comment data, with the comment content and the like as features. After comment data corresponding to the hotspot information is obtained, each comment is scored using the trained high-quality comment model, and the comments whose scores are greater than a preset score are selected to form the target comment set. In this implementation, other models such as neural networks may also be used to train the high-quality comment model, and the present disclosure is not limited in this respect.
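As a rough illustration of training a Bayesian model on labeled comments and then selecting comments above a preset score, the sketch below implements a very small multinomial naive Bayes classifier using only the standard library. The class name, the whitespace tokenization, the add-one smoothing, the toy training data, and the 0.5 cutoff are all assumptions made for the example; the disclosure does not specify them.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Minimal multinomial naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_joint(self, text, c):
        total_docs = sum(self.class_counts.values())
        log_p = math.log(self.class_counts[c] / total_docs)
        denom = sum(self.word_counts[c].values()) + len(self.vocab)
        for token in text.lower().split():
            if token in self.vocab:  # ignore words never seen in training
                log_p += math.log((self.word_counts[c][token] + 1) / denom)
        return log_p

    def score(self, text, positive=1):
        """Return P(positive | text), a value in [0, 1]."""
        logs = {c: self._log_joint(text, c) for c in self.classes}
        peak = max(logs.values())  # subtract the max for numerical stability
        exps = {c: math.exp(v - peak) for c, v in logs.items()}
        return exps[positive] / sum(exps.values())


# Toy labeled history: 1 = high-quality, 0 = not (made-up data for illustration).
history = [
    "the plot twist is a fresh take on the genre",
    "the acting shows real depth and range",
    "first",
    "lol lol lol",
]
labels = [1, 1, 0, 0]
model = TinyMultinomialNB().fit(history, labels)

PRESET_SCORE = 0.5  # hypothetical cutoff
candidates = ["fresh take on the acting", "lol first"]
target_set = [c for c in candidates if model.score(c) > PRESET_SCORE]
```

The score lies in [0, 1], in line with the high-quality comment score interval mentioned for block 620. A production model would use richer features than raw tokens.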
The specific steps for processing text comments have been described above with reference to the figures. According to one exemplary implementation of the present disclosure, a comment may include audio content or the like. In that case, the comment may first be converted from audio to text and then processed as described above.
According to an exemplary implementation of the present disclosure, the comment may also include picture content. For example, a user may upload pictures to express their own view. At this time, the picture may be processed in an image processing manner. For example, the subject matter of the picture, black and white/color, sharpness, similarity, whether a watermark is included, etc. may be identified.
FIG. 7 schematically illustrates a block diagram of an apparatus 700 for processing comments, according to an exemplary implementation of the present disclosure. Specifically, the apparatus 700 may include: an extraction module 710 configured to extract a valid portion of a comment in response to receiving the comment entered by a user for information in a first information source; an obtaining module 720 configured to obtain a signature of the valid portion based on a predetermined signature rule; a determining module 730 configured to determine a frequency of occurrence of the signature in a review database, the review database comprising signatures of historical comments for a plurality of information in the first information source and in a second information source different from the first information source; and a processing module 740 configured to process the comment based on the frequency of occurrence.
According to an example implementation of the present disclosure, the processing module 740 includes: a deletion module configured to delete the comment in response to determining that the frequency of occurrence is above a predefined threshold; and an update module configured to update the review database based on the signature.
According to one exemplary implementation of the disclosure, the frequency of occurrence includes at least any one of: the number of times the signature appears in the review database; and the number of occurrences that the signature is associated with a review for the given information in the review database.
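The interplay of the obtaining, determining, and processing modules can be sketched as below. This is a minimal sketch under assumptions: a SHA-256 digest of whitespace-normalized, lower-cased text stands in for the "predetermined signature rule", the `ReviewDatabase` class and its method names are hypothetical, and the threshold value is arbitrary.

```python
import hashlib
from collections import Counter


class ReviewDatabase:
    """Hypothetical in-memory store of signature counts."""

    def __init__(self):
        self.total = Counter()     # signature -> occurrences across all sources
        self.per_item = Counter()  # (signature, item_id) -> occurrences for one item

    def frequency(self, signature, item_id=None):
        # Without item_id: database-wide count; with item_id: count for given information.
        if item_id is None:
            return self.total[signature]
        return self.per_item[(signature, item_id)]

    def update(self, signature, item_id):
        self.total[signature] += 1
        self.per_item[(signature, item_id)] += 1


def sign(valid_portion: str) -> str:
    # Assumed signature rule: hash of normalized text (not specified by the source).
    normalized = " ".join(valid_portion.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def process_comment(db, valid_portion, item_id, threshold=2):
    """Return "delete" if the signature already occurs more than `threshold` times."""
    signature = sign(valid_portion)
    # This sketch uses the database-wide count; the per-item count is the other listed option.
    decision = "delete" if db.frequency(signature) > threshold else "keep"
    db.update(signature, item_id)  # record the signature either way
    return decision


db = ReviewDatabase()
decisions = [process_comment(db, "I like Alice!", "news-1") for _ in range(4)]
```

Because the counts are keyed by signature rather than by source, comments arriving from a second information source would add to the same totals, which is what lets repeated content be detected across sources.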
According to an example implementation of the present disclosure, the processing module 740 further includes: a dictionary update module configured to update a dictionary defining sensitive content to be filtered based on the review.
According to an exemplary implementation of the present disclosure, the apparatus 700 further includes: a comparison module configured to compare the valid portion to a dictionary defining sensitive content to be filtered to determine whether the valid portion includes sensitive content to be filtered; and wherein the obtaining module is further configured to obtain a signature of the valid portion based on a predetermined signature rule in response to determining that the valid portion does not include sensitive content to be filtered.
According to an exemplary implementation of the present disclosure, the apparatus 700 further includes: a deletion module configured to delete the comment in response to determining that the valid portion includes sensitive content to be filtered.
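The ordering of the dictionary check and the signature step can be illustrated with a short sketch. The function names and the case-insensitive substring match are assumptions for the example; the signature function is passed in as a parameter because the source does not fix a particular rule.

```python
def contains_sensitive(valid_portion, dictionary):
    """Return True if any dictionary term occurs in the valid portion (substring match)."""
    text = valid_portion.lower()
    return any(term.lower() in text for term in dictionary)


def filter_then_sign(valid_portion, dictionary, sign):
    """Return None when the comment should be deleted, else the signature of the valid portion."""
    if contains_sensitive(valid_portion, dictionary):
        return None  # sensitive content found: delete, and skip the signature step
    return sign(valid_portion)
```

Running the dictionary check first means that no signature is computed for comments that are going to be deleted anyway.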
According to an exemplary implementation of the present disclosure, the extraction module 710 is configured to perform at least any one of: punctuation processing, emoticon processing, traditional character processing, repeated content processing, messy code processing and content keyword processing.
According to an exemplary implementation of the present disclosure, the apparatus 700 further includes: a database acquisition module configured to acquire a predefined sentiment keyword database defining keywords respectively representing supportive, objecting, and neutral sentiments; a comparison module configured to compare one or more keywords extracted from the comment with the sentiment keyword database; and a sentiment determination module configured to determine a sentiment expressed by the comment based on the comparison.
According to an exemplary implementation of the present disclosure, the apparatus 700 further includes: a backbone extraction module configured to extract a sentence backbone from the comment; and a viewpoint extraction module configured to extract a viewpoint of the comment based on the sentence backbone.
According to an exemplary implementation of the present disclosure, the apparatus 700 further includes: a model acquisition module configured to acquire a high-quality comment model trained based on historical comments; and an evaluation module configured to evaluate the comment based on the high-quality comment model.
FIG. 8 illustrates a block diagram of a computing device 800 capable of implementing various implementations of the present disclosure. Device 800 may be used to implement computing device 82 of FIG. 1. As shown, device 800 includes a Central Processing Unit (CPU) 801 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 802 or loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processing unit 801 performs the various methods and processes described above, such as the process 400. For example, in some implementations, process 400 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some implementations, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into RAM 803 and executed by CPU 801, a computer program may perform one or more of the steps of process 400 described above. Alternatively, in other implementations, CPU 801 may be configured to perform process 400 in any other suitable manner (e.g., by way of firmware).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium having a computer program stored thereon is provided. The program when executed by a processor implements the methods described in the present disclosure.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (22)

1. A method for processing reviews, comprising:
in response to receiving a comment input by a user for information in a first information source, extracting a valid portion of the comment;
acquiring a signature of the valid portion based on a predetermined signature rule;
determining a frequency of occurrence of the signature in a review database comprising signatures of historical reviews for a plurality of information in the first information source and a second information source different from the first information source; and
processing the comment based on the frequency of occurrence.
2. The method of claim 1, wherein processing the comment based on the frequency of occurrence comprises:
deleting the comment in response to determining that the frequency of occurrence is above a predefined threshold; and
updating the review database based on the signature.
3. The method of claim 2, wherein the frequency of occurrence comprises at least any one of:
a number of times the signature appears in the review database; and
a number of occurrences of the signature in the review database that are associated with a review of given information.
4. The method of claim 2, wherein processing the comment based on the frequency of occurrence further comprises:
updating a dictionary defining sensitive content to be filtered based on the comment.
5. The method of claim 1, further comprising:
comparing the valid portion to a dictionary defining sensitive content to be filtered to determine whether the valid portion includes sensitive content to be filtered; and
in response to determining that the valid portion does not include sensitive content to be filtered, obtaining the signature of the valid portion based on the predetermined signature rule.
6. The method of claim 1, further comprising:
deleting the comment in response to determining that the valid portion includes sensitive content to be filtered.
7. The method of claim 1, wherein extracting the valid portion of the comment comprises at least any one of:
punctuation processing, emoticon processing, traditional character processing, repeated content processing, messy code processing, and content keyword processing.
8. The method of claim 1, further comprising:
obtaining a predefined sentiment keyword database defining keywords representing supporting, objecting and neutral sentiments, respectively;
comparing one or more keywords extracted from the comment with the sentiment keyword database; and
determining a sentiment expressed by the comment based on the comparison.
9. The method of claim 1, further comprising:
extracting a sentence backbone from the comment; and
extracting a viewpoint of the comment based on the sentence backbone.
10. The method of claim 1, further comprising:
acquiring a high-quality comment model trained based on historical comments; and
evaluating the comment based on the high-quality comment model.
11. An apparatus for processing reviews, comprising:
an extraction module configured to extract a valid portion of a comment in response to receiving the comment input by a user for information in a first information source;
an obtaining module configured to obtain a signature of the valid portion based on a predetermined signature rule;
a determination module configured to determine a frequency of occurrence of the signature in a review database comprising signatures of historical reviews for a plurality of information in the first information source and a second information source different from the first information source; and
a processing module configured to process the comment based on the frequency of occurrence.
12. The apparatus of claim 11, wherein the processing module comprises:
a deletion module configured to delete the comment in response to determining that the frequency of occurrence is above a predefined threshold; and
an update module configured to update the review database based on the signature.
13. The apparatus of claim 12, wherein the frequency of occurrence comprises at least any one of:
a number of times the signature appears in the review database; and
a number of occurrences of the signature in the review database that are associated with a review of given information.
14. The apparatus of claim 12, wherein the processing module further comprises:
a dictionary update module configured to update a dictionary defining sensitive content to be filtered based on the review.
15. The apparatus of claim 11, further comprising:
a comparison module configured to compare the valid portion to a dictionary defining sensitive content to be filtered to determine whether the valid portion includes sensitive content to be filtered; and
wherein the obtaining module is further configured to obtain the signature of the valid portion based on the predetermined signature rule in response to determining that the valid portion does not include sensitive content to be filtered.
16. The apparatus of claim 11, further comprising:
a deletion module configured to delete the comment in response to determining that the valid portion includes sensitive content to be filtered.
17. The apparatus of claim 11, wherein the extraction module is configured to perform at least any one of: punctuation processing, emoticon processing, traditional character processing, repeated content processing, messy code processing, and content keyword processing.
18. The apparatus of claim 11, further comprising:
a database acquisition module configured to acquire a predefined emotion keyword database defining keywords representing supporting, objecting and neutral emotions, respectively;
a comparison module configured to compare one or more keywords extracted from the comment with the emotion keyword database; and
an emotion determination module configured to determine an emotion expressed by the comment based on the comparison.
19. The apparatus of claim 11, further comprising:
a stem extraction module configured to extract a sentence stem from the comment; and
an opinion extraction module configured to extract an opinion of the comment based on the sentence stem.
20. The apparatus of claim 11, further comprising:
a model acquisition module configured to acquire a high-quality comment model trained based on historical comments; and
an evaluation module configured to evaluate the comment based on the high-quality comment model.
21. An apparatus, the apparatus comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to any one of claims 1-10.
22. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN201810538882.5A 2018-05-30 2018-05-30 Method, apparatus, device and computer-readable storage medium for processing comments Active CN110633351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810538882.5A CN110633351B (en) 2018-05-30 2018-05-30 Method, apparatus, device and computer-readable storage medium for processing comments

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810538882.5A CN110633351B (en) 2018-05-30 2018-05-30 Method, apparatus, device and computer-readable storage medium for processing comments

Publications (2)

Publication Number Publication Date
CN110633351A true CN110633351A (en) 2019-12-31
CN110633351B CN110633351B (en) 2022-09-13

Family

ID=68966132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810538882.5A Active CN110633351B (en) 2018-05-30 2018-05-30 Method, apparatus, device and computer-readable storage medium for processing comments

Country Status (1)

Country Link
CN (1) CN110633351B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN103778109A (en) * 2014-02-13 2014-05-07 北京奇艺世纪科技有限公司 Method and device for identifying user comments
CN103957275A (en) * 2014-05-19 2014-07-30 北京奇虎科技有限公司 Pushing method, client terminal, server and system for user commenting information
CN106708816A (en) * 2015-07-16 2017-05-24 北京国双科技有限公司 Handling method and device of repeat content of webpage text in webpage analysis
CN107729538A (en) * 2017-10-31 2018-02-23 广东欧珀移动通信有限公司 comment information processing method, device, terminal device and storage medium

Also Published As

Publication number Publication date
CN110633351B (en) 2022-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant