WO2018129978A1 - Information processing method, apparatus, storage medium, and computer device - Google Patents

Information processing method, apparatus, storage medium, and computer device

Info

Publication number
WO2018129978A1
Authority
WO
WIPO (PCT)
Prior art keywords
comment
user
queue
threshold
information
Prior art date
Application number
PCT/CN2017/107191
Other languages
English (en)
French (fr)
Inventor
林海
Original Assignee
广东欧珀移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东欧珀移动通信有限公司
Publication of WO2018129978A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web

Definitions

  • The present application relates to the field of communications technologies, in particular to the field of Internet technologies, and more specifically to an information processing method, apparatus, storage medium, and computer device.
  • The embodiments of the present application provide an information processing method, apparatus, storage medium, and computer device that can improve information processing efficiency.
  • An embodiment of the present application provides an information processing method, where the method includes:
  • obtaining a user comment;
  • traversing a comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out (FIFO) queue whose length is limited by a second threshold;
  • if so, determining the user comment to be a spam comment;
  • if not, adding the user comment to the comment queue and processing the tail comment of the FIFO queue according to the second threshold.
  • An embodiment of the present application further provides an information processing apparatus, where the apparatus includes:
  • an obtaining module configured to obtain a user comment;
  • a first judging module configured to traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold;
  • a determining module configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold;
  • a processing module configured to add the user comment to the comment queue when the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and to process the tail comment of the first-in-first-out queue according to the second threshold.
  • An embodiment of the present application provides a storage medium. The storage medium stores a plurality of instructions adapted to be loaded by a processor to perform the information processing method provided by any one of the embodiments of the present application.
  • An embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor calls the computer program stored in the memory to execute the information processing method described in any of the embodiments of the present application.
  • FIG. 1 is a schematic flowchart diagram of an information processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a first use state of an information processing method according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a second usage state of an information processing method according to an embodiment of the present disclosure.
  • FIG. 4 is another schematic flowchart of an information processing method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application.
  • FIG. 8 is another schematic structural diagram of an information processing apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application.
  • Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
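  • An embodiment of the present application provides an information processing method, including:
  • obtaining a user comment;
  • traversing the comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold;
  • if so, determining the user comment to be a spam comment;
  • if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.
  • Adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold includes: adding the user comment to the comment queue as the head comment, and deleting the tail comment that overflows the second threshold.
  • After the user comment is obtained, the method further includes: determining whether the comment information in the user comment exists in a blacklist library; if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
  • Determining whether the comment information in the user comment exists in the blacklist library includes: determining whether the user comment contains information that matches the feature information in the blacklist library, and if so, determining that the comment information in the user comment exists in the blacklist library.
  • When the user comment is determined to be a spam comment, the method further includes: detecting whether the user comment contains contact information, and if so, adding the contact information to the blacklist library as feature information.
  • A comment similar to the user comment includes a historical comment whose similarity to the user comment reaches a third threshold.
  • The feature information includes any one or more of a user name, a user ID, contact information, a keyword, and a homophone of the keyword.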
  • The information processing method provided by an embodiment of the present application may be executed by the information processing apparatus provided by an embodiment of the present application, or by a computer device (such as a desktop computer, notebook, palmtop computer, tablet, or smartphone) into which the information processing apparatus is integrated. The information processing apparatus may be implemented in hardware or software.
  • FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application. The method includes:
  • Step S101: obtain a user comment.
  • Step S102: traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold; if not, perform step S103; if so, perform step S104.
  • This may be decided by determining whether the number of historical comments in the comment queue whose similarity to the user comment reaches a third threshold reaches the first threshold.
  • When that number does not reach the first threshold, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and step S103 is performed.
  • When that number reaches the first threshold, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold, and step S104 is performed.
  • Step S103: add the user comment to the comment queue, and process the tail comment of the comment queue according to the second threshold.
  • For example, the user comment can be added to the comment queue as the head comment, and the tail comment that overflows the second threshold is deleted.
  • Step S104: determine the user comment to be a spam comment.
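  • Steps S101 to S104 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `process_comment`, the `is_match` parameter, and the use of `collections.deque` with `maxlen` as the bounded FIFO queue are assumptions made for the example, and exact string equality stands in for "same or similar".

```python
from collections import deque


def process_comment(queue, comment, first_threshold, is_match=lambda a, b: a == b):
    """Return True if `comment` is judged spam; otherwise enqueue it and return False.

    `queue` is assumed to be a deque created with maxlen=<second threshold>.
    `is_match` is a hypothetical same-or-similar test (exact equality by default).
    """
    # Step S102: traverse the queue and count same-or-similar historical comments.
    matches = sum(1 for old in queue if is_match(old, comment))
    if matches >= first_threshold:
        # Step S104: spam; the queue is left unchanged.
        return True
    # Step S103: the new comment becomes the head comment; a full deque
    # silently discards the element at the opposite (tail) end.
    queue.appendleft(comment)
    return False
```

  • With `deque(maxlen=1000)` the second threshold from the later example is enforced by the container itself, so no explicit deletion of the overflowing tail comment is needed in this sketch.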
  • FIG. 2 is a schematic diagram of a first usage state of an information processing method according to an embodiment of the present application, and FIG. 3 is a schematic diagram of a second usage state of the information processing method.
  • For example, the server of a forum traverses the comment queue to determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold. If so, the user comment is determined to be a spam comment; if not, the user comment is determined to be a non-spam comment and is added to the comment queue, and the tail comment of the FIFO queue is processed according to the second threshold.
  • The comment queue is a first-in-first-out queue with a length of 1000.
  • The comment queue is then updated: the user comment "Getting a smog sensor again" is added to the comment queue as the head comment displayed in the comment area, and the tail comment "to find the formaldehyde sensor.", which has the earliest posting time and overflows the limit of 1000, is deleted.
  • In the embodiment of the present application, a user comment is obtained and the comment queue is traversed to determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold. If so, the user comment is determined to be a spam comment; if not, the user comment is added to the comment queue, and the tail comment of the FIFO queue is processed according to the second threshold.
  • The embodiment of the present application can thus effectively identify spam comments. When a user comment is identified as a non-spam comment, only the comment queue needs to be updated, which avoids processing all the content in the database, reduces the operating load of the system, and effectively improves information processing efficiency.
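  • After the user comment is obtained, the method further includes: determining whether the comment information in the user comment exists in a blacklist library; if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
  • Determining whether the comment information in the user comment exists in the blacklist library includes: determining whether the user comment contains information that matches the feature information in the blacklist library, and if so, determining that the comment information in the user comment exists in the blacklist library.
  • When the user comment is determined to be a spam comment, it is detected whether the user comment contains contact information, and if so, the contact information is added to the blacklist library as feature information.
  • A comment similar to the user comment includes a historical comment whose similarity to the user comment reaches a third threshold.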
  • FIG. 4 is another schematic flowchart of an information processing method according to an embodiment of the present application.
  • the method includes:
  • Step S201: obtain a user comment.
  • Step S202: determine whether the comment information in the user comment exists in the blacklist library. If not, perform step S203; if so, perform step S205.
  • The comment information in the user comment may include a user name, a user ID, the comment content, the comment posting time, and the like.
  • Specifically, it is determined whether the user comment contains information that matches the feature information in the blacklist library. If so, perform step S205; if not, perform step S203.
  • The public platform can include e-commerce platforms, forums, communities, websites, Weibo, post bars, blogs, and application download platforms.
  • With user identity information for a website, a person becomes a user of that website and can perform user behaviors there, such as posting articles, publishing products, posting microblogs, posting threads, and replying to comments, and can also comment on or like information published by others.
  • Some users may post a large number of spam comments with the same or similar content, such as advertising comments, sales comments, and comments with adverse effects such as reactionary content, violence, pornography, hyperlinks, and scams.
  • The blacklist library may be preset and includes multiple pieces of feature information.
  • The feature information includes any one or more of a user name, a user ID, contact information, a keyword, and a homophone of the keyword.
  • The contact information may be a combination of letters and digits more than 7 bytes long, for example a telephone number, a mobile phone number, a WeChat ID, or a QQ number.
  • The keywords may include hyperlinks, advertisement words, prohibited words, special symbols, and the like.
  • For example, user comments submitted by users may include hyperlinks and advertisement words for product promotion, store or website recommendation, company promotion, business promotion, and so on.
  • A hyperlink generally appears in the form of a web address, so a run of consecutive English characters such as http://... may be set as a keyword, and the user comment may be scanned for that keyword to detect whether it contains a hyperlink. If a hyperlink is found, the user comment is considered a possible spam comment, and it is then further determined whether it contains advertisement words.
  • For advertisement words, words such as QQ, special price, hot sale, and Taobao may be set as keywords; a combination of any number and "yuan" may also be set as feature information.
  • A prohibited word is a word containing a personal attack.
  • Some users add special symbols to a keyword or to the comment text when submitting a comment, in order to evade the platform's spam detection. Special symbols such as "*", "#", and "&" can therefore be set as keywords and stored in the blacklist library as feature information.
  • A user may also replace an original keyword with a homophone or near-homophone to evade the platform's spam detection, as in "fishing people 3 squid lifting line Jia Weixin a5a7a9". For such cases, homophones of the keyword can be set as feature information and stored in the blacklist library.
  • When the user comment contains information that matches the feature information in the blacklist library, step S205 is performed.
  • For example, the user comment submitted by the user is "deep text, worth learning." If it is detected that the comment does not contain information matching the feature information in the blacklist library, step S203 is performed.
  • FIG. 5 is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present disclosure.
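  • The blacklist check of step S202 might look like the following Python sketch. The source names the categories of feature information but not a storage format, so the names `BLACKLIST_KEYWORDS`, `BLACKLIST_PATTERNS`, and `matches_blacklist`, and the specific regular expressions, are all assumptions for illustration. Homophone matching is left out because the source does not say how homophones are derived.

```python
import re

# Hypothetical blacklist entries built from the examples in the text.
BLACKLIST_KEYWORDS = {"QQ", "special price", "hot sale", "Taobao"}
BLACKLIST_PATTERNS = [
    re.compile(r"https?://\S+", re.IGNORECASE),  # hyperlink written as a web address
    re.compile(r"\d+\s*yuan"),                   # any number combined with "yuan"
    re.compile(r"[*#&]"),                        # special symbols used to dodge filters
]


def matches_blacklist(comment: str) -> bool:
    """Return True if the comment contains any blacklisted keyword or pattern."""
    lowered = comment.lower()
    if any(word.lower() in lowered for word in BLACKLIST_KEYWORDS):
        return True
    return any(pattern.search(comment) for pattern in BLACKLIST_PATTERNS)
```

  • A comment for which `matches_blacklist` returns True would go to step S205; any other comment would continue to step S203.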
  • A whitelist library may also be set, and it is determined whether the comment information in the user comment exists in the whitelist library. If so, the user comment may be determined to be a non-spam comment; otherwise, the user comment may be determined to be a spam comment.
  • For example, the keyword may be a core term related to a product, and keywords from the product's standard description may be stored in the whitelist library in advance. If the comment a user submits for the product contains none of the keywords in the product's standard description, the user comment may be determined to be a spam comment; if it contains any one or more of those keywords, the user comment may be determined to be a non-spam comment.
  • Emotional words are vocabulary through which users genuinely express their subjective opinions, attitudes, feelings, emotions, and the like.
  • Comments on a product are people's evaluations of and remarks on the product's parameters and the purchasing experience, through which they can genuinely express their subjective opinions, attitudes, feelings, and emotions. Product reviews therefore necessarily include the reviewer's emotional words; the fewer emotional words a comment contains, the more likely it is to be a spam comment.
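  • As a rough illustration of the whitelist idea, the sketch below treats a comment as spam when it contains none of a product's whitelisted keywords, and separately counts emotional words. The function names and the plain substring matching are assumptions; the source does not specify how either check is implemented or how the emotional word count is turned into a decision.

```python
def whitelist_is_spam(comment, product_keywords):
    """Return True when the comment contains none of the whitelisted product keywords."""
    text = comment.lower()
    return not any(keyword.lower() in text for keyword in product_keywords)


def emotion_word_count(comment, emotion_words):
    """Count how many of the given emotional words appear in the comment."""
    text = comment.lower()
    return sum(1 for word in emotion_words if word.lower() in text)
```

  • A caller could, for instance, treat a low `emotion_word_count` as one more signal that a comment is likely spam, in line with the observation above.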
  • Step S203: traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold; if not, perform step S204; if so, perform step S205.
  • Whether that number reaches the first threshold can be determined by detecting whether the comment queue contains historical comments that are the same as or similar to the user comment.
  • Even when the comment information in the user comment does not exist in the blacklist library, the comment queue may contain a large number of historical comments whose content is the same as or similar to the user comment. When the number of such comments reaches a certain threshold, they also hinder users from obtaining useful information, so user comments with duplicate content can also be classified as spam comments. Therefore, to identify spam comments more accurately, it may further be detected whether the comment queue contains historical comments that are the same as or similar to the user comment, and determined whether the number of such comments reaches the first threshold.
  • The comment queue is a FIFO queue composed of historical comments.
  • A comment similar to the user comment includes a historical comment whose similarity to the user comment reaches a third threshold. Whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold is determined by checking whether the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold.
  • The similarity can be determined by comparing how closely the information contained in the user comment matches the information contained in the historical comments in the comment queue.
  • For example, the third threshold may be 80%. A historical comment is then determined to be similar when its information matches that of the user comment by 90%, and to be the same when the match is 100%.
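  • The source does not say how the matching degree is computed. As one possible stand-in, the sketch below uses the ratio from Python's `difflib.SequenceMatcher`; the function names and the choice of this measure are assumptions, and any other similarity measure scaled to the range 0 to 1 could be substituted.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Matching degree in [0, 1]; 1.0 means the two comments are identical."""
    return SequenceMatcher(None, a, b).ratio()


def count_same_or_similar(queue, comment, third_threshold=0.8):
    """Count historical comments whose similarity reaches the third threshold."""
    return sum(1 for old in queue if similarity(old, comment) >= third_threshold)
```

  • The returned count is what would be compared against the first threshold in step S203.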
  • The comment queue may be either a chained queue or an array queue.
  • A queue is a linear list whose data elements are also called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing. Because insertion is allowed only at one end and deletion only at the other, the element that entered the queue first is the first to be deleted, so a queue is also called a first-in-first-out (FIFO) linear list. The comment queue can therefore be referred to as a first-in-first-out queue.
  • For example, the queue can be stored in an array Q[1...m], where m is the maximum capacity allowed by the queue.
  • Two pointers are required: head, the head pointer, which points to the actual head element of the queue, and tail, the tail pointer, which points to the position after the actual tail element.
  • When both pointers have their initial value of 0, the queue is empty and contains no elements.
  • When the number of queue elements has reached the upper bound m of the array and a new queue element is enqueued, the queue element that entered the queue first is deleted from the queue.
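  • The array-backed queue described above can be sketched as a circular buffer. The class name `ArrayQueue`, the wrap-around indexing, the 0-based indices, and the separate `size` counter are assumptions added for the example; the source only describes the array, the two pointers, and the eviction of the earliest element when the array is full. Here `head` indexes the element that entered first.

```python
class ArrayQueue:
    """Fixed-capacity FIFO queue stored in an array with head and tail indices."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0  # index of the element that entered first
        self.tail = 0  # index one past the most recently added element
        self.size = 0

    def enqueue(self, item):
        """Add an item; return the evicted earliest element if the queue was full."""
        evicted = None
        if self.size == self.capacity:
            evicted = self.slots[self.head]
            self.head = (self.head + 1) % self.capacity
            self.size -= 1
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity
        self.size += 1
        return evicted

    def __iter__(self):
        # Yield elements from earliest to most recent.
        for offset in range(self.size):
            yield self.slots[(self.head + offset) % self.capacity]
```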
  • The queue can also be stored in a linked list, in which the logical order of the data is indicated by pointers holding the storage addresses of the elements, forming a chained queue that can allocate storage dynamically.
  • For example, when the comment queue is an array queue, the second threshold on the length of the comment queue is the maximum capacity of the array queue, for example 1000 user comments.
  • Updating the comment queue may also be refused. In that case, 1 is added to the click count recorded for the historical comment that is the same as the user comment, to indicate that other people have posted user comments whose content is the same as or similar to that historical comment, or that others agree with its content.
  • For example, the first threshold is 5, and the number of comments in the comment queue that are the same as the user comment with the content "a smog sensor is obtained soon" is 1. It is determined that the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and step S204 is performed.
  • As another example, the first threshold is 5, and the number of comments in the comment queue that are the same as the user comment with the content "fishing people 3 squid lifting line Jia Weixin a5a7a9" is 7. It is determined that the number of comments in the comment queue that are the same as or similar to the user comment has reached the first threshold, and step S205 is performed.
  • Step S204: add the user comment to the comment queue as the head comment, and delete the tail comment that overflows the second threshold.
  • The length of the FIFO queue can be preset to the second threshold.
  • For example, the length can be represented by the total number of data packets the array queue can hold. An array must be given a fixed size before it is created, that is, an appropriate byte length is set for each queue element so that it meets the byte-length requirement of a single queue element. Each queue element can be understood as one data packet of fixed size. For example, if the array is N[1...1000], the second threshold is 1000.
  • The length of the FIFO queue can also be represented by the number of pointers to storage units in the chained queue. A linked list does not need a fixed-size storage space allocated in advance: when data needs to be stored, a storage unit suitable for one queue element is allocated to store the data and is linked to the other storage units in the queue by pointers.
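  • For comparison with the array form, a chained queue with a length limit might be sketched as below. The names `LinkedQueue`, `_Node`, `oldest`, and `newest` are hypothetical, and the sketch assumes a `max_length` of at least 1. Storage units are allocated only when a comment is enqueued, and the earliest unit is unlinked once the length exceeds the second threshold.

```python
class _Node:
    """One storage unit: a value plus a pointer to the next unit."""

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """FIFO queue stored as a singly linked list, limited to max_length elements."""

    def __init__(self, max_length):
        self.max_length = max_length  # plays the role of the second threshold
        self.oldest = None
        self.newest = None
        self.length = 0

    def enqueue(self, value):
        """Link a new unit; return the evicted earliest value on overflow, else None."""
        node = _Node(value)
        if self.newest is None:
            self.oldest = self.newest = node
        else:
            self.newest.next = node
            self.newest = node
        self.length += 1
        evicted = None
        if self.length > self.max_length:
            evicted = self.oldest.value
            self.oldest = self.oldest.next
            self.length -= 1
        return evicted

    def __iter__(self):
        node = self.oldest
        while node is not None:
            yield node.value
            node = node.next
```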
  • The content of the comment queue changes in real time. For example, when a new user comment displayed in the comment area is added to the queue, it joins the comment queue as the head comment, the historical comment at the tail is dequeued, and the queue number of every other historical comment increases by one.
  • Step S205: determine the user comment to be a spam comment.
  • Updating the comment queue may then be refused.
  • FIG. 6 is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present application.
  • A prompt box may be popped up to inform the user that the comment failed to be published.
  • For example, a prompt box with the content "Comment review failed: spam comment!" pops up, updating of the comment queue is refused, and the comment area displayed on the mobile phone interface does not change.
  • Step S206: detect whether the user comment contains contact information, and if so, add the contact information to the blacklist library as feature information.
  • If the contact information is new, the contact information extracted from the user comment is added to the blacklist library as feature information, to serve as a basis for checking the next user comment.
  • If the contact information is old, it may overwrite the original contact information in the blacklist library or not be added to the blacklist library.
  • For example, for a user comment with the content "fishing darling 3 squid lifting line Jia Weixin a5a7a9", the new contact information "a5a7a9" is extracted from the user comment and added to the blacklist library as feature information.
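  • Step S206 could be approximated as follows. The extraction rule (a run of letters and digits that contains at least one digit) and the names `extract_contacts` and `record_contacts` are assumptions. The default `min_length=8` follows the "more than 7 bytes" wording; it is passed as 6 in the usage below because the example contact "a5a7a9" in the text is only 6 characters long.

```python
import re


def extract_contacts(comment, min_length=8):
    """Return letter/digit tokens that look like contact information."""
    tokens = re.findall(r"[A-Za-z0-9]+", comment)
    return [
        token for token in tokens
        if len(token) >= min_length and any(ch.isdigit() for ch in token)
    ]


def record_contacts(comment, blacklist, min_length=8):
    """Add contacts not yet in the blacklist set; return the newly added ones."""
    new_contacts = [
        contact for contact in extract_contacts(comment, min_length)
        if contact not in blacklist
    ]
    blacklist.update(new_contacts)
    return new_contacts
```

  • For instance, `record_contacts("fishing darling 3 squid lifting line Jia Weixin a5a7a9", blacklist, min_length=6)` adds "a5a7a9" on the first call and nothing on a repeat call, which corresponds to the "old contact is not added again" option.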
  • The embodiment of the present application determines whether the user comment is a spam comment by detecting whether it contains information that matches the feature information in the blacklist library. When the user comment is not found to be spam in this way, the comment queue is traversed, and when the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, the user comment is added to the comment queue as the head comment and the tail comment that overflows the second threshold is deleted.
  • In this way, after the blacklist check, historical comments with repeated content are further detected, which avoids adding a large number of user comments with duplicate content, reduces the operating load of the system, effectively improves information processing efficiency, and improves the efficiency with which users obtain useful information.
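  • Putting steps S201 to S206 together, a compact end-to-end sketch might look like this. Everything here is illustrative: the class name `CommentFilter`, the substring-based blacklist match, the `difflib` similarity measure, and the contact rule with a minimum length of 6 are assumptions, not details taken from the source.

```python
import re
from collections import deque
from difflib import SequenceMatcher


class CommentFilter:
    def __init__(self, first_threshold=5, second_threshold=1000, third_threshold=0.8):
        self.first_threshold = first_threshold
        self.third_threshold = third_threshold
        self.queue = deque(maxlen=second_threshold)  # bounded FIFO comment queue
        self.blacklist = set()                       # feature information

    def _in_blacklist(self, comment):
        lowered = comment.lower()
        return any(feature.lower() in lowered for feature in self.blacklist)

    def _similar_count(self, comment):
        return sum(
            1 for old in self.queue
            if SequenceMatcher(None, old, comment).ratio() >= self.third_threshold
        )

    def submit(self, comment):
        """Return True when the comment is judged to be spam."""
        # S202: blacklist check.
        spam = self._in_blacklist(comment)
        # S203: count same-or-similar historical comments.
        if not spam:
            spam = self._similar_count(comment) >= self.first_threshold
        if not spam:
            # S204: enqueue as head comment; the deque drops the overflowing tail.
            self.queue.appendleft(comment)
            return False
        # S205 and S206: spam; record contact-like tokens (assumed rule) as features.
        for token in re.findall(r"[A-Za-z0-9]+", comment):
            if len(token) >= 6 and any(ch.isdigit() for ch in token):
                self.blacklist.add(token)
        return True
```

  • In this arrangement a repeated advertisement is first caught by the duplicate count, and its contact token then lets the blacklist catch reworded versions of the same advertisement without traversing the queue.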
  • An embodiment of the present application further provides an information processing apparatus, including:
  • an obtaining module configured to obtain a user comment;
  • a first judging module configured to traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold;
  • a determining module configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold;
  • a processing module configured to add the user comment to the comment queue when the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and to process the tail comment of the comment queue according to the second threshold.
  • The processing module is configured to add the user comment to the comment queue as the head comment, and to delete the tail comment that overflows the second threshold.
  • The apparatus further includes:
  • a second judging module configured to determine, after the obtaining module obtains the user comment, whether the comment information in the user comment exists in a blacklist library, and if so, to determine the user comment to be a spam comment.
  • The first judging module is configured to traverse the comment queue when the second judging module's determination is negative, and to determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
  • The second judging module is configured to determine whether the user comment contains information that matches the feature information in the blacklist library, and if so, to determine that the comment information in the user comment exists in the blacklist library.
  • The apparatus further includes:
  • a detecting module configured to detect, when the user comment is determined to be a spam comment, whether the user comment contains contact information, and if so, to add the contact information to the blacklist library as feature information.
  • A comment similar to the user comment includes a historical comment whose similarity to the user comment reaches a third threshold.
  • FIG. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application.
  • The information processing apparatus 30 includes an obtaining module 31, a first judging module 33, a processing module 34, and a determining module 35.
  • The obtaining module 31 is configured to obtain a user comment.
  • The first judging module 33 is configured to traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length is limited by a second threshold.
  • The determining module 35 is configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
  • The processing module 34 is configured to add the user comment to the comment queue when the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and to process the tail comment of the comment queue according to the second threshold.
  • FIG. 8 is another schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure.
  • The information processing apparatus 30 includes an obtaining module 31, a second judging module 32, a first judging module 33, a processing module 34, a determining module 35, and a detecting module 36.
  • The obtaining module 31 is configured to obtain a user comment.
  • The second judging module 32 is configured to determine, after the obtaining module 31 obtains the user comment, whether the comment information in the user comment exists in the blacklist library, and if so, to determine the user comment to be a spam comment.
  • The comment information in the user comment may include a user name, a user ID, the comment content, the comment posting time, and the like.
  • The second judging module 32 is configured to determine whether the user comment contains information that matches the feature information in the blacklist library, and if so, to determine that the comment information in the user comment exists in the blacklist library.
  • the public platform can include e-commerce platforms, forums, communities, websites, Weibo, post bars, blogs, and application download platforms.
  • after a user registers on a website and passes authentication, the user holds user identity information for the website and becomes a user of the website; the user can then act on the website, for example by posting articles, publishing products, posting microblogs, posting threads, and replying to comments, and can also comment on or like information published by others.
  • some users may post large numbers of spam comments with the same or similar content, such as advertising comments, sales-promotion comments, and comments with adverse effects that contain reactionary content, violence, pornography, hyperlinks, abuse, or defamation.
  • the blacklist library may be preset, and the blacklist library includes multiple feature information.
  • the feature information includes any one or more of a username, a user ID, a contact, a keyword, and a homonym of the keyword.
  • the format of the contact information may be a combination of letters and digits with a length of more than 7 bytes, for example a telephone number, a mobile phone number, a WeChat ID, or a QQ number.
  • the keywords may include hyperlinks and advertisement words, prohibited words, special symbols, and the like.
  • user comments submitted by users include hyperlinks and advertising words, such as product promotion, store or website recommendation, company promotion, business promotion, and so on.
  • the hyperlink generally appears in the form of a web address and contains several consecutive English letters, such as http://...; the characters "http" may be set as a keyword, and the keywords in the user comment may be scanned to detect whether a hyperlink is present; if a hyperlink is present, the user comment is considered a possible spam comment, and it is then further determined whether advertisement words are present.
  • for advertisement words, words such as QQ, 特价 (special price), 热卖 (hot sale), 淘宝 (Taobao), and 包邮 (free shipping) may be set as advertisement-word keywords, and the combination of any number with "元" (yuan) may also be set as feature information.
  • when the user comment contains such a keyword, the second judging module 32 determines that the comment information in the user comment exists in the blacklist library and determines the user comment to be a spam comment.
  • the prohibited word is a vocabulary containing a personal attack.
  • when submitting a user comment, some users may insert special symbols into a keyword or into the text of the comment in order to evade the platform's spam-comment detection. Therefore, special symbols such as "★", "*", "#", and "&" can be set as keywords and stored in the blacklist library as feature information.
  • the user may replace the original keyword with a homophone or near-homophone to evade the platform's spam-comment detection, as in "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线", where, for instance, 迦魏新 stands in for 加微信 ("add WeChat"). Therefore, for such cases, the homophones of a keyword can be set as feature information and stored in the blacklist library.
  • for example, the user comment submitted by a user on a forum is "代开发票，加Q(22222211)" ("Invoices issued on request, add Q (22222211)"); when the second judging module 32 determines that the user comment contains information matching contact information in the blacklist library, the user comment is determined to be a spam comment.
  • a whitelist library can also be set.
  • the second judging module 32 may be further configured to determine whether the comment information in the user comment exists in the whitelist library; if so, the user comment may be determined to be a non-spam comment; if not, the user comment may be determined to be a spam comment.
  • the subject words may be core nouns related to the product, and the subject words from the standard product description may be stored in the whitelist library in advance. If the comment information a user submits for the product contains none of the subject words in the standard product description, the user comment may be determined to be a spam comment; if it contains any one or more of those subject words, the user comment may be determined to be a non-spam comment.
  • the sentiment words include vocabulary with which users genuinely express their subjective opinions, attitudes, feelings, and emotions.
  • comments on a product are people's evaluations and discussion of the product's parameters and of the purchasing experience, through which they can genuinely express their subjective opinions, attitudes, feelings, and emotions. A product comment therefore necessarily carries the commenter's sentiment: the fewer sentiment words a comment contains, the more likely it is to be a spam comment.
  • the first judging module 33 is configured to traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out (FIFO) queue whose length has a second threshold.
  • the number of comments in the comment queue that are the same as or similar to the user comment can be determined by detecting whether the comment queue contains historical comments that are the same as or similar to the user comment. For example, even when the comment information in the user comment does not exist in the blacklist library, the comment queue may still contain a large number of historical comments whose content is the same as or similar to that of the user comment. When the number of such comments reaches a certain threshold, it also hinders users from obtaining useful information, so user comments with duplicated content can likewise be classified as spam comments.
  • therefore, to identify spam comments more accurately, the comment queue may be further checked for historical comments that are the same as or similar to the user comment, and the first judging module 33 determines whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
  • the comment queue is a FIFO queue composed of historical comments.
  • a comment similar to the user comment is a historical comment whose similarity to the user comment reaches a third threshold. The first judging module 33 determines whether the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold, and thereby determines whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
  • the similarity can be determined by comparing how closely the information contained in the user comment matches the information contained in the historical comments in the comment queue.
  • the third threshold may be 80%; for example, a user comment whose information matches that of a historical comment in the comment queue to a degree of 90% exceeds this threshold and is determined to be similar, and a user comment whose matching degree is 100% is determined to be the same.
  • the comment queue may include any one of a chained queue and an arrayed queue.
  • in programming languages, a queue is a linear list, and the data elements of a queue are also called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered the queue earliest is the first to be deleted, so a queue is also called a first-in-first-out (FIFO) linear list. The comment queue can therefore be referred to as a first-in-first-out queue.
  • the queue can be stored in an array Q[1…m], where the upper bound m of the array is the maximum capacity allowed by the queue.
  • queue operations require two pointers: head, the head pointer, which points to the actual head element; and tail, the tail pointer, which points to the position after the actual tail element.
  • in general, the initial value of both pointers is set to 0, at which point the queue is empty and contains no elements.
  • when the number of queue elements has reached the upper bound m of the array and a new queue element is enqueued, the queue element that entered the queue earliest is deleted from the queue.
  • the queue can also be stored in a linked list, and the sequential relationship of the data in the mathematical logic is indicated by the pointer of the storage address of the element, thereby forming a chained queue, which can dynamically perform storage allocation.
  • if the comment queue is an array queue, the second threshold on the length of the comment queue is the maximum capacity of the array queue, for example 1000 user comments.
  • when the comment queue is found to contain a historical comment identical to the user comment, updating the comment queue may be refused, so that duplicated user comments do not appear in the comment queue repeatedly and reduce the efficiency with which users obtain information. In that case, 1 is added to the like counter recorded for the identical historical comment, to indicate that another person has posted a user comment with the same or similar content, or that another person agrees with the content of the historical comment.
  • for example, if the first threshold is 5 and the number of comments in the comment queue that are the same as the user comment "Hope we get a smog sensor soon" is 1, the first judging module 33 determines that the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold.
  • for example, if the first threshold is 5 and the number of comments in the comment queue that are the same as the user comment "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is 7, the first judging module 33 determines that the number of comments in the comment queue that are the same as or similar to the user comment has reached the first threshold.
  • the processing module 34 is configured to add the user comment to the comment queue as the head comment and to delete the tail comment that overflows the second threshold.
  • the length of the FIFO queue can be preset to the second threshold.
  • the length can be expressed as the total number of data packets that the array queue can hold. An array must be given a fixed size before it is created; that is, a suitable byte length is set for each queue element so that it meets the byte-length requirement of a single queue element. Each queue element can be understood as one data packet of fixed size. For example, if the array is N[1…1000], the second threshold is 1000.
  • the length of the FIFO queue can also be expressed as the number of storage-unit pointers in a linked queue. A linked list does not need a fixed amount of storage allocated in advance; when data needs to be stored, a suitable storage unit can be set up for each queue element to hold the data, and that storage unit is linked to the other storage units in the queue by pointers.
  • the content of the comment queue changes in real time. For example, when a new user comment enters the queue shown in the comment area, the processing module 34 adds the user comment to the comment queue as the head comment, the historical comment at the tail of the queue is dequeued, and the queue number of every other historical comment is increased by 1.
  • when the number of comments in the comment queue that are the same as or similar to the user comment is less than the first threshold, the comment queue is updated: the processing module 34 adds the user comment "Hope we get a smog sensor soon" at the head of the comment queue, No. 1, and deletes the historical comment "Requesting a formaldehyde sensor" at the tail of the comment queue, No. 1000.
  • the historical comment "Good article! Like", originally numbered No. 1, becomes No. 2 and is displayed in display slot No. 2, and the remaining historical comments each move back by one display slot.
  • the determining module 35 is configured to determine the user comment as a spam comment when it is determined that the number of comments in the comment queue that is the same as or similar to the user comment reaches a first threshold.
  • when the user comment is determined to be a spam comment, the processing module 34 can refuse to update the comment queue.
  • for example, when the user comment "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is determined to be a spam comment, the processing module 34 refuses to update the comment queue.
  • when refusing to update the comment queue, the processing module 34 may also pop up a prompt box to tell the user that the comment failed to be published. As shown in FIG. 6, after the user taps the "Comment" button, a prompt box with the content "Comment review failed: spam comment!" pops up, the update of the comment queue is refused, and the comment area displayed on the mobile phone interface does not change.
  • the detecting module 36 is configured to detect, when the user comment is determined to be a spam comment, whether the user comment contains contact information and, if so, to add the contact information to the blacklist library as feature information.
  • when the detecting module 36 detects that the user comment contains contact information and the contact information is new, the new contact information extracted from the user comment is added to the blacklist library as feature information. When the contact information is already known, it may overwrite the existing contact information in the blacklist library, or it may simply not be added.
  • when new contact information is detected in the user comment, it is extracted and added to the blacklist library as feature information, to serve as a basis for checking the next user comment.
  • for example, when the user comment "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is a spam comment, the new contact information "a5a7a9" in the user comment is extracted, and "a5a7a9" is added to the blacklist library as feature information.
  • the embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor calls the computer program stored in the memory to implement the information processing method provided by any of the embodiments of the present application, for example:
  • traversing the comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
  • adding the user comment to the comment queue, and processing the tail comment of the comment queue according to the second threshold.
  • FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the computer device 400 can include a radio frequency (RF) circuit 401, a memory 402 including one or more computer-readable storage media, an input unit 403, a display unit 404, a sensor 405, an audio circuit 406, a wireless fidelity (WiFi) module 407, a processor 408 having one or more processing cores, a power supply 409, and other components.
  • it will be understood by those skilled in the art that the computer device structure illustrated in FIG. 9 does not limit the computer device, which may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
  • the radio frequency circuit 401 can be used to transmit and receive information, or to receive and transmit signals during a call.
  • Memory 402 can be used to store applications and data.
  • the application stored in the memory 402 contains a computer program.
  • the input unit 403 can be configured to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function controls.
  • Display unit 404 can be used to display information entered by the user or information provided to the user, as well as various graphical user interfaces of the computer device, which can be constructed from graphics, text, icons, video, and any combination thereof.
  • the computer device may also include at least one type of sensor 405, such as a light sensor, a motion sensor, and other sensors.
  • the audio circuit 406 can provide an audio interface between the user and the computer device through a speaker, a microphone.
  • the Wireless Fidelity (WiFi) module 407 can be used for short-range wireless transmission, and can help users to send and receive emails, browse websites, and access streaming media. It provides users with wireless broadband Internet access.
  • the processor 408 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the applications stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the computer device as a whole.
  • the computer device also includes a power source 409 (such as a battery) that powers the various components.
  • the computer device may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • the processor 408 in the computer device loads the computer program corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and the processor 408 runs the application stored in the memory 402 to perform the following steps:
  • traversing the comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
  • adding the user comment to the comment queue, and processing the tail comment of the comment queue according to the second threshold.
  • when adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold, the processor 408 is configured to perform the following steps:
  • adding the user comment to the comment queue as the head comment, and deleting the tail comment that overflows the second threshold.
  • the processor 408 is further configured to perform the following steps:
  • the processor 408 when determining whether the comment information in the user comment exists in the blacklist library, the processor 408 is configured to perform the following steps:
  • the processor 408 is further configured to perform the following steps:
  • when the user comment is determined to be a spam comment, detecting whether the user comment contains contact information and, if so, adding the contact information to the blacklist library.
  • the processor 408 is configured to perform the following steps: determining whether the number of comments in the comment queue that is the same as or similar to the user comment reaches a first threshold.
  • the information processing apparatus belongs to the same concept as the information processing method in the foregoing embodiment, and any method provided in the embodiment of the information processing method may be run on the information processing apparatus.
  • the specific implementation process is described in the embodiment of the information processing method, and details are not described herein again.
  • the computer program may be stored in a computer-readable storage medium, such as the memory of the computer device, and executed by at least one processor in the computer device; its execution may include the flow of the embodiments of the information processing method.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
  • each functional module may be integrated into one processing chip, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated module, if implemented in the form of a software functional module and sold or used as a standalone product, may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

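An information processing method, apparatus, storage medium, and computer device. The information processing method includes: obtaining a user comment (S101); traversing a comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold (S102); if so, determining the user comment to be a spam comment (S104); if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold (S103).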

Description

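Information processing method, apparatus, storage medium, and computer device
This application claims priority to Chinese patent application No. 201710026441.2, entitled "Information processing method, apparatus, and computer device", filed with the Chinese Patent Office on January 13, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of communications technologies, in particular to the field of Internet technologies, and specifically to an information processing method, apparatus, storage medium, and computer device.
Background
With the development of Internet technology, users can post comments over the network on public platforms such as forums, communities, and websites. However, because public platforms are open to all speech, some users post large numbers of spam comments, such as advertising comments, sales-promotion comments, and other comments with adverse effects, which hinders other users from obtaining useful information and affects them adversely. Spam comments increasingly trouble the users of existing computer devices, and how to identify spam comments effectively is receiving growing attention in the industry.
Technical problem
The embodiments of the present application provide an information processing method, apparatus, storage medium, and computer device that can improve information processing efficiency.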
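Technical solution
In a first aspect, an embodiment of the present application provides an information processing method, the method including:
obtaining a user comment;
traversing a comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
if so, determining the user comment to be a spam comment;
if not, adding the user comment to the comment queue and processing the tail comment of the first-in-first-out queue according to the second threshold.
In a second aspect, an embodiment of the present application further provides an information processing apparatus, the apparatus including:
an obtaining module, configured to obtain a user comment;
a first judging module, configured to traverse a comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
a determining module, configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are the same as or similar to the user comment is determined to reach the first threshold;
a processing module, configured to, when the number of comments in the comment queue that are the same as or similar to the user comment is determined not to reach the first threshold, add the user comment to the comment queue and process the tail comment of the first-in-first-out queue according to the second threshold.
In a third aspect, an embodiment of the present application provides a storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the information processing method provided by any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor calls the computer program stored in the memory to perform the information processing method of any embodiment of the present application.
Beneficial effects
The embodiments of the present application provide an information processing method, apparatus, storage medium, and computer device that can improve information processing efficiency.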
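Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of a first usage state of an information processing method according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a second usage state of an information processing method according to an embodiment of the present application.
FIG. 4 is another schematic flowchart of an information processing method according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present application.
FIG. 6 is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present application.
FIG. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application.
FIG. 8 is another schematic structural diagram of an information processing apparatus according to an embodiment of the present application.
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.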
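Best mode for carrying out the invention
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the present application rather than the entire structure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", "third", and the like in the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or modules is not limited to the listed steps or modules, but optionally also includes steps or modules that are not listed, or optionally also includes other steps or modules inherent to such a process, method, product, or device.
A reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.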
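An embodiment of the present application provides an information processing method, including:
obtaining a user comment;
traversing a comment queue and determining whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
if so, determining the user comment to be a spam comment;
if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.
In some implementations, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold includes:
adding the user comment to the comment queue as the head comment, and deleting the tail comment that overflows the second threshold.
In some implementations, after obtaining the user comment, the method further includes:
determining whether the comment information in the user comment exists in a blacklist library; if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
In some implementations, determining whether the comment information in the user comment exists in the blacklist library includes:
determining whether the user comment contains information that matches feature information in the blacklist library and, if so, determining that the comment information in the user comment exists in the blacklist library.
In some implementations, the method further includes:
when the user comment is determined to be a spam comment, detecting whether the user comment contains contact information and, if so, adding the contact information to the blacklist library as feature information.
In some implementations, a comment similar to the user comment is a historical comment whose similarity to the user comment reaches a third threshold.
In some implementations, the feature information includes any one or more of a user name, a user ID, contact information, a keyword, and a homophone of a keyword.
The information processing method provided by the embodiments of the present application may be performed by the information processing apparatus provided by the embodiments of the present application, or by a computer device that integrates the information processing apparatus (such as a desktop computer, notebook, palmtop computer, tablet computer, or smartphone); the information processing apparatus may be implemented in hardware or in software.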
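Referring to FIG. 1, FIG. 1 is a schematic flowchart of an information processing method according to an embodiment of the present application. The method includes:
Step S101: obtain a user comment.
Step S102: traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold; if not, perform step S103; if so, perform step S104.
In some implementations, whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold can be determined by checking whether the number of historical comments in the comment queue whose similarity to the user comment reaches a third threshold reaches the first threshold.
For example, when the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold does not reach the first threshold, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and step S103 is performed.
When the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold, and step S104 is performed.
Step S103: add the user comment to the comment queue, and process the tail comment of the comment queue according to the second threshold.
In some implementations, the user comment can be added to the comment queue as the head comment, and the tail comment that overflows the second threshold can be deleted.
Step S104: determine the user comment to be a spam comment.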
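Steps S101 to S104 can be sketched as follows. This is a minimal illustration rather than the implementation described in the application: the function name `process_comment`, the caller-supplied `is_similar` predicate, and the use of Python's `collections.deque` with `maxlen` as the bounded FIFO queue are all assumptions made for the example, and the threshold values are borrowed from the examples given later in the text.

```python
from collections import deque

FIRST_THRESHOLD = 5      # assumed; the text later uses 5 as an example value
SECOND_THRESHOLD = 1000  # assumed; the text later uses a queue of 1000 comments


def process_comment(comment, queue, is_similar, first_threshold=FIRST_THRESHOLD):
    """Return True if `comment` is judged spam (S104); otherwise enqueue it (S103).

    `is_similar` is a hypothetical predicate deciding whether two comments
    are the same or similar.
    """
    # S102: traverse the queue and count identical or similar historical comments.
    matches = sum(1 for old in queue if is_similar(comment, old))
    if matches >= first_threshold:
        return True  # S104: spam; the queue is left unchanged
    # S103: add as the head comment. A deque created with maxlen drops the
    # element at the opposite end (the tail) when it is already full.
    queue.appendleft(comment)
    return False


comment_queue = deque(maxlen=SECOND_THRESHOLD)
```

Because a non-spam comment only touches the bounded queue, the cost of each check is limited by the second threshold rather than by the size of the whole comment database.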
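For a further understanding of the technical solution of the present application, refer to FIG. 2 and FIG. 3. FIG. 2 is a schematic diagram of a first usage state of an information processing method according to an embodiment of the present application, and FIG. 3 is a schematic diagram of a second usage state of the method.
For example, as shown in FIG. 2, on a forum, "Flying Tiger" has posted an article titled "Sensors in the Phone", and the user "Coder" submits a user comment on the forum with the content "Hope we get a smog sensor soon". The forum's server traverses the comment queue and determines whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold. If so, the user comment is determined to be a spam comment; if not, the user comment is determined to be a non-spam comment, is added to the comment queue, and the tail comment of the first-in-first-out queue is processed according to the second threshold.
For example, the comment queue is a first-in-first-out queue with a length of 1000 comments.
As shown in FIG. 3, when the user comment is determined to be a non-spam comment, the comment queue is updated: the user comment "Hope we get a smog sensor soon" is added to the comment queue as the head comment displayed in the comment area, and the tail comment "Requesting a formaldehyde sensor.", which has the earliest comment time and overflows the 1000-comment limit, is deleted.
In the embodiments of the present application, a user comment is obtained, the comment queue is traversed, and it is determined whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold; if so, the user comment is determined to be a spam comment; if not, the user comment is added to the comment queue and the tail comment of the first-in-first-out queue is processed according to the second threshold. The embodiments can identify spam comments effectively, and when a user comment is identified as a non-spam comment, only the comment queue needs to be updated, which avoids processing all the content in the database, reduces the load on the system, and improves information processing efficiency.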
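In some implementations, after obtaining the user comment, the method further includes:
determining whether the comment information in the user comment exists in the blacklist library and, if so, determining the user comment to be a spam comment.
In some implementations, determining whether the comment information in the user comment exists in the blacklist library includes:
determining whether the user comment contains information that matches feature information in the blacklist library and, if so, determining that the comment information in the user comment exists in the blacklist library.
In some implementations, when the user comment is determined to be a spam comment, it is detected whether the user comment contains contact information and, if so, the contact information is added to the blacklist library as feature information.
In some implementations, a comment similar to the user comment is a historical comment whose similarity to the user comment reaches a third threshold.
All of the optional technical solutions above can be combined in any manner to form optional embodiments of the present application, which are not described one by one here.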
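Referring to FIG. 4, FIG. 4 is another schematic flowchart of an information processing method according to an embodiment of the present application. The method includes:
Step S201: obtain a user comment.
For example, as shown in FIG. 2, on a forum, "Flying Tiger" has posted an article titled "Sensors in the Phone", the user "Coder" submits a user comment on the forum with the content "Hope we get a smog sensor soon", and the forum's server obtains the user comment from the backend.
Step S202: determine whether the comment information in the user comment exists in the blacklist library. If not, perform step S203; if so, perform step S205.
It can be understood that the comment information in the user comment may include information such as a user name, a user ID, the comment content, and the comment posting time.
In some implementations, it is determined whether the user comment contains information that matches feature information in the blacklist library. If so, step S205 is performed; if not, step S203 is performed.
At present, many public platforms support interaction between users. Such public platforms may take the form of e-commerce platforms, forums, communities, websites, microblogs (Weibo), Tieba post bars, blogs, application download platforms, and so on. For example, after a user registers on a website and passes authentication, the user holds user identity information for the website and becomes a user of the website. The user can then act on the website, for example by posting articles, publishing products, posting microblogs, posting threads, and replying to comments, and can also comment on or like information published by others. Among such comments, some users may post large numbers of spam comments with the same or similar content, such as advertising comments, sales-promotion comments, and comments with adverse effects that contain reactionary content, violence, pornography, hyperlinks, abuse, or defamation.
It can be understood that a blacklist library can be set up in advance, and that the blacklist library contains multiple items of feature information.
In some implementations, the feature information includes any one or more of a user name, a user ID, contact information, a keyword, and a homophone of a keyword.
It can be understood that the format of the contact information may be a combination of letters and digits with a length of more than 7 bytes, for example a telephone number, a mobile phone number, a WeChat ID, or a QQ number.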
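One possible reading of the contact-information format above (a run of letters and digits longer than 7 bytes) can be sketched with a regular expression. The function name `find_contacts` and the `min_length` parameter are assumptions for illustration; the application does not specify how contact information is actually recognized.

```python
import re


def find_contacts(text, min_length=8):
    """Return runs of ASCII letters/digits at least `min_length` characters long.

    min_length=8 is one reading of "more than 7 bytes"; it is an assumed,
    tunable parameter rather than a value fixed by the text.
    """
    pattern = re.compile(r"[A-Za-z0-9]{%d,}" % min_length)
    return pattern.findall(text)
```

Applied to the example comment "代开发票，加Q(22222211)" that appears later in the text, this sketch picks out the eight-digit number; a production system would likely need stricter patterns per contact type.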
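For example, the keywords may include hyperlinks and advertisement words, prohibited words, special symbols, and so on.
For example, a user comment submitted by a user contains hyperlinks and advertisement words, such as product promotion, store or website recommendation, company publicity, or business promotion. A hyperlink generally appears in the form of a web address and contains several consecutive English letters, such as http://...; the characters "http" are set as a keyword, and the keywords in the user comment can be scanned to detect whether a hyperlink is present. If a hyperlink is present, the user comment is considered a possible spam comment, and it is then further determined whether advertisement words are present. For advertisement words, words such as QQ, 特价 (special price), 热卖 (hot sale), 淘宝 (Taobao), and 包邮 (free shipping) are set as advertisement-word keywords, and the combination of any number with "元" (yuan) is also set as feature information. When the user comment contains such a keyword, it is determined that the comment information in the user comment exists in the blacklist library, and step S205 is performed.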
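The keyword scan described above might look like the following sketch. The advertisement words and the "number plus 元" rule come from the text; the function names are hypothetical, and treating any single keyword hit as a blacklist match is an assumption, since the text can also be read as checking advertisement words only after a hyperlink is found.

```python
import re

# Advertisement-word keywords listed in the text.
AD_WORDS = ("QQ", "特价", "热卖", "淘宝", "包邮")
# Any number followed by "元" (yuan), e.g. "99元".
PRICE_PATTERN = re.compile(r"\d+\s*元")


def has_hyperlink(text):
    """Detect a hyperlink by scanning for the keyword "http"."""
    return "http" in text.lower()


def has_ad_word(text):
    """Detect an advertisement word or a number-plus-yuan price."""
    return any(word in text for word in AD_WORDS) or PRICE_PATTERN.search(text) is not None


def matches_keyword_blacklist(text):
    # Assumption: a single keyword hit is enough to count as a blacklist match.
    return has_hyperlink(text) or has_ad_word(text)
```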
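For example, the prohibited words are words that contain personal attacks.
For example, when submitting a user comment, some users may insert special symbols into a keyword or into the text of the comment in order to evade the platform's spam-comment detection. Therefore, special symbols such as "★", "*", "#", and "&" can be set as keywords and stored in the blacklist library as feature information.
For example, the user may replace the original keyword with a homophone or near-homophone to evade the platform's spam-comment detection, as in "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线", where, for instance, 迦魏新 stands in for 加微信 ("add WeChat"). Therefore, for such cases, the homophones of a keyword can be set as feature information and stored in the blacklist library.
For example, the user comment submitted by a user on a forum is "代开发票，加Q(22222211)" ("Invoices issued on request, add Q (22222211)"); it is detected that the user comment contains information matching contact information in the blacklist library, and step S205 is performed. As another example, the user comment submitted by the user is "深度好文，值得学习。" ("An in-depth article, well worth studying."); it is detected that the user comment contains no information matching the feature information in the blacklist library, and step S203 is performed.
As shown in FIG. 2, on a forum, "Flying Tiger" has posted an article titled "Sensors in the Phone", and the user "Coder" has submitted a user comment on the forum with the content "Hope we get a smog sensor soon". When it is determined that the submitted user comment contains no information matching the feature information in the blacklist library, step S203 is performed.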
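Referring to FIG. 5, FIG. 5 is a schematic diagram of a third usage state of an information processing method according to an embodiment of the present application.
On a forum, "Flying Tiger" has posted an article titled "Sensors in the Phone", and a user has submitted a user comment on the forum with the content "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线". When it is determined that the submitted user comment contains information matching the feature information in the blacklist library, step S205 is performed.
In some implementations, a whitelist library may also be set up, and it is determined whether the comment information in the user comment exists in the whitelist library; if so, the user comment may be determined to be a non-spam comment; if not, the user comment may be determined to be a spam comment.
For example, for user comments on products, comments that relate to the product are usually classified as useful information, so whether a comment is spam can be determined by screening for words associated with the product description, such as subject words or sentiment words. Taking a product published on an e-commerce platform as an example, the subject words may be core nouns related to the product, and the subject words from the standard product description may be stored in the whitelist library in advance. If the comment information a user submits for the product contains none of the subject words in the standard product description, the user comment may be determined to be a spam comment; if it contains any one or more of those subject words, the user comment may be determined to be a non-spam comment.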
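The whitelist rule for product comments reduces to a membership test over subject words. In the sketch below, the function name and the sample subject words are hypothetical, and plain substring matching stands in for whatever matching the platform actually uses.

```python
def is_spam_by_whitelist(comment, subject_words):
    """Return True when the comment mentions none of the whitelisted subject words.

    `subject_words` is an assumed collection of core nouns taken from the
    product's standard description.
    """
    return not any(word in comment for word in subject_words)
```

For example, with the assumed subject words `{"sensor", "battery"}`, a comment about sensor accuracy passes, while a comment that only advertises a contact number is flagged.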
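For example, the sentiment words include vocabulary with which users genuinely express their subjective opinions, attitudes, feelings, and emotions. Taking reviews of a product sold on a website as an example, comments on the product are people's evaluations and discussion of the product's parameters and of the purchasing experience, through which they can genuinely express their subjective opinions, attitudes, feelings, and emotions. A product comment therefore necessarily carries the commenter's sentiment: the fewer sentiment words a comment contains, the more likely it is to be a spam comment.
Step S203: traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold; if not, perform step S204; if so, perform step S205.
It can be understood that the number of comments in the comment queue that are the same as or similar to the user comment can be determined by detecting whether the comment queue contains historical comments that are the same as or similar to the user comment.
For example, even when the comment information in the user comment does not exist in the blacklist library, the comment queue may still contain a large number of historical comments whose content is the same as or similar to that of the user comment. When the number of such comments reaches a certain threshold, it also hinders users from obtaining useful information, so user comments with duplicated content can likewise be classified as spam comments. Therefore, to identify spam comments more accurately, the comment queue may be further checked for historical comments that are the same as or similar to the user comment, and it is determined whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold. The comment queue is a first-in-first-out queue composed of historical comments.
In some implementations, a comment similar to the user comment is a historical comment whose similarity to the user comment reaches a third threshold. Whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold can be determined by checking whether the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold.
For example, the similarity can be determined by comparing how closely the information contained in the user comment matches the information contained in the historical comments in the comment queue. For example, the third threshold may be 80%; a user comment whose matching degree with a historical comment in the comment queue reaches 90% exceeds this threshold and is determined to be similar, and a user comment whose matching degree reaches 100% is determined to be the same.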
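The text does not say how the matching degree is computed. As one stand-in, the sketch below uses the ratio from Python's standard `difflib.SequenceMatcher` and compares it with the 80% third threshold from the example; the choice of measure and the function names are assumptions.

```python
from difflib import SequenceMatcher

THIRD_THRESHOLD = 0.8  # 80%, the example value given in the text


def similarity(a, b):
    """Matching degree in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def is_same_or_similar(a, b, threshold=THIRD_THRESHOLD):
    """A ratio of 1.0 means identical; a ratio at or above the threshold means similar."""
    return similarity(a, b) >= threshold


def count_similar(comment, history, threshold=THIRD_THRESHOLD):
    """Count historical comments that are the same as or similar to `comment`."""
    return sum(1 for old in history if is_same_or_similar(comment, old, threshold))
```

The result of `count_similar` is the quantity that is compared with the first threshold in step S203.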
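In some implementations, the comment queue may be either a linked queue or an array queue.
It can be understood that, in programming languages, a queue is a linear list, and the data elements of a queue are also called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered the queue earliest is the first to be deleted, so a queue is also called a first-in-first-out (FIFO) linear list. The comment queue can therefore be referred to as a first-in-first-out queue.
For example, a queue can be stored in an array Q[1…m], where the upper bound m of the array is the maximum capacity allowed by the queue. Queue operations require two pointers: head, the head pointer, which points to the actual head element; and tail, the tail pointer, which points to the position after the actual tail element. In general, the initial value of both pointers is set to 0, at which point the queue is empty and contains no elements. When the number of queue elements has reached the upper bound m of the array and a new queue element is enqueued, the queue element that entered the queue earliest is deleted from the queue.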
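The array-backed queue with head and tail pointers can be sketched as a small class. The wrap-around (circular) indexing and the zero-based positions are implementation choices assumed for this example, as are the class and method names; the text only describes the array Q[1…m], the two pointers, and the eviction of the oldest element when the queue is full.

```python
class ArrayQueue:
    """Fixed-capacity FIFO queue stored in an array (a sketch with assumed names)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._items = [None] * capacity
        self._capacity = capacity  # m, the maximum capacity of the queue
        self._head = 0             # index of the actual head (oldest) element
        self._tail = 0             # index of the position after the tail element
        self._size = 0

    def enqueue(self, item):
        """Insert `item`; if the queue is full, evict and return the oldest element."""
        evicted = None
        if self._size == self._capacity:
            evicted = self._items[self._head]
            self._head = (self._head + 1) % self._capacity
            self._size -= 1
        self._items[self._tail] = item
        self._tail = (self._tail + 1) % self._capacity
        self._size += 1
        return evicted

    def __len__(self):
        return self._size

    def __iter__(self):
        # Yield elements from oldest to newest.
        for offset in range(self._size):
            yield self._items[(self._head + offset) % self._capacity]
```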
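For example, a queue can also be stored in a linked list, in which the logical order of adjacent data items is indicated by pointers to the storage addresses of the elements. This forms a linked queue, for which storage can be allocated dynamically.
For example, if the comment queue is an array queue, the second threshold on the length of the comment queue is the maximum capacity of the array queue, for example 1000 user comments.
For example, when the comment queue is found to contain a historical comment identical to the user comment, updating the comment queue may be refused, so that duplicated user comments do not appear in the comment queue repeatedly and reduce the efficiency with which users obtain information. In that case, 1 is added to the like counter recorded for the identical historical comment, to indicate that another person has posted a user comment with the same or similar content, or that another person agrees with the content of the historical comment.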
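The duplicate-handling rule above can be sketched as follows. The dictionary of like counts keyed by comment text and the function name `add_or_like` are assumptions for illustration; the text only states that the like count of the identical historical comment is increased by 1 and the queue is not updated.

```python
from collections import deque


def add_or_like(comment, queue, likes):
    """Enqueue `comment` at the head, or count it as a like if it is already queued.

    Returns True when the queue was updated and False when the update was refused.
    """
    if comment in queue:
        # Identical historical comment found: refuse the update, add one like.
        likes[comment] = likes.get(comment, 0) + 1
        return False
    queue.appendleft(comment)
    return True
```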
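As shown in FIG. 2, for example, if the first threshold is 5 and the number of comments in the comment queue that are the same as the user comment "Hope we get a smog sensor soon" is 1, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment does not reach the first threshold, and step S204 is performed.
As shown in FIG. 5, for example, if the first threshold is 5 and the number of comments in the comment queue that are the same as the user comment "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is 7, it is determined that the number of comments in the comment queue that are the same as or similar to the user comment has reached the first threshold, and step S205 is performed.
Step S204: add the user comment to the comment queue as the head comment, and delete the tail comment that overflows the second threshold.
It can be understood that the length of the first-in-first-out queue can be preset to the second threshold. The length can be expressed as the total number of data packets that the array queue can hold. An array must be given a fixed size before it is created; that is, a suitable byte length is set for each queue element so that it meets the byte-length requirement of a single queue element. Each queue element can be understood as one data packet of fixed size; for example, if the array is N[1…1000], the second threshold is 1000. The length of the first-in-first-out queue can also be expressed as the number of storage-unit pointers in a linked queue. A linked list does not need a fixed amount of storage allocated in advance; when data needs to be stored, a suitable storage unit can be set up for each queue element to hold the data, and that storage unit is linked to the other storage units in the queue by pointers. The content of the comment queue changes in real time. For example, when a new user comment enters the queue shown in the comment area, the user comment is added to the comment queue as the head comment, the historical comment at the tail of the queue is dequeued, and the queue number of every other historical comment is increased by 1.
As shown in FIG. 3, when the number of comments in the comment queue that are the same as or similar to the user comment is less than the first threshold, the comment queue is updated: the user comment "Hope we get a smog sensor soon" is added at the head of the comment queue, No. 1, and the historical comment "Requesting a formaldehyde sensor" at the tail of the comment queue, No. 1000, is deleted. The historical comment "Good article! Like", originally numbered No. 1, becomes No. 2 and is displayed in display slot No. 2, and the remaining historical comments each move back by one display slot.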
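The renumbering in the FIG. 3 example can be reproduced on a small scale. This sketch assumes a `collections.deque` with `maxlen` as the comment queue and uses a three-slot queue in place of the 1000-slot one; the helper names `push_head` and `numbered` are hypothetical.

```python
from collections import deque


def push_head(queue, comment):
    """Add `comment` as No. 1 and return the tail comment that overflows, if any."""
    evicted = queue[-1] if len(queue) == queue.maxlen else None
    queue.appendleft(comment)  # a full deque(maxlen=...) drops its last element
    return evicted


def numbered(queue):
    """Pair each comment with its display number, starting from No. 1."""
    return list(enumerate(queue, start=1))
```

After `push_head`, every surviving comment's number is one higher than before, which matches the description of each historical comment moving back by one display slot.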
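Step S205: determine the user comment to be a spam comment.
It can be understood that when the user comment is determined to be a spam comment, updating the comment queue can be refused.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a fourth usage state of an information processing method according to an embodiment of the present application.
When the user comment with the content "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is determined to be a spam comment, updating the comment queue is refused.
In some implementations, when updating the comment queue is refused, a prompt box may also pop up to tell the user that the comment failed to be published. As shown in FIG. 6, after the user taps the "Comment" button, a prompt box with the content "Comment review failed: spam comment!" pops up, the update of the comment queue is refused, and the comment area displayed on the mobile phone interface does not change.
Step S206: detect whether the user comment contains contact information and, if so, add the contact information to the blacklist library as feature information.
In some implementations, when it is detected that the user comment contains contact information and the contact information is new, the new contact information extracted from the user comment is added to the blacklist library as feature information. When the contact information is already known, it may overwrite the existing contact information in the blacklist library, or it may simply not be added.
It can be understood that when new contact information is detected in the user comment, it is extracted and added to the blacklist library as feature information, to serve as a basis for checking the next user comment.
As shown in FIG. 6, for example, when the user comment with the content "捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线" is a spam comment, the new contact information "a5a7a9" in the user comment is extracted, and "a5a7a9" is added to the blacklist library as feature information.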
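Step S206 can be sketched as a small update routine. It assumes the contact information has already been extracted from the spam comment and that the blacklist library is held as a Python set; both the representation and the function name `register_contacts` are assumptions, and already-known contact information is simply skipped, which is one of the two options the text allows.

```python
def register_contacts(contacts, blacklist):
    """Add contact information not yet in `blacklist`; return the newly added items."""
    added = []
    for contact in contacts:
        if contact not in blacklist:
            blacklist.add(contact)  # becomes feature information for later checks
            added.append(contact)
    return added
```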
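In the embodiments of the present application, whether a user comment is a spam comment is determined by detecting whether the user comment contains information that matches feature information in the blacklist library. When the user comment is not found to be spam in this way, the comment queue is traversed, and when the number of comments in the comment queue that are the same as or similar to the user comment is determined not to reach the first threshold, the user comment is added to the comment queue as the head comment and the tail comment that overflows the second threshold is deleted. After a submitted user comment passes the blacklist check, the embodiments further check for historical comments with duplicated content, which keeps large numbers of duplicated user comments out of the queue, reduces the load on the system, improves information processing efficiency, and makes it easier for users to obtain useful information.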
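An embodiment of the present application further provides an information processing apparatus, including:
an obtaining module, configured to obtain a user comment;
a first judging module, configured to traverse a comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
a determining module, configured to determine the user comment to be a spam comment when the number of comments in the comment queue that are the same as or similar to the user comment is determined to reach the first threshold;
a processing module, configured to, when the number of comments in the comment queue that are the same as or similar to the user comment is determined not to reach the first threshold, add the user comment to the comment queue and process the tail comment of the comment queue according to the second threshold.
In some implementations, the processing module is configured to add the user comment to the comment queue as the head comment and to delete the tail comment that overflows the second threshold.
In some implementations, the apparatus further includes:
a second judging module, configured to determine, after the obtaining module obtains the user comment, whether the comment information in the user comment exists in the blacklist library and, if so, to determine the user comment to be a spam comment;
the first judging module is configured to, when the second judging module's determination is negative, traverse the comment queue and determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
In some implementations, the second judging module is configured to determine whether the user comment contains information that matches feature information in the blacklist library and, if so, to determine that the comment information in the user comment exists in the blacklist library.
In some implementations, the apparatus further includes:
a detecting module, configured to detect, when the user comment is determined to be a spam comment, whether the user comment contains contact information and, if so, to add the contact information to the blacklist library as feature information.
In some implementations, a comment similar to the user comment is a historical comment whose similarity to the user comment reaches a third threshold.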
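An embodiment of the present application further provides an information processing apparatus. As shown in FIG. 7, which is a schematic structural diagram of an information processing apparatus provided by an embodiment of the present application, the information processing apparatus 30 includes an obtaining module 31, a first judging module 33, a processing module 34, and a determining module 35.
The obtaining module 31 is configured to obtain a user comment.
The first judging module 33 is configured to traverse a comment queue and judge whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, where the comment queue is a first-in-first-out queue whose length has a second threshold.
The determining module 35 is configured to determine the user comment to be a spam comment when it is judged that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
The processing module 34 is configured to, when it is judged that the number of comments in the comment queue that are the same as or similar to the user comment has not reached the first threshold, add the user comment to the comment queue and process the tail comment of the comment queue according to the second threshold.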
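Please refer to FIG. 8, which is another schematic structural diagram of an information processing apparatus provided by an embodiment of the present application. The information processing apparatus 30 includes an obtaining module 31, a second judging module 32, a first judging module 33, a processing module 34, a determining module 35, and a detection module 36.
The obtaining module 31 is configured to obtain a user comment.
For example, as shown in FIG. 2, in a forum, “会飞的老虎” published an article titled “手机里的传感器” ("Sensors in mobile phones"), and the user “码农” submitted a user comment with the content “争取早日再来个雾霾传感器” ("hoping for a smog sensor soon") on the forum; the obtaining module 31 obtains this user comment.
The second judging module 32 is configured to judge, after the obtaining module 31 obtains the user comment, whether the comment information in the user comment exists in the blacklist library, and if so, determine the user comment to be a spam comment.
It can be understood that the comment information in the user comment may include the user name, user ID, comment content, comment publication time, and other information.
In some embodiments, the second judging module 32 is configured to judge whether the user comment contains information matching the feature information in the blacklist library, and if so, determine that the comment information in the user comment exists in the blacklist library.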
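At present, many public platforms support interaction between users. Such public platforms may take the form of e-commerce platforms, forums, communities, websites, microblogs, Tieba post bars, blogs, application download platforms, and the like. For example, after a user registers on a website and passes authentication, the user holds user identity information for that website and becomes a user of the website. The user can carry out user behaviors on the website, such as publishing articles, publishing products, posting microblogs, posting threads, and replying to comments, and can also comment on or like information published by others. With respect to such comment content, some users may post large numbers of spam comments with identical or similar content, such as advertising comments, sales-promotion comments, and comments with harmful content such as reactionary material, violence, pornography, hyperlinks, abuse, or defamation.
It can be understood that a blacklist library may be set up in advance, and the blacklist library contains multiple pieces of feature information.
In some embodiments, the feature information includes any one or more of a user name, a user ID, contact information, a keyword, and a homophone of a keyword.
It can be understood that the contact information may take the form of a combination of letters and digits with a length of more than 7 bytes, for example a telephone number, mobile phone number, WeChat ID, or QQ number.
For example, the keywords may include hyperlinks and advertising words, prohibited words, special symbols, and the like.
For example, a user comment submitted by a user contains hyperlinks and advertising words, such as product promotion, shop or website recommendation, company publicity, or business promotion. A hyperlink generally appears as a web address containing several consecutive English letters, such as http://.... The string “http” is set as a keyword, so that scanning a user comment for the keyword detects whether it contains a hyperlink. If it does, the user comment is considered a possible spam comment, and it is then further judged whether it contains advertising words. For advertising words, terms such as QQ, 特价 (special price), 热卖 (hot sale), 淘宝 (Taobao), and 包邮 (free shipping) are set as advertising keywords, and any combination of digits followed by “元” (yuan) is also set as feature information. When the user comment contains such a keyword, the second judging module 32 determines that the comment information in the user comment exists in the blacklist library, and the user comment is determined to be a spam comment.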
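The keyword scan for hyperlinks and advertising words can be sketched as below. The word list comes from the examples in the text; everything else is an assumption for illustration, including the function name `scan_keywords`, the case-insensitive “http” test, and the regular expression for "digits followed by 元". The text leaves open how the two signals are combined, so the sketch reports them separately.

```python
import re
from typing import Dict

AD_WORDS = ("QQ", "特价", "热卖", "淘宝", "包邮")  # examples listed in the description
PRICE_RE = re.compile(r"\d+\s*元")  # assumed form of "any digits combined with 元"


def scan_keywords(comment: str) -> Dict[str, bool]:
    """Report whether the comment contains a hyperlink marker and/or an advertising word."""
    has_link = "http" in comment.lower()
    has_ad = any(word in comment for word in AD_WORDS) or PRICE_RE.search(comment) is not None
    return {"hyperlink": has_link, "ad_word": has_ad}


print(scan_keywords("特价包邮 http://example.com"))
print(scan_keywords("好文章"))
```

A caller could treat either flag as a blacklist hit, or require both; that policy choice is not fixed by the source.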
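For example, the prohibited words are words containing personal attacks.
For example, when submitting a user comment, some users may insert special symbols into keywords or into the text of the comment information in order to evade the platform's spam detection. Therefore, special symbols such as “★”, “*”, “#”, and “&” can be set as keywords and stored in the blacklist library as feature information.
For example, users may replace the original keywords with homophones or near-homophones to evade the platform's spam detection, as in “捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线”. For such cases, homophones of keywords can be set as feature information and stored in the blacklist library.
For example, on a forum a user submits the user comment “代开发票,加Q(22222211)” ("invoices issued on your behalf, add QQ (22222211)"). When the second judging module 32 determines that the user comment contains information matching contact information in the blacklist library, the user comment is determined to be a spam comment.
As shown in FIG. 5, in a forum, “会飞的老虎” published an article titled “手机里的传感器” ("Sensors in mobile phones"), and a user submitted a user comment with the content “捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线” on the forum. When the second judging module 32 determines that the submitted user comment contains information matching the feature information in the blacklist library, the user comment is determined to be a spam comment.
In some embodiments, a whitelist library may also be set up. The second judging module 32 may also be configured to judge whether the comment information in the user comment exists in the whitelist library; if so, the user comment may be determined to be a non-spam comment; if not, the user comment may be determined to be a spam comment.
For example, for user comments on a product, comments related to the product are usually classified as useful information, so whether a comment is spam can be determined by screening for associated words related to the product description, such as topic words or sentiment words. Taking a product published on an e-commerce platform as an example, the topic words may be core nouns related to the product, and topic words from the standard product description can be stored in the whitelist library in advance. If the comment information submitted for the product contains none of the topic words in the standard product description, the user comment may be determined to be a spam comment; if it contains any one or more of the topic words, the user comment may be determined to be a non-spam comment.
For example, the sentiment words include emotional vocabulary with which users genuinely express their subjective views, attitudes, feelings, moods, and so on. Taking reviews of a product sold on a website as an example, product comments are people's evaluations and discussion of the product's parameters and the purchasing experience, through which they can genuinely express their subjective views, attitudes, feelings, and moods. Product comments therefore necessarily contain the commenter's sentiment. The fewer sentiment words a comment contains, the more likely it is to be a spam comment.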
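The whitelist idea (topic words and sentiment words) can be illustrated with two small helpers. This is only a sketch: the function names and the example word sets are hypothetical, plain substring matching is assumed, and the source does not specify how a sentiment-word count would be turned into a decision.

```python
from typing import Iterable


def contains_topic_word(comment: str, topic_words: Iterable[str]) -> bool:
    """True if the comment mentions at least one whitelisted topic word."""
    return any(word in comment for word in topic_words)


def count_sentiment_words(comment: str, sentiment_words: Iterable[str]) -> int:
    """Count non-overlapping occurrences of sentiment words in the comment."""
    return sum(comment.count(word) for word in sentiment_words)


# Hypothetical whitelist entries for a phone product page.
TOPIC_WORDS = {"sensor", "battery", "screen"}
SENTIMENT_WORDS = {"love", "hate", "disappointed"}

print(contains_topic_word("the sensor is accurate", TOPIC_WORDS))
print(count_sentiment_words("love it, really love the screen", SENTIMENT_WORDS))
```

A comment with no topic word would be a spam candidate under the whitelist rule; the sentiment count is a softer signal, where lower values suggest spam.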
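The first judging module 33 is configured to traverse the comment queue and judge whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold, where the comment queue is a first-in-first-out queue whose length has a second threshold.
It can be understood that the number of comments in the comment queue that are the same as or similar to the user comment can be determined by checking whether the comment queue contains historical comments that are the same as or similar to the user comment. For example, even when the comment information in the user comment does not exist in the blacklist library, the comment queue may still contain a large number of historical comments with the same or similar content. When the number of comments with the same or similar content reaches a certain threshold, they also hinder users from obtaining useful information; in effect, such repeated user comments can also be classed as spam comments. Therefore, to identify spam comments more accurately, it may be further checked whether the comment queue contains historical comments that are the same as or similar to the user comment, and the first judging module 33 judges whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold. The comment queue is a first-in-first-out queue composed of historical comments.
In some embodiments, the comments similar to the user comment include historical comments whose similarity to the user comment reaches a third threshold. The first judging module 33 may judge whether the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold, and thereby determine whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
For example, the similarity may be determined by comparing how closely the information contained in the user comment matches the information contained in a historical comment in the comment queue. For example, the third threshold may be 80%: when the degree of match between the user comment and a historical comment reaches 90%, the two are determined to be similar; when it reaches 100%, they are determined to be the same.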
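The third-threshold comparison can be sketched as follows. The description does not name a similarity measure, so `difflib.SequenceMatcher.ratio()` from the Python standard library is used here purely as a stand-in; `THIRD_THRESHOLD`, `classify`, and `count_same_or_similar` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Iterable

THIRD_THRESHOLD = 0.80  # the 80% example value from the description


def similarity(a: str, b: str) -> float:
    """Character-level match ratio in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def classify(a: str, b: str) -> str:
    """Label a pair of comments as 'same', 'similar', or 'different'."""
    if a == b:
        return "same"
    return "similar" if similarity(a, b) >= THIRD_THRESHOLD else "different"


def count_same_or_similar(comment: str, history: Iterable[str]) -> int:
    """Number of historical comments that are the same as or similar to `comment`."""
    return sum(1 for old in history if classify(comment, old) != "different")


print(classify("abcdefghij", "abcdefghiX"))  # 9 of 10 characters match
print(count_same_or_similar("abcdefghij", ["abcdefghij", "abcdefghiX", "zzz"]))
```

The returned count is what would be compared against the first threshold.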
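In some embodiments, the comment queue may be either a linked queue or an array queue.
It can be understood that, in programming languages, a queue is a linear list whose data elements are called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing. Because a queue allows insertion only at one end and deletion only at the other, the element that entered the queue earliest is the first to be deleted, so a queue is also called a first-in-first-out (FIFO) linear list. The comment queue can therefore be called a first-in-first-out queue.
For example, a queue can be stored in an array Q[1…m], where the upper bound m of the array is the maximum capacity the queue allows. Queue operations need two pointers: head, the head pointer, which points to the actual head element; and tail, the tail pointer, which points to the position after the actual tail element. In general, both pointers are initialized to 0, at which point the queue is empty and contains no elements. Once the number of queue elements has reached the upper bound m of the array, whenever a new queue element is enqueued, the element that entered the queue earliest is deleted from the queue.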
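An array-backed queue with head and tail pointers, as described above, can be sketched like this. The wrap-around (circular) indexing and the explicit `size` counter are assumptions added so the fixed array can be reused after evictions; the class name `ArrayQueue` is hypothetical and indices are 0-based rather than Q[1…m].

```python
from typing import Any, List, Optional


class ArrayQueue:
    """Fixed-capacity FIFO queue stored in a preallocated list."""

    def __init__(self, m: int) -> None:
        self.slots: List[Any] = [None] * m
        self.head = 0  # index of the oldest element
        self.tail = 0  # index one past the newest element
        self.size = 0

    def enqueue(self, item: Any) -> Optional[Any]:
        """Append `item`; if the array is full, evict and return the oldest element."""
        m = len(self.slots)
        evicted = None
        if self.size == m:
            evicted = self.slots[self.head]
            self.head = (self.head + 1) % m
            self.size -= 1
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % m
        self.size += 1
        return evicted

    def items(self) -> List[Any]:
        """Elements from oldest to newest."""
        m = len(self.slots)
        return [self.slots[(self.head + i) % m] for i in range(self.size)]


q = ArrayQueue(3)
for text in ("a", "b", "c"):
    q.enqueue(text)
print(q.enqueue("d"))  # evicts the earliest element
print(q.items())
```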
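For example, a queue can also be stored in a linked list, in which the logical order and adjacency of the data are indicated by pointers to the elements' storage addresses, forming a linked queue whose storage can be allocated dynamically.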
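For comparison, a linked queue allocates one node per element on demand and chains the nodes with references. The following is a minimal sketch with hypothetical class names; it is unbounded, so enforcing the second threshold would require checking `size` before enqueuing.

```python
from typing import Any, Optional


class _Node:
    def __init__(self, value: Any) -> None:
        self.value = value
        self.next: Optional["_Node"] = None


class LinkedQueue:
    """FIFO queue whose storage grows one node at a time."""

    def __init__(self) -> None:
        self.head: Optional[_Node] = None  # oldest node
        self.tail: Optional[_Node] = None  # newest node
        self.size = 0

    def enqueue(self, value: Any) -> None:
        node = _Node(value)
        if self.tail is None:
            self.head = node  # first element: head and tail coincide
        else:
            self.tail.next = node
        self.tail = node
        self.size += 1

    def dequeue(self) -> Any:
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None  # queue became empty
        self.size -= 1
        return node.value


lq = LinkedQueue()
for n in (1, 2, 3):
    lq.enqueue(n)
print(lq.dequeue(), lq.size)
```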
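For example, if the comment queue is an array queue, the second threshold on the length of the comment queue is the maximum capacity of the array queue, for example 1000 user comments.
For example, when it is detected that the comment queue contains a historical comment identical to the user comment, then to avoid repeated content appearing many times in the comment queue and reducing the efficiency with which users obtain information, updating of the comment queue may be refused, and 1 is added to the like count recorded for the identical historical comment. This indicates that someone else has posted a user comment with the same or similar content as the historical comment, or that someone else agrees with the content of the historical comment.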
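The "refuse the update and add 1 to the like count" variant can be sketched as follows. This is an illustration under assumptions: only exact duplicates are handled, like counts are kept in a plain dictionary keyed by comment text, and `submit_or_like` is a hypothetical name.

```python
from collections import deque
from typing import Dict


def submit_or_like(comment: str, queue: deque, likes: Dict[str, int]) -> bool:
    """Return True if the comment was queued, False if it was folded into a like."""
    if comment in queue:
        # An identical historical comment exists: leave the queue unchanged
        # and record the repeat as one more like.
        likes[comment] = likes.get(comment, 0) + 1
        return False
    queue.appendleft(comment)
    likes.setdefault(comment, 0)
    return True


history = deque(maxlen=1000)
like_counts: Dict[str, int] = {}
submit_or_like("good article", history, like_counts)
submit_or_like("good article", history, like_counts)
print(list(history), like_counts)
```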
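As shown in FIG. 2, for example, the first threshold is 5 and the number of comments in the comment queue identical to the user comment “争取早日再来个雾霾传感器” ("hoping for a smog sensor soon") is 1, so the first judging module 33 determines that the number of comments in the comment queue that are the same as or similar to the user comment has not reached the first threshold.
As shown in FIG. 5, for example, the first threshold is 5 and the number of comments in the comment queue identical to the user comment “捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线” is 7, so the first judging module 33 determines that the number of comments in the comment queue that are the same as or similar to the user comment has reached the first threshold.
The processing module 34 is configured to add the user comment to the comment queue as the head comment and delete the tail comment that overflows the second threshold.
It can be understood that the length of the first-in-first-out queue may be preset to the second threshold. The length may be expressed as the total number of data packets that an array queue can hold. An array must be given a fixed size before it is created, that is, each queue element is allotted a suitable byte length that satisfies the needs of a single queue element. Each queue element can be understood as representing one data packet of fixed size; for example, if the array is N[1…1000], the second threshold is 1000. The length of the first-in-first-out queue may also be expressed as the number of pointers to storage units in a linked queue. A linked list does not need a fixed amount of storage space allocated in advance: when data needs to be stored, a suitable storage unit is set up for each queue element, and the storage unit is linked by pointers to the other storage units in the queue. The content of the comment queue changes in real time. For example, when a new user comment enters the queue in the area displayed by the comment section, the processing module 34 adds the user comment to the comment queue as the head comment, the historical comment at the tail of the queue is dequeued, and the queue number of every other historical comment is increased by 1.
As shown in FIG. 3, when the number of comments in the comment queue that are the same as or similar to the user comment is less than the first threshold, the comment queue is updated: the processing module 34 adds the user comment “争取早日再来个雾霾传感器” at the head of the comment queue, No. 1, and deletes the historical comment “求甲醛传感器” ("requesting a formaldehyde sensor") at the tail of the comment queue, No. 1000. The historical comment “好文章!点赞” ("good article, liked"), originally numbered No. 1, becomes No. 2 and is displayed in display slot No. 2; the remaining historical comments each move back by one display slot.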
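The determining module 35 is configured to determine the user comment to be a spam comment when it is judged that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
It can be understood that when the determining module 35 determines that the user comment is a spam comment, the processing module 34 may refuse to update the comment queue.
As shown in FIG. 6, when the determining module 35 determines that the user comment with the content “捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线” is a spam comment, the processing module 34 refuses to update the comment queue.
In some embodiments, when refusing to update the comment queue, the processing module 34 may also pop up a prompt box to inform the user that the comment failed to be published. As shown in FIG. 6, after the user taps the “评论” (Comment) button, a prompt box with the content “评论审核未通过:为垃圾评论!” ("Comment review failed: spam comment!") pops up, updating of the comment queue is refused, and the comment section displayed on the mobile phone interface does not change.
The detection module 36 is configured to detect, when the user comment is determined to be a spam comment, whether the user comment contains contact information, and if so, add the contact information to the blacklist library as feature information.
In some embodiments, when the detection module 36 detects that the user comment contains contact information and it is new contact information, the new contact information extracted from the user comment is added to the blacklist library as feature information. When the contact information is already known, it may overwrite the existing entry in the blacklist library, or it may simply not be added.
It can be understood that when new contact information is detected in the user comment, it is extracted and added to the blacklist library as feature information, to serve as a basis for checking the next user comment.
As shown in FIG. 6, for example, when the user comment with the content “捕鱼达人3 逋鱼提线 迦魏新a5a7a9 课提线” is a spam comment, the new contact information “a5a7a9” in the user comment is extracted and added to the blacklist library as feature information.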
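An embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor calls the computer program stored in the memory to implement any information processing method provided by the embodiments of the present application. For example:
obtaining a user comment;
traversing a comment queue and judging whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, where the comment queue is a first-in-first-out queue whose length has a second threshold;
if so, determining the user comment to be a spam comment;
if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.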
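An embodiment of the present application further provides a computer device. As shown in FIG. 9, which is a schematic structural diagram of a computer device provided by an embodiment of the present application, the computer device 400 may include a radio frequency (RF) circuit 401, a memory 402 including one or more computer-readable storage media, an input unit 403, a display unit 404, a sensor 405, an audio circuit 406, a wireless fidelity (WiFi) module 407, a processor 408 including one or more processing cores, a power supply 409, and other components. Those skilled in the art will understand that the structure shown in FIG. 9 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The radio frequency circuit 401 may be used to send and receive information, or to receive and send signals during a call.
The memory 402 may be used to store application programs and data. The application programs stored in the memory 402 contain computer programs.
The input unit 403 may be used to receive input digits, character information, or user characteristic information (such as a fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
The display unit 404 may be used to display information entered by the user or information provided to the user, as well as the various graphical user interfaces of the computer device, which may be composed of graphics, text, icons, video, and any combination thereof.
The computer device may also include at least one sensor 405, such as a light sensor, a motion sensor, and other sensors.
The audio circuit 406 may provide an audio interface between the user and the computer device through a speaker and a microphone.
The wireless fidelity (WiFi) module 407 may be used for short-range wireless transmission. It can help the user send and receive e-mail, browse websites, access streaming media, and so on, and provides the user with wireless broadband Internet access.
The processor 408 is the control center of the computer device. It connects the parts of the entire computer device through various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the computer device as a whole.
The computer device further includes a power supply 409 (such as a battery) that supplies power to the components.
Although not shown in FIG. 9, the computer device may also include a camera, a Bluetooth module, and the like, which are not described here.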
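Specifically, in this embodiment, the processor 408 in the computer device loads the computer program corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 408 runs the application programs stored in the memory 402 to perform the following steps:
obtaining a user comment;
traversing a comment queue and judging whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, where the comment queue is a first-in-first-out queue whose length has a second threshold;
if so, determining the user comment to be a spam comment;
if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.
In some embodiments, when adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold, the processor 408 is configured to perform the following step:
adding the user comment to the comment queue as the head comment and deleting the tail comment that overflows the second threshold.
In some embodiments, after obtaining the user comment, the processor 408 is further configured to perform the following step:
judging whether the comment information in the user comment exists in the blacklist library, and if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
In some embodiments, when judging whether the comment information in the user comment exists in the blacklist library, the processor 408 is configured to perform the following step:
judging whether the user comment contains information matching the feature information in the blacklist library, and if so, determining that the comment information in the user comment exists in the blacklist library.
In some embodiments, the processor 408 is further configured to perform the following step:
when the user comment is determined to be a spam comment, detecting whether the user comment contains contact information, and if so, adding the contact information to the blacklist library.
In some embodiments, when judging whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold, the processor 408 is configured to perform the following step:
judging whether the number of historical comments in the comment queue whose similarity to the user comment reaches the third threshold reaches the first threshold, and if so, determining that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.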
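In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the embodiments of the present application, the information processing apparatus and the information processing method in the embodiments above belong to the same concept. Any method provided in the information processing method embodiments can be run on the information processing apparatus; for the specific implementation process, see the information processing method embodiments, which are not repeated here.
It should be noted that, for the information processing method of the present application, a person of ordinary skill in the art can understand that all or part of the flow of the information processing method of the embodiments can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in the memory of a computer device, and executed by at least one processor in the computer device; its execution may include the flow of the embodiments of the information processing method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the information processing apparatus of the embodiments of the present application, the functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The information processing method, apparatus, storage medium, and computer device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the technical solutions of the present application and their core ideas. Those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.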

Claims (20)

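  1. An information processing method, wherein the method comprises:
    obtaining a user comment;
    traversing a comment queue and judging whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
    if so, determining the user comment to be a spam comment;
    if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.
  2. The information processing method according to claim 1, wherein adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold comprises:
    adding the user comment to the comment queue as the head comment and deleting the tail comment that overflows the second threshold.
  3. The information processing method according to claim 1, wherein after obtaining the user comment, the method further comprises:
    judging whether comment information in the user comment exists in a blacklist library, and if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
  4. The information processing method according to claim 3, wherein judging whether the comment information in the user comment exists in the blacklist library comprises:
    judging whether the user comment contains information matching feature information in the blacklist library, and if so, determining that the comment information in the user comment exists in the blacklist library.
  5. The information processing method according to claim 4, wherein the method further comprises:
    when the user comment is determined to be a spam comment, detecting whether the user comment contains contact information, and if so, adding the contact information to the blacklist library as feature information.
  6. The information processing method according to claim 1, wherein the comments similar to the user comment comprise historical comments whose similarity to the user comment reaches a third threshold.
  7. The information processing method according to claim 4, wherein the feature information comprises any one or more of a user name, a user ID, contact information, a keyword, and a homophone of a keyword.
  8. An information processing apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain a user comment;
    a first judging module, configured to traverse a comment queue and judge whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
    a determining module, configured to determine the user comment to be a spam comment when it is judged that the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold;
    a processing module, configured to, when it is judged that the number of comments in the comment queue that are the same as or similar to the user comment has not reached the first threshold, add the user comment to the comment queue and process the tail comment of the comment queue according to the second threshold.
  9. The information processing apparatus according to claim 8, wherein the processing module is configured to add the user comment to the comment queue as the head comment and delete the tail comment that overflows the second threshold.
  10. The information processing apparatus according to claim 8, wherein the apparatus further comprises:
    a second judging module, configured to judge, after the obtaining module obtains the user comment, whether comment information in the user comment exists in a blacklist library, and if so, determine the user comment to be a spam comment;
    the first judging module being configured to, when the second judging module judges that it does not, traverse the comment queue and judge whether the number of comments in the comment queue that are the same as or similar to the user comment reaches the first threshold.
  11. The information processing apparatus according to claim 8, wherein the second judging module is configured to judge whether the user comment contains information matching feature information in the blacklist library, and if so, determine that the comment information in the user comment exists in the blacklist library.
  12. The information processing apparatus according to claim 10, wherein the apparatus further comprises:
    a detection module, configured to detect, when the user comment is determined to be a spam comment, whether the user comment contains contact information, and if so, add the contact information to the blacklist library as feature information.
  13. The information processing apparatus according to claim 7, wherein the comments similar to the user comment comprise historical comments whose similarity to the user comment reaches a third threshold.
  14. A storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to perform the method according to any one of claims 1-6.
  15. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor calls the computer program stored in the memory to perform the following steps:
    obtaining a user comment;
    traversing a comment queue and judging whether the number of comments in the comment queue that are the same as or similar to the user comment reaches a first threshold, wherein the comment queue is a first-in-first-out queue whose length has a second threshold;
    if so, determining the user comment to be a spam comment;
    if not, adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold.
  16. The computer device according to claim 15, wherein when adding the user comment to the comment queue and processing the tail comment of the comment queue according to the second threshold, the processor is configured to perform the following step:
    adding the user comment to the comment queue as the head comment and deleting the tail comment that overflows the second threshold.
  17. The computer device according to claim 15, wherein after obtaining the user comment, the processor is further configured to perform the following step:
    judging whether comment information in the user comment exists in a blacklist library, and if so, determining the user comment to be a spam comment; if not, performing the step of traversing the comment queue.
  18. The computer device according to claim 17, wherein when judging whether the comment information in the user comment exists in the blacklist library, the processor is configured to perform the following step:
    judging whether the user comment contains information matching feature information in the blacklist library, and if so, determining that the comment information in the user comment exists in the blacklist library.
  19. The computer device according to claim 17, wherein the processor is further configured to perform the following step:
    when the user comment is determined to be a spam comment, detecting whether the user comment contains contact information, and if so, adding the contact information to the blacklist library as feature information.
  20. The computer device according to claim 17, wherein the comments similar to the user comment comprise historical comments whose similarity to the user comment reaches a third threshold.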
PCT/CN2017/107191 2017-01-13 2017-10-21 信息处理方法、装置、存储介质及计算机设备 WO2018129978A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710026441.2A CN106777341A (zh) 2017-01-13 2017-01-13 信息处理方法、装置及计算机设备
CN201710026441.2 2017-01-13

Publications (1)

Publication Number Publication Date
WO2018129978A1 true WO2018129978A1 (zh) 2018-07-19

Family

ID=58945583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/107191 WO2018129978A1 (zh) 2017-01-13 2017-10-21 信息处理方法、装置、存储介质及计算机设备

Country Status (2)

Country Link
CN (1) CN106777341A (zh)
WO (1) WO2018129978A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777341A (zh) * 2017-01-13 2017-05-31 广东欧珀移动通信有限公司 信息处理方法、装置及计算机设备
CN109933775B (zh) * 2017-12-15 2022-02-18 腾讯科技(深圳)有限公司 Ugc内容处理方法及装置
CN110020057B (zh) * 2017-12-29 2021-05-25 中国移动通信集团陕西有限公司 一种垃圾评论信息识别方法及装置
CN110175851B (zh) * 2019-02-28 2023-09-12 腾讯科技(深圳)有限公司 一种作弊行为检测方法及装置
CN112507146A (zh) * 2020-11-27 2021-03-16 北京达佳互联信息技术有限公司 信息处理方法、装置、电子设备及存储介质
CN114245163B (zh) * 2021-12-15 2023-06-09 四川启睿克科技有限公司 一种过滤机器人弹幕的方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159704A (zh) * 2007-10-23 2008-04-09 浙江大学 基于微内容相似度的反垃圾方法
CN102315953A (zh) * 2010-06-29 2012-01-11 百度在线网络技术(北京)有限公司 基于帖子的出现规律来检测垃圾帖子的方法及设备
US20140122584A1 (en) * 2012-10-25 2014-05-01 Google, Inc. Soft posting to social activity streams
CN104050195A (zh) * 2013-03-15 2014-09-17 北京暴风科技股份有限公司 一种广告贴处理方法和系统
CN106777341A (zh) * 2017-01-13 2017-05-31 广东欧珀移动通信有限公司 信息处理方法、装置及计算机设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226576A (zh) * 2013-04-01 2013-07-31 杭州电子科技大学 基于语义相似度的垃圾评论过滤方法
CN104869467B (zh) * 2015-03-26 2018-09-28 腾讯科技(北京)有限公司 媒体播放中的信息输出方法、装置和系统
CN104933191A (zh) * 2015-07-09 2015-09-23 广东欧珀移动通信有限公司 一种基于贝叶斯算法的垃圾评论识别方法、系统及终端

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241377A (zh) * 2020-01-02 2020-06-05 华数传媒网络有限公司 具备审核功能的现场直播实时评论系统
CN111241377B (zh) * 2020-01-02 2023-05-26 华数传媒网络有限公司 具备审核功能的现场直播实时评论系统

Also Published As

Publication number Publication date
CN106777341A (zh) 2017-05-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17891713; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17891713; Country of ref document: EP; Kind code of ref document: A1)