CN112231484A - News comment auditing method, system, device and storage medium - Google Patents

News comment auditing method, system, device and storage medium

Info

Publication number
CN112231484A
Authority
CN
China
Prior art keywords
comment
picture
comments
elements
illegal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011305016.5A
Other languages
Chinese (zh)
Other versions
CN112231484B (en)
Inventor
谢宇
贺弘联
孔泽平
周珞
陈光林
王炫
张训
汤军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Red Net New Media Group Co ltd
Original Assignee
Hunan Red Net New Media Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Red Net New Media Group Co ltd
Priority to CN202011305016.5A
Publication of CN112231484A
Application granted
Publication of CN112231484B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a news comment auditing method, system, device and storage medium. The method comprises: obtaining a comment initiated by a user side, and identifying the text and pictures in the comment; extracting the characters and elements in each picture, identifying whether the elements include violating elements, and if so, removing the picture; performing semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a spam (flooding) comment or an aggressive comment, judging the comment to be an illegal comment, removing it, and recording it in a database; and if the comment cannot be judged to be an illegal comment, obtaining the user condition of the user who posted the comment, and further judging the comment according to the user condition to determine whether it is illegal. The method and device increase auditing efficiency and reduce the pass rate of illegal comments.

Description

News comment auditing method, system, device and storage medium
Technical Field
The present application relates to the field of comment auditing, and in particular to a news comment auditing method, system, device and storage medium.
Background
With the development of the information age, information spreads over the network faster and faster. Posting online, reading news and browsing microblogs have become popular activities, and people are also keen to comment on posts, news and microblog messages and to express their views.
At the same time, improper comments such as manipulated comments, malicious flooding and political remarks appear on the internet and adversely affect news content and the dissemination of information.
Existing news comments are generally reviewed manually by back-end administrators. When there are too many comments to review or the review workload is too heavy, many illegal, reactionary or pornographic comments pass the administrators' review as normal comments and are displayed on the user-side interface, creating serious hidden dangers to social security and content security.
Disclosure of Invention
In order to strengthen auditing and reduce the pass rate of illegal comments, the application provides a news comment auditing method, system, device and storage medium.
In a first aspect, the news comment auditing method provided by the application adopts the following technical scheme:
A news comment auditing method, comprising the following steps:
obtaining a comment initiated by a user side, and identifying the text and pictures in the comment;
extracting the characters and elements in the picture, identifying whether the elements include violating elements, and if so, removing the picture;
performing semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a spam (flooding) comment or an aggressive comment, judging the comment to be an illegal comment, removing the comment, and recording the comment in the database;
if the comment cannot be judged to be an illegal comment, obtaining the user condition of the user who posted the comment, and further judging the comment according to the user condition to determine whether the comment is illegal.
By adopting this technical scheme, after a user comments on news content, the server obtains the comment, identifies its text part and picture part, identifies the elements contained in the picture, and judges those elements. If the picture contains illegal elements, such as politically inducing elements or pornographic or violent elements, the picture is judged to be an illegal picture and is deleted, which avoids its adverse influence in the news comments. The server then extracts the characters in the picture and the text of the comment and performs semantic monitoring on them. If the comment is found to be a sensitive comment, a spam comment or an aggressive comment, it is deleted and recorded in the database, so that the next similar comment can be deleted directly without further judgment, which prevents illegal comments from appearing in the comment area of the news content. If the server cannot judge a comment, it judges the comment according to the user who posted it: if most of that user's recent comments are illegal, the comment is judged to be illegal and is kept out of the comment area, where it could cause adverse public opinion effects. In this way the strength and efficiency of comment auditing are further enhanced and the pass rate of illegal comments is reduced.
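The decision flow described in this scheme can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Comment` type, the function name `audit_comment`, the label strings and the three callables passed in (picture check, text classifier, user history lookup) are all hypothetical, and the text passed to the classifier is assumed to already include any characters extracted from the pictures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical labels for the three violating monitoring results.
VIOLATING = {"sensitive", "spam", "aggressive"}


@dataclass
class Comment:
    user_id: str
    text: str
    pictures: List[str] = field(default_factory=list)  # hypothetical picture identifiers


def audit_comment(
    comment: Comment,
    picture_has_violation: Callable[[str], bool],
    classify_text: Callable[[str], Optional[str]],  # returns a label, or None if undecided
    recent_labels: Callable[[str], List[str]],      # monitoring results of the user's recent comments
    violation_db: List[str],
) -> str:
    # Picture step: remove pictures that contain violating elements.
    comment.pictures = [p for p in comment.pictures if not picture_has_violation(p)]

    # Semantic monitoring step: remove and record violating comments.
    label = classify_text(comment.text)
    if label in VIOLATING:
        violation_db.append(comment.text)
        return "removed"
    if label is not None:
        return "published"

    # Fallback step: judge by the most frequent result among the user's recent comments.
    history = recent_labels(comment.user_id)
    if history and max(set(history), key=history.count) in VIOLATING:
        return "removed"
    return "published"
```

Passing the detectors in as callables keeps the sketch independent of any particular picture or text model.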
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters comprises:
preparing a preset number of sample words for each category of emotion words, and labeling each sample word with its emotion category;
training a recognition model by using the sample words;
performing word segmentation on the text and the picture characters;
and inputting the word-segmented text and picture characters into the recognition model for recognition, and outputting a first recognition result.
By adopting this technical scheme, the model is trained on a number of sample words, each labeled with its emotion category. The emotion categories comprise normal comments, positive comments, negative comments and aggressive comments, of which aggressive comments are illegal. The text and picture characters of the comment are segmented into words, and the segmented words are input into the recognition model to obtain the emotion category of the comment's text and picture characters. If the category is aggressive, the comment is illegal and must be deleted, which prevents it from appearing in the comment area of the news content and causing adverse influence.
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters further comprises:
matching each segmented word against the sample words;
identifying the successfully matched words and obtaining their emotion categories;
obtaining the emotion category that occurs most often, and outputting it as a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that an auditor can review the comment using the auditor terminal.
By adopting this technical scheme, the segmented words are matched against the sample words, which are labeled with their emotion categories. The emotion categories of the successfully matched words are collected, and the category that occurs most often is taken as the second recognition result for the comment. The first recognition result is compared with the second; if they are inconsistent, the comment is reviewed manually. Comparing the two results makes the judgment more accurate, manual review further refines the emotion judgment, and users are spared the poor experience caused by too many wrongful deletions.
In a preferred example, the present invention may be further configured such that sending the comment to the auditor terminal comprises:
sending the comment to the auditor terminal so that the comment is manually reviewed there;
if the comment is judged to be an aggressive comment, acquiring the words whose emotion category the auditor terminal has marked as aggressive;
supplementing those words into the recognition model and removing the comment.
By adopting this technical scheme, when the first recognition result is inconsistent with the second, an auditor judges the comment using the auditor terminal, improving the accuracy of the emotion judgment. If aggressive words appear in the comment, the auditor extracts them and supplements them into the recognition model for further training, so that the model recognizes more accurately and the probability of recognition errors is reduced.
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters comprises:
performing word segmentation on the comment;
extracting all the segmented words and matching them against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a spam (flooding) comment.
By adopting this technical scheme, the comment is matched against the historical comments in the database. If the number of matching words in the comment exceeds a preset value, the comment is similar to previously published comments and is judged to be a spam comment that must be deleted, which prevents users from maliciously flooding the news content with comments.
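The matching step can be illustrated with a short sketch. The text does not say how matches are counted, so this sketch assumes one plausible reading: a comment is spam when more than `threshold` of its distinct words also occur in a single historical comment. The names `tokenize` and `is_spam` are hypothetical, and whitespace splitting stands in for a real Chinese word segmenter.

```python
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Stand-in for a real word segmenter: lowercase and split on whitespace.
    return text.lower().split()


def is_spam(comment: str, history: Iterable[str], threshold: int) -> bool:
    """Return True if the comment shares more than `threshold` distinct
    words with any single historical comment."""
    tokens = set(tokenize(comment))
    for old in history:
        matched = len(tokens & set(tokenize(old)))
        if matched > threshold:
            return True
    return False


history = ["great news buy cheap watches now", "i like this article"]
print(is_spam("buy cheap watches now friends", history, threshold=3))
print(is_spam("i disagree with this", history, threshold=3))
```

The first call shares four words with a stored comment and is flagged; the second shares at most two and is not.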
In a preferred example, the present invention may be further configured to comprise: preparing a sufficient number of training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model by using the training pictures so that the recognition model outputs the corresponding category for each training picture;
inputting the picture into the recognition model, and recognizing the categories of the elements contained in the picture;
and identifying the relationship among the elements, judging whether the elements are illegal elements, and removing the picture if they are.
By adopting this technical scheme, when a comment contains pictures, the server trains the recognition model extensively to improve its recognition accuracy, so that the categories of all elements contained in a picture are recognized. If those elements include politically inducing elements, the picture is deleted to avoid adverse public opinion effects. Likewise, if a combination of elements appearing in the picture has a politically inducing effect, the picture is also deleted, which prevents it from appearing in the comment area and causing adverse public opinion effects.
In a preferred example, the present invention may be further configured such that obtaining the user condition of the user who posted the comment and further judging the comment according to the user condition comprises:
obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by that user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often among the monitoring results;
and if the category that occurs most often is sensitive comment, spam comment or aggressive comment, judging the comment to be an illegal comment and removing the comment.
By adopting this technical scheme, if a comment cannot be judged, its emotion category is judged according to the emotion categories of the comments recently posted by the same user. If most of the user's recent comments are aggressive, the comment is judged to be aggressive. Judging the comment further in this way strengthens comment auditing and reduces the pass rate of improper comments.
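The fallback judgment amounts to a majority vote over the stored monitoring results of the user's recent comments. The sketch below is illustrative only: the label strings and the function name `judge_by_history` are hypothetical, and how many recent comments are considered is left to the caller because the text does not specify it.

```python
from collections import Counter
from typing import List

# Hypothetical labels for the three violating monitoring results.
VIOLATING = {"sensitive", "spam", "aggressive"}


def judge_by_history(recent_results: List[str]) -> bool:
    """Return True (treat the undecided comment as illegal) when the most
    frequent monitoring result among the user's recent comments is a
    violating category."""
    if not recent_results:
        return False  # assumption: no history means no basis for removal
    top_label, _count = Counter(recent_results).most_common(1)[0]
    return top_label in VIOLATING


print(judge_by_history(["aggressive", "normal", "aggressive"]))
print(judge_by_history(["normal", "normal", "spam"]))
```

On a tie, `Counter.most_common` returns the label seen first, so a production system would need an explicit tie rule.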
In a second aspect, the news comment auditing system provided by the application adopts the following technical scheme:
A news comment auditing system, comprising:
an acquisition device, used for acquiring comments initiated by a user side and identifying the text and pictures in the comments;
a recognition device, used for extracting the characters and elements in the picture, identifying whether the elements include violating elements, and if so, removing the picture;
a monitoring device, used for performing semantic monitoring on the text and the picture characters, judging the comment to be an illegal comment if the monitoring result is a sensitive comment, a spam (flooding) comment or an aggressive comment, removing the comment, and recording the comment in the database;
and a judging device, used for obtaining the user condition of the user who posted the comment if the comment cannot be judged to be an illegal comment, and further judging the comment according to the user condition to determine whether the comment is illegal.
By adopting this technical scheme, after a user comments on news content, the server obtains the comment, identifies its text part and picture part, identifies the elements contained in the picture, and judges those elements. If the picture contains illegal elements, such as politically inducing elements or pornographic or violent elements, the picture is judged to be an illegal picture and is deleted, which avoids its adverse influence in the news comments. The server then extracts the characters in the picture and the text of the comment and performs semantic monitoring on them. If the comment is found to be a sensitive comment, a spam comment or an aggressive comment, it is deleted and recorded in the database, so that the next similar comment can be deleted directly without further judgment, which prevents illegal comments from appearing in the comment area of the news content. If the server cannot judge a comment, it judges the comment according to the user who posted it: if most of that user's recent comments are illegal, the comment is judged to be illegal and is kept out of the comment area, where it could cause adverse public opinion effects. In this way the strength and efficiency of comment auditing are further enhanced and the pass rate of illegal comments is reduced.
In a third aspect, the news comment auditing device provided by the application adopts the following technical scheme:
A news comment auditing device, comprising:
an acquisition module, used for acquiring comments initiated by a user side and identifying the text and pictures in the comments;
a recognition module, used for extracting the characters and elements in the picture, identifying whether the elements include violating elements, and if so, removing the picture;
a monitoring module, used for performing semantic monitoring on the text and the picture characters, judging the comment to be an illegal comment if the monitoring result is a sensitive comment, a spam (flooding) comment or an aggressive comment, removing the comment, and recording the comment in the database;
and a judging module, used for obtaining the user condition of the user who posted the comment if the comment cannot be judged to be an illegal comment, and further judging the comment according to the user condition to determine whether the comment is illegal.
By adopting this technical scheme, after a user comments on news content, the server obtains the comment, identifies its text part and picture part, identifies the elements contained in the picture, and judges those elements. If the picture contains illegal elements, such as politically inducing elements or pornographic or violent elements, the picture is judged to be an illegal picture and is deleted, which avoids its adverse influence in the news comments. The server then extracts the characters in the picture and the text of the comment and performs semantic monitoring on them. If the comment is found to be a sensitive comment, a spam comment or an aggressive comment, it is deleted and recorded in the database, so that the next similar comment can be deleted directly without further judgment, which prevents illegal comments from appearing in the comment area of the news content. If the server cannot judge a comment, it judges the comment according to the user who posted it: if most of that user's recent comments are illegal, the comment is judged to be illegal and is kept out of the comment area, where it could cause adverse public opinion effects. In this way the strength and efficiency of comment auditing are further enhanced and the pass rate of illegal comments is reduced.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that can be loaded by a processor to execute any of the above news comment auditing methods.
In summary, the present application includes at least one of the following beneficial technical effects:
1. in this scheme, the server obtains the comment initiated by the user, performs semantic analysis on it and identifies the pictures in it; a picture is deleted if it is illegal, and the comment is removed if its text is illegal. This strengthens comment auditing and reduces illegal comments. Comments are displayed only after they have been audited, rather than being deleted after they have already appeared in the comment area, which reduces the negative influence of illegal comments;
2. in this scheme, when the server performs semantic analysis on a comment, it analyzes the emotion category of the comment in two different ways and compares the two results, which further improves the accuracy of the emotion analysis;
3. in this scheme, if the emotion category of a comment cannot be judged accurately, the emotion categories of the user's recent comments are analyzed and the comment is judged according to the category that occurs most often among them, which further strengthens auditing.
Drawings
Fig. 1 is a schematic flow chart in the first embodiment of the present application.
Fig. 2 is a block diagram of the second embodiment of the present application.
Fig. 3 is a schematic diagram of a system in a third embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to figures 1-3.
This embodiment only explains the present invention and does not limit it. After reading this specification, those skilled in the art may modify this embodiment as needed without inventive contribution, and all such modifications are protected by patent law as long as they fall within the scope of the claims of the present invention.
The first embodiment is as follows:
A news comment auditing method, referring to Fig. 1, includes:
101. Obtaining comments initiated by a user side, and identifying the text and pictures in the comments.
Specifically, after a user initiates a comment at the user side, the server obtains the comment, which contains at least a text part. After obtaining the comment, the server identifies its text part and, if the comment contains a picture, obtains its picture part. Preferably, a minimum length is set for comments issued by the user side, namely a comment must contain more than 5 characters, which reduces the number of meaningless comments. The user side can be a mobile phone terminal or a computer terminal.
102. Extracting the characters and elements in the picture, identifying whether the elements include violating elements, and if so, removing the picture.
Specifically, the server extracts the characters and elements in the picture. The picture is first preprocessed, mainly by graying, binarization, noise removal and tilt correction. The picture is then segmented: the tilt-corrected characters are projected onto the Y axis and all values are accumulated to obtain a histogram along the Y axis, from which the picture is divided into several small images. The feature vectors extracted from the characters scanned in each small image are coarsely classified against templates and then finely matched against a feature template library to recognize the characters in the picture, so that the server can analyze the emotion category of those characters and decide whether to delete the picture.
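The projection step can be shown on a small binarized image. This is a minimal sketch that assumes the picture has already been binarized and tilt-corrected into a list of pixel rows in which 1 marks ink; the function names `row_histogram` and `split_text_lines` are hypothetical. Each row's ink pixels are summed to give the histogram along the Y axis, and runs of non-empty rows are taken as text lines.

```python
from typing import List, Tuple


def row_histogram(binary: List[List[int]]) -> List[int]:
    # Project onto the Y axis: count the ink pixels in each row.
    return [sum(row) for row in binary]


def split_text_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for each run of
    consecutive rows that contain at least one ink pixel."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(row_histogram(binary)):
        if count > 0 and start is None:
            start = y                  # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y))   # a blank row ends the line
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


image = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
print(row_histogram(image))     # [0, 2, 1, 0, 2]
print(split_text_lines(image))  # [(1, 3), (4, 5)]
```

Applying the same projection along the X axis inside each line would separate individual characters before template matching.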
Further, preparing enough training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model by using the training picture so that the recognition model outputs the corresponding category according to the training picture;
inputting the picture into the recognition model, and recognizing the category of elements contained in the picture;
and identifying the corresponding relation among the elements, judging whether the elements are illegal elements, and removing the picture if the elements are illegal elements.
Specifically, the server trains the recognition model extensively. Training pictures containing various elements are prepared in advance. The elements may be elements having a politically inducing factor, or sub-elements that may combine to form one. For example, if a picture contains a torch and a national flag and the positions of the torch element and the national flag element coincide, the combined element is determined to be an element having a politically inducing factor. The elements may also be pornographic elements, violent elements, two-dimensional code (QR code) elements, or the like.
All training samples are labeled with categories, such as national flag, pornography, violence, two-dimensional code and torch. Each category has at least 1000 training pictures. Further, if a new violating element appears in a comment picture, that picture can be used as a training picture to train the recognition model further, improving recognition accuracy.
The pictures in the comment are input into the recognition model to identify the elements in them. Specifically, the YOLO algorithm is applied: the picture is divided into a grid, and a class probability and bounding box are predicted for each grid cell. Taking a 100x100 image as an example, it is divided into a grid such as 7x7; for each grid cell, the network predicts a bounding box and the probability that it corresponds to each element category ("national flag", "pornography", "violence", "two-dimensional code", "torch", etc.).
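To make the grid division concrete, the sketch below computes which cell of an S x S grid contains a given point, using the 100x100 image and 7x7 grid from the example. In the original YOLO formulation, the cell containing the center of an object is the one responsible for predicting it. The function name `grid_cell` is hypothetical and the sketch shows only the indexing arithmetic, not a detector.

```python
from typing import Tuple


def grid_cell(cx: float, cy: float, img_w: int, img_h: int, s: int = 7) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell containing point (cx, cy)
    when an img_w x img_h image is divided into an s x s grid."""
    # Clamp so that points on the right or bottom edge fall in the last cell.
    col = min(int(cx * s / img_w), s - 1)
    row = min(int(cy * s / img_h), s - 1)
    return row, col


# Center of a 100x100 image divided into a 7x7 grid.
print(grid_cell(50, 50, 100, 100))   # (3, 3)
# Top-right corner of the image.
print(grid_cell(100, 0, 100, 100))   # (0, 6)
```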
The picture elements are further judged according to the positional relationship between the elements, that is, whether they overlap. If the position of the "torch" element overlaps that of the "national flag" element, the elements in the picture are judged to be illegal and the picture is removed. In addition to judging the basic elements of the picture, combinations of multiple elements are also judged, which further strengthens picture auditing and prevents such pictures from appearing in the comment area and causing adverse public opinion effects.
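The overlap rule can be sketched on detector output. This sketch assumes each detection is a label plus an axis-aligned box given by its corner coordinates; the `Detection` type, the function names and the contents of `BANNED_LABELS` and `BANNED_PAIRS` are hypothetical, with only the torch and national flag pair taken from the example in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


# Illustrative rule sets; a real system would configure these.
BANNED_LABELS = {"pornography", "violence"}
BANNED_PAIRS = {frozenset({"torch", "national flag"})}


def boxes_overlap(a: Detection, b: Detection) -> bool:
    # Two axis-aligned boxes overlap when they intersect on both axes.
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def picture_violates(dets: List[Detection]) -> bool:
    # A single banned element is enough to reject the picture.
    if any(d.label in BANNED_LABELS for d in dets):
        return True
    # Otherwise look for a banned combination whose boxes overlap.
    for i, a in enumerate(dets):
        for b in dets[i + 1:]:
            if frozenset({a.label, b.label}) in BANNED_PAIRS and boxes_overlap(a, b):
                return True
    return False


torch = Detection("torch", 0, 0, 10, 10)
print(picture_violates([torch, Detection("national flag", 5, 5, 20, 20)]))
print(picture_violates([torch, Detection("national flag", 30, 30, 40, 40)]))
```

The first picture is rejected because the two boxes intersect; the second passes because the same elements appear apart.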
103. Performing semantic monitoring on the text and the picture characters; if the monitoring result is a sensitive comment, a spam (flooding) comment or an aggressive comment, judging the comment to be an illegal comment, removing the comment, and recording the comment in the database.
Specifically, the characters in the picture and the text of the comment are extracted and semantically monitored. Semantic monitoring includes emotion analysis, similarity analysis and sensitive word monitoring of the text. Sensitive word monitoring mainly covers sensitive words and red-flagged problem words: sensitive words include the names of national leaders, leaders of important national institutions, leaders of provinces, cities and counties, leaders of institutions, certain sensitive events and the like; red-flagged words include words with sensitive political tendencies, violent tendencies or unhealthy overtones, uncivil words and the like.
Specifically, a monitoring word list is established and supplemented with monitoring words, which can be extracted from large amounts of news text, novel text and magazine text. The text of the comment is segmented into words and matched against the monitoring words in the list. If a monitoring word from the list appears, the comment is judged to be a sensitive comment, removed, and recorded in the database, so that similar comments appearing later can be judged illegal directly. After the illegal comment is recorded in the database, the user who posted it is also marked to indicate that the user has posted one illegal comment.
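The word-list check, together with the recording and user-marking steps, can be sketched as follows. The sketch assumes the comment has already been segmented into tokens; the function name `audit_sensitive`, the in-memory list used as the database and the per-user strike counter are hypothetical stand-ins for whatever storage a real system would use.

```python
from typing import Dict, List, Set


def audit_sensitive(
    user_id: str,
    tokens: List[str],
    watchlist: Set[str],
    violation_db: List[str],
    strikes: Dict[str, int],
) -> bool:
    """Return True if the comment is sensitive. On a hit, record the
    comment and add one mark against the posting user."""
    hits = [t for t in tokens if t in watchlist]
    if not hits:
        return False
    violation_db.append(" ".join(tokens))            # record the illegal comment
    strikes[user_id] = strikes.get(user_id, 0) + 1   # mark the user
    return True


watchlist = {"bannedword"}  # placeholder monitoring word
db: List[str] = []
strikes: Dict[str, int] = {}
print(audit_sensitive("u1", ["this", "has", "bannedword"], watchlist, db, strikes))
print(audit_sensitive("u1", ["a", "harmless", "remark"], watchlist, db, strikes))
print(db, strikes)
```

Keeping the word list in a set makes each token lookup constant time, which matters when the list is built from large text corpora.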
Further, preparing a preset number of sample words for each category of emotion words, and labeling each sample word with its emotion category;
training a recognition model by using the sample words;
performing word segmentation on the text and the picture characters;
and inputting the word-segmented text and picture characters into the recognition model for recognition, and outputting a first recognition result.
Specifically, the server obtains sample words for each preset category of emotion words, one thousand per category. The categories may be positive comment, negative comment, aggressive comment and ordinary comment. Sample words can be supplemented continuously, and the more sample words there are, the higher the recognition accuracy. The categories of all sample words are labeled, and the recognition model is trained continuously with the sample words so that its results become increasingly accurate. The server performs word segmentation on the text and picture characters of the comment, inputs the segmented words into the recognition model, recognizes their category, and outputs a first recognition result, which is the emotion category of the comment's text and picture characters.
The format of the picture is not limited.
Further, matching each word separated with the sample word;
identifying the successfully matched words and obtaining the emotion types of the successfully matched words;
acquiring the emotion types with the most occurrence times, and outputting a second identification result;
and if the first identification result is inconsistent with the second identification result, sending the comment to an auditor terminal so that the auditor can audit the comment by using the auditor terminal.
Specifically, after the text and the picture characters of the comment are segmented, all segmented words are matched against the sample words, the emotion category of each matched word is obtained, and the category that occurs most often is selected. This category is the second recognition result, a preliminary judgment of the emotion category of the comment's text and picture characters. The first recognition result is then compared with the second recognition result. If they are consistent, that category is taken as the emotion category of the comment's text and picture characters; if they are inconsistent, the comment is judged again, that is, it is sent to the auditor terminal so that an auditor can review it manually. The auditor terminal may be a computer or a mobile phone.
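The majority vote that yields the second recognition result, and its comparison with the first, could look roughly like this. The `SAMPLE_WORDS` table and the function names are hypothetical, and treating "no matched word" as a disagreement is an assumption, since the description does not cover that case.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical calibrated sample words mapped to their emotion categories.
SAMPLE_WORDS: Dict[str, str] = {
    "love": "positive",
    "great": "positive",
    "boring": "negative",
    "hate": "aggressive",
}


def second_result(
    words: List[str], sample_words: Dict[str, str] = SAMPLE_WORDS
) -> Optional[str]:
    """Return the most frequent emotion category among matched words.

    Returns None when no segmented word matches a sample word.
    """
    matched = [sample_words[w] for w in words if w in sample_words]
    if not matched:
        return None
    return Counter(matched).most_common(1)[0][0]


def needs_manual_review(first_result: str, words: List[str]) -> bool:
    """Send the comment to the auditor terminal when the two results differ."""
    return second_result(words) != first_result
```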
Further, the comments are sent to an auditor side, so that the auditor side can manually audit the comments;
if the comment is judged to be an overstimulated comment, acquiring a word with the emotional category marked by the auditor side as overstimulated;
supplementing the terms into the recognition model and removing the comments.
Specifically, when the first recognition result is inconsistent with the second recognition result, the server sends the comment to the auditor terminal so that the auditor can review it manually. If the comment is judged to be a positive, negative, or common comment, it is displayed in the comment area of the news content. If it is judged to be an over-excited comment, the comment is removed, and the words in the comment whose emotion category is over-excited are extracted and supplemented into the recognition model, so that the model is trained on these words. This further improves the recognition accuracy of the model, reduces the auditor's workload, and makes the server more intelligent.
Further, performing word segmentation processing on the comments;
matching all the segmented words against the historical comments in the database;
and if the matching quantity exceeds a preset value, judging the comment to be a watering comment.
Specifically, the comment is segmented into words. Each user has a corresponding account, and the historical comments of each account are recorded in the database. The segmented words of a new comment are matched against the historical comments in the database. If the number of matches exceeds a preset value, for example if the comment is segmented into 10 words and 80% of them also appear in at least one historical comment, the comment is judged to be a watering (flooding) comment and is removed. This prevents users from maliciously flooding the comment area of the news content and improves the reading experience of other users.
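The overlap check can be sketched as below, assuming comments are already segmented into word lists. The function names are hypothetical, and the threshold is applied as "at least 80%" to match the worked example, although the text also speaks of "exceeding" a preset value.

```python
from typing import Iterable, List


def overlap_ratio(new_words: List[str], old_words: Iterable[str]) -> float:
    """Fraction of the new comment's words that also occur in an old comment."""
    if not new_words:
        return 0.0
    old = set(old_words)
    return sum(1 for word in new_words if word in old) / len(new_words)


def is_watering_comment(
    new_words: List[str],
    history: Iterable[List[str]],
    threshold: float = 0.8,  # preset value; 80% follows the example in the text
) -> bool:
    """Flag the comment if it overlaps enough with any one historical comment."""
    return any(overlap_ratio(new_words, old) >= threshold for old in history)
```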
104. If the comment cannot be judged to be the illegal comment, acquiring the user condition of the comment publishing user, and further judging the comment according to the user condition to determine whether the comment is illegal.
Specifically, if the comment cannot be judged to be an illegal comment, the user condition of the posting user is obtained. The user condition comprises the user's recent comments, the user's registration date, and the user's community activity. If the user's recent comments are illegal comments, the current comment is also judged to be an illegal comment, on the reasoning that a user who has recently been in an over-excited or violent mood is unlikely to change attitude suddenly and post positive or normal comments. The user's account can further be banned so that the user is forbidden from commenting, with the ban duration set according to the category of the user's recent comments: if they are mostly watering comments, the user is judged to be flooding maliciously and is banned for 1 year; if they are mostly over-excited comments, the user is not held further responsible, is regarded as being in a poor recent mental state, and is banned for 1 week to cool down; if they are mostly sensitive comments, the user is regarded as holding abnormal political views and the account is banned permanently, to avoid the bad public-opinion influence of the user's comments. Furthermore, because the IP address of a computer terminal is fixed, if the user registered the account from a computer terminal, all accounts registered from that IP address are banned at the same time according to the user's registration IP address, which prevents the user from creating multiple accounts to post illegal comments on news content.
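The mapping from the dominant category of a user's recent comments to a ban duration can be written as a small lookup table. The category keys, the `ban_for` helper, and the use of `None` for a permanent ban are assumptions for illustration; the one-week figure is a reading of the machine-translated text.

```python
from datetime import timedelta
from typing import Dict, Optional, Tuple

# Ban durations as read from the description; None stands for a permanent ban.
BAN_POLICY: Dict[str, Optional[timedelta]] = {
    "watering": timedelta(days=365),   # malicious flooding: 1 year
    "aggressive": timedelta(weeks=1),  # over-excited: 1 week to cool down
    "sensitive": None,                 # sensitive: permanent
}


def ban_for(dominant_category: str) -> Tuple[bool, Optional[timedelta]]:
    """Return (banned, duration) for a user's dominant recent category.

    Categories outside the policy (for example "positive") carry no ban.
    A duration of None together with banned=True means a permanent ban.
    """
    if dominant_category not in BAN_POLICY:
        return (False, None)
    return (True, BAN_POLICY[dominant_category])
```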
Further, obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by the user;
extracting the monitoring results of the recent comments of the user, and selecting the category with the most occurrence times in the monitoring results;
and if the category with the most occurrence times is sensitive comment, irrigation comment or aggressive comment, judging the comment to be an illegal comment, and removing the comment.
Specifically, the user condition of the user who posted the comment is obtained; it can be the comments recently posted by the user, which are recorded in the database. The user's recent comments are retrieved from the database, and their monitoring results are extracted, each being 'aggressive comment', 'positive comment', 'negative comment', 'common comment', 'irrigation comment', or 'sensitive comment'. The monitoring result that occurs most often among the recent comments is selected; if it is 'aggressive comment', 'irrigation comment', or 'sensitive comment', the current comment is judged to be an illegal comment and is removed. This further strengthens the audit and prevents illegal comments from appearing in the comment area, which can be the display area below the news content in a computer or mobile phone interface.
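A minimal sketch of this history-based judgment follows, assuming the monitoring results of the user's recent comments have already been loaded from the database as a list of category labels. The label strings and the function name are hypothetical, and an empty history is treated as "not illegal", which the description does not specify.

```python
from collections import Counter
from typing import List

# Categories that the description treats as violations.
VIOLATION_CATEGORIES = {"sensitive", "irrigation", "aggressive"}


def judge_by_user_history(recent_results: List[str]) -> bool:
    """Return True if the user's most frequent recent category is a violation."""
    if not recent_results:
        return False  # assumption: no history means no basis for removal
    most_common_category, _count = Counter(recent_results).most_common(1)[0]
    return most_common_category in VIOLATION_CATEGORIES
```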
The implementation principle of the embodiment is as follows:
After a user comments on news content, the server acquires the comment, identifies its text part and picture part, identifies the elements contained in the picture, and judges them. If the picture contains illegal elements, such as elements with a political orientation or pornographic or violent elements, the picture is judged to be an illegal picture and is deleted, avoiding the bad influence of its appearing in the news comments;
the characters in the picture and the text of the comment are extracted and semantically monitored. If the comment is found to be a sensitive comment, an irrigation comment, or an over-excited comment, it is deleted and recorded in the database, so that a similar comment appearing later can be deleted directly without being judged again, preventing illegal comments from appearing in the comment area of the news content;
if a comment appears that the server cannot judge, it is judged according to the user who posted it: if most of that user's recent comments are illegal comments, the comment is judged to be an illegal comment and is kept out of the comment area of the news content, where it could cause bad public-opinion influence. This further strengthens the intensity and efficiency of comment review and reduces the rate at which illegal comments pass.
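The overall flow of the embodiment can be summarized as a short pipeline. This is only a sketch under stated assumptions: the four callables passed to `audit_comment` are hypothetical stand-ins for the picture recognition model, character extraction, semantic monitoring, and user-history check; a `None` category is used to model "cannot be judged"; and characters are extracted only from pictures that were not removed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Comment:
    user_id: str
    text: str
    pictures: List[bytes] = field(default_factory=list)


@dataclass
class Decision:
    allowed: bool
    reason: str


def audit_comment(
    comment: Comment,
    picture_has_violation: Callable[[bytes], bool],
    picture_text: Callable[[bytes], str],
    semantic_category: Callable[[str], Optional[str]],
    user_mostly_violates: Callable[[str], bool],
) -> Decision:
    """Run the three stages in order and return the audit decision."""
    # Stage 1: drop pictures that contain illegal elements.
    kept = [p for p in comment.pictures if not picture_has_violation(p)]
    # Stage 2: semantic monitoring over the text plus the picture characters.
    full_text = " ".join([comment.text] + [picture_text(p) for p in kept])
    category = semantic_category(full_text)
    if category in {"sensitive", "irrigation", "aggressive"}:
        return Decision(False, category)
    # Stage 3: fall back to the user's history when no category was assigned.
    if category is None and user_mostly_violates(comment.user_id):
        return Decision(False, "user history")
    return Decision(True, "passed")
```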
Example two:
A news comment auditing system, referring to Fig. 2, comprising:
the obtaining device 201 obtains the comment initiated by the user end, and identifies the text and the picture in the comment.
The recognition device 202 extracts characters and elements in the picture, recognizes and judges whether the elements include violation elements, and if so, removes the picture.
Further, preparing enough training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model by using the training picture so that the recognition model outputs the corresponding category according to the training picture;
inputting the picture into the recognition model, and recognizing the category of elements contained in the picture;
and identifying the corresponding relation among the elements, judging whether the elements are illegal elements, and removing the picture if the elements are illegal elements.
And the monitoring device 203 is used for performing semantic monitoring on the text and the picture characters, judging that the comment is an illegal comment if a monitoring result is a sensitive comment, a watering comment or an aggressive comment, removing the comment, and recording the comment into the database.
Further, preparing corresponding sample words with preset quantity for various emotion words, and calibrating the emotion category corresponding to each sample word;
training a recognition model by using the sample words;
performing word segmentation processing on the text and the picture characters;
and inputting the text and the picture characters after the word segmentation processing into the recognition model for recognition, and outputting a first recognition result.
Further, matching each word separated with the sample word;
identifying the successfully matched words and obtaining the emotion types of the successfully matched words;
acquiring the emotion types with the most occurrence times, and outputting a second identification result;
and if the first identification result is inconsistent with the second identification result, sending the comment to an auditor terminal so that the auditor can audit the comment by using the auditor terminal.
Further, the comments are sent to an auditor side, so that the auditor side can manually audit the comments;
if the comment is judged to be an overstimulated comment, acquiring a word with the emotional category marked by the auditor side as overstimulated;
supplementing the terms into the recognition model and removing the comments.
Further, performing word segmentation processing on the comments;
matching all the segmented words against the historical comments in the database;
and if the matching quantity exceeds a preset value, judging the comment to be a watering comment.
If the comment cannot be judged to be an illegal comment, the judgment device 204 acquires the user condition of the comment user, and further judges the comment according to the user condition to determine whether the comment is illegal.
Further, obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by the user;
extracting the monitoring results of the recent comments of the user, and selecting the category with the most occurrence times in the monitoring results;
and if the category with the most occurrence times is sensitive comment, irrigation comment or aggressive comment, judging the comment to be an illegal comment, and removing the comment.
Example three:
A news comment auditing device, referring to Fig. 3, comprising:
the obtaining module 301 obtains a comment initiated by a user, and identifies a text and a picture in the comment.
The identification module 302 extracts characters and elements in the picture, identifies and judges whether the elements include violation elements, and if so, removes the picture.
Preparing training pictures containing enough various elements, and labeling the training pictures with categories;
training a recognition model by using the training picture so that the recognition model outputs the corresponding category according to the training picture;
inputting the picture into the recognition model, and recognizing the category of elements contained in the picture;
and identifying the corresponding relation among the elements, judging whether the elements are illegal elements, and removing the picture if the elements are illegal elements.
The monitoring module 303 performs semantic monitoring on the text and the picture characters, determines that the comment is an illegal comment if the monitoring result is a sensitive comment, a water-filling comment or an aggressive comment, removes the comment, and records the comment into the database.
Further, preparing corresponding sample words with preset quantity for various emotion words, and calibrating the emotion category corresponding to each sample word;
training a recognition model by using the sample words;
performing word segmentation processing on the text and the picture characters;
and inputting the text and the picture characters after the word segmentation processing into the recognition model for recognition, and outputting a first recognition result.
Further, matching each word separated with the sample word;
identifying the successfully matched words and obtaining the emotion types of the successfully matched words;
acquiring the emotion types with the most occurrence times, and outputting a second identification result;
and if the first identification result is inconsistent with the second identification result, sending the comment to an auditor terminal so that the auditor can audit the comment by using the auditor terminal.
Further, the comments are sent to an auditor side, so that the auditor side can manually audit the comments;
if the comment is judged to be an overstimulated comment, acquiring a word with the emotional category marked by the auditor side as overstimulated;
supplementing the terms into the recognition model and removing the comments.
Further, performing word segmentation processing on the comments;
matching all the segmented words against the historical comments in the database;
and if the matching quantity exceeds a preset value, judging the comment to be a watering comment.
If the comment cannot be judged to be an illegal comment, the judgment module 304 acquires the user condition of the comment publishing user, and further judges the comment according to the user condition to determine whether the comment is illegal.
Further, obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by the user;
extracting the monitoring results of the recent comments of the user, and selecting the category with the most occurrence times in the monitoring results;
and if the category with the most occurrence times is sensitive comment, irrigation comment or aggressive comment, judging the comment to be an illegal comment, and removing the comment.
It should be noted that when the news comment auditing device and system provided by the above embodiments execute the news comment auditing method, the division into the functional modules described above is only an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the method, system, and device embodiments belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
It will be appreciated that the memory in the embodiments of the subject application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
The non-volatile memory may be ROM, Programmable Read Only Memory (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory.
Volatile memory can be RAM, which acts as external cache memory. There are many different types of RAM, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
The processor mentioned in any of the above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the news review method. The processing module and the storage module may be decoupled, and are respectively disposed on different physical devices, and are connected in a wired or wireless manner to implement respective functions of the processing module and the storage module, so as to support the system chip to implement various functions in the foregoing embodiments. Alternatively, the processing module and the memory may be coupled to the same device.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned computer-readable storage media comprise various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A news review auditing method is characterized by comprising the following steps:
obtaining comments initiated by a user side, and identifying texts and pictures in the comments;
extracting characters and elements in the picture, identifying and judging whether the elements contain violation elements, and if so, removing the picture;
performing semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, an irrigation comment, or an over-excited comment, judging the comment to be an illegal comment, removing the comment, and recording the comment into the database;
if the comment cannot be judged to be the illegal comment, acquiring the user condition of the comment publishing user, and further judging the comment according to the user condition to determine whether the comment is illegal.
2. The method of claim 1, wherein the semantically monitoring the text and picture text comprises:
preparing corresponding preset number of sample words for various emotion words, and calibrating the emotion category corresponding to each sample word;
training a recognition model by using the sample words;
performing word segmentation processing on the text and the picture characters;
and inputting the text and the picture characters after the word segmentation processing into the recognition model for recognition, and outputting a first recognition result.
3. The method of claim 2, wherein the semantically monitoring the text and picture text further comprises:
matching the respective words separated with the sample words;
identifying the successfully matched words and obtaining the emotion types of the successfully matched words;
acquiring the emotion types with the most occurrence times, and outputting a second identification result;
and if the first identification result is inconsistent with the second identification result, sending the comment to an auditor terminal so that the auditor can audit the comment by using the auditor terminal.
4. The method of claim 3, wherein the sending the comment to an auditor side comprises:
sending the comments to an auditor end so that the auditor end can manually audit the comments;
if the comment is judged to be an overstimulated comment, acquiring a word with the emotional category marked by the auditor side as overstimulated;
supplementing the terms into the recognition model and removing the comments.
5. The method of claim 1, wherein the semantically monitoring the text and picture text comprises:
performing word segmentation processing on the comments;
matching all the segmented words against the historical comments in the database;
and if the matching quantity exceeds a preset value, judging the comment to be a watering comment.
6. The method according to claim 1, wherein the extracting words and elements in the picture, identifying and determining whether the elements contain violation elements, and if so, removing the picture comprises:
preparing training pictures containing enough various elements, and labeling the training pictures with categories;
training a recognition model by using the training picture so that the recognition model outputs the corresponding category according to the training picture;
inputting the picture into the recognition model, and recognizing the category of elements contained in the picture;
and identifying the corresponding relation among the elements, judging whether the elements are illegal elements, and removing the picture if the elements are illegal elements.
7. The method of claim 1, wherein the obtaining of the user condition of the user who published the comment, and the further determination of the comment according to the user condition comprises:
obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by the user;
extracting the monitoring results of the recent comments of the user, and selecting the category with the most occurrence times in the monitoring results;
and if the category with the most occurrence times is sensitive comment, irrigation comment or aggressive comment, judging the comment to be an illegal comment, and removing the comment.
8. A news review system, comprising:
the acquisition device is used for acquiring comments initiated by a user side and identifying texts and pictures in the comments;
the recognition device is used for extracting characters and elements in the picture, recognizing and judging whether the elements contain violation elements or not, and if so, removing the picture;
the monitoring device is used for carrying out semantic monitoring on the text and the picture characters, judging that the comment is an illegal comment if a monitoring result is a sensitive comment, a watering comment or an overstimulated comment, removing the comment and recording the comment into the database;
and the judging device is used for acquiring the user condition of the user who published the comment if the comment cannot be judged to be the illegal comment, and further judging the comment according to the user condition so as to determine whether the comment is illegal.
9. A news review auditing apparatus, comprising:
the acquisition module is used for acquiring comments initiated by a user side and identifying texts and pictures in the comments;
the recognition module is used for extracting characters and elements in the picture, recognizing and judging whether the elements contain violation elements or not, and if so, removing the picture;
the monitoring module is used for carrying out semantic monitoring on the text and the picture characters, judging that the comment is an illegal comment if a monitoring result is a sensitive comment, a watering comment or an overstimulated comment, removing the comment and recording the comment into the database;
and the judging module is used for acquiring the user condition of the user who published the comment if the comment cannot be judged to be the illegal comment, and further judging the comment according to the user condition so as to determine whether the comment is illegal.
10. A computer-readable storage medium, in which a computer program is stored which can be loaded by a processor and which executes the method of any one of claims 1 to 7.
CN202011305016.5A 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium Active CN112231484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011305016.5A CN112231484B (en) 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN112231484A true CN112231484A (en) 2021-01-15
CN112231484B CN112231484B (en) 2022-11-08

Family

ID=74123839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011305016.5A Active CN112231484B (en) 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN112231484B (en)

Also Published As

Publication number Publication date
CN112231484B (en) 2022-11-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant