CN112231484B - News comment auditing method, system, device and storage medium - Google Patents



Publication number
CN112231484B
Authority
CN
China
Prior art keywords
comment
picture
comments
elements
user
Prior art date
Legal status
Active
Application number
CN202011305016.5A
Other languages
Chinese (zh)
Other versions
CN112231484A (en)
Inventor
谢宇
贺弘联
孔泽平
周珞
陈光林
王炫
张训
汤军
Current Assignee
Hunan Red Net New Media Group Co ltd
Original Assignee
Hunan Red Net New Media Group Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Red Net New Media Group Co ltd
Priority to CN202011305016.5A
Publication of CN112231484A
Application granted
Publication of CN112231484B
Legal status: Active (current)
Anticipated expiration



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Abstract

The application relates to a news comment auditing method, system, device and storage medium. The method comprises: obtaining a comment initiated by a user side and identifying the text and pictures in the comment; extracting the characters and elements in each picture, identifying and judging whether the elements contain violation elements, and if so, removing the picture; performing semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a flooding comment or an inflammatory comment, judging the comment to be an illegal comment, removing it and recording it in a database; and if the comment cannot be judged to be an illegal comment, obtaining the user condition of the user who posted it and further judging the comment according to that user condition to determine whether it is illegal. The method and device increase auditing efficiency and reduce the passing rate of illegal comments.

Description

News comment auditing method, system, device and storage medium
Technical Field
The present application relates to the field of content auditing, and in particular to a method, system, apparatus and storage medium for auditing news comments.
Background
With the development of the information age, information spreads over the network ever faster. Posting online, reading news and browsing microblogs have become popular activities, and people are also keen to comment on posts, news and microblog messages to express their views.
At the same time, improper comments such as coordinated comment manipulation, malicious screen flooding and politically provocative comments appear on the internet, adversely affecting news content and information dissemination.
Existing news comments are generally reviewed manually by back-end administrators. When there are too many comments to review or the review workload is too heavy, many illegal, reactionary or pornographic comments pass the administrators' review as normal comments and are displayed on the user-side interface, creating serious social-safety and content-safety risks.
Disclosure of Invention
In order to increase auditing strength and reduce the passing rate of illegal comments, the present application provides a news comment auditing method, system, device and storage medium.
In a first aspect, the news comment auditing method provided by the application adopts the following technical scheme:
A news comment auditing method comprises the following steps:
obtaining a comment initiated by a user side, and identifying the text and pictures in the comment;
extracting the characters and elements in each picture, identifying and judging whether the elements contain illegal elements, and if so, removing the picture;
performing semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a flooding comment or an inflammatory comment, judging the comment to be an illegal comment, removing the comment, and recording it in the database;
if the comment cannot be judged to be an illegal comment, acquiring the user condition of the user who posted the comment, and further judging the comment according to the user condition to determine whether the comment is illegal.
By adopting this technical scheme, after a user comments on news content, the server acquires the comment, identifies its text part and picture part, and identifies and judges the elements contained in each picture. If a picture contains illegal elements, such as politically oriented, pornographic or violent elements, the picture is judged to be an illegal picture and deleted, preventing it from having a bad influence in the news comments. The server then extracts the text in the pictures together with the text of the comment and performs semantic monitoring on it. If the comment is a sensitive comment, a flooding comment or an inflammatory comment, it is deleted and recorded in the database, so that a similar comment appearing later can be deleted directly without being judged again; illegal comments are thus kept out of the comment area of the news content. If the server cannot judge a comment, it judges the comment according to the user who posted it: if most of that user's recent comments are illegal, the comment is also judged to be illegal and kept out of the comment area. This further strengthens the thoroughness and efficiency of comment auditing and reduces the passing rate of illegal comments.
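The steps above can be sketched as a single decision flow. This is a minimal illustration under stated assumptions, not the applicant's implementation: each detector is passed in as a callable, and the names `Verdict`, `Comment`, `audit_comment` and the in-memory `violation_db` list are hypothetical stand-ins for the picture recognizer, character recognition, semantic monitor, user-history check and database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Verdict(Enum):
    """Hypothetical monitoring results; the first three count as illegal."""
    SENSITIVE = "sensitive"
    FLOODING = "flooding"
    INFLAMMATORY = "inflammatory"
    NORMAL = "normal"
    UNDETERMINED = "undetermined"


ILLEGAL = {Verdict.SENSITIVE, Verdict.FLOODING, Verdict.INFLAMMATORY}


@dataclass
class Comment:
    user_id: str
    text: str
    pictures: List[bytes] = field(default_factory=list)


def audit_comment(
    comment: Comment,
    picture_is_violating: Callable[[bytes], bool],
    extract_picture_text: Callable[[bytes], str],
    monitor_text: Callable[[str], Verdict],
    user_history_verdict: Callable[[str], Verdict],
    violation_db: List[Tuple[str, str, Verdict]],
) -> Tuple[bool, Verdict]:
    """Return (publish?, verdict). Removes violating pictures in place."""
    # Step 102: read the characters in every picture, then drop pictures
    # whose elements are judged to be violation elements.
    picture_text = " ".join(extract_picture_text(p) for p in comment.pictures)
    comment.pictures = [p for p in comment.pictures if not picture_is_violating(p)]

    # Step 103: semantic monitoring over comment text plus picture characters;
    # illegal comments are removed and recorded.
    verdict = monitor_text(f"{comment.text} {picture_text}".strip())
    if verdict in ILLEGAL:
        violation_db.append((comment.user_id, comment.text, verdict))
        return False, verdict

    # Last step: if undetermined, fall back to the poster's recent history.
    if verdict is Verdict.UNDETERMINED:
        verdict = user_history_verdict(comment.user_id)
        if verdict in ILLEGAL:
            return False, verdict
    return True, verdict
```

Passing the detectors in as callables keeps the control flow testable on its own; each one can be replaced by a trained model without touching the pipeline.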
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters comprises:
preparing a preset number of sample words for each emotion category, and labeling the emotion category of each sample word;
training a recognition model with the sample words;
performing word segmentation on the text and the picture characters;
and inputting the segmented text and picture characters into the recognition model for recognition, and outputting a first recognition result.
By adopting this technical scheme, the recognition model is trained on a large number of sample words, each labeled with an emotion category. The emotion categories comprise normal comments, positive comments, negative comments and inflammatory comments, of which inflammatory comments are illegal. The text and the picture characters of a comment are segmented into words and input into the recognition model to obtain the emotion category of the comment. If the category is inflammatory, the comment is illegal and is deleted, so that it does not appear in the comment area of the news content and cause a bad influence.
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters further comprises:
matching each segmented word against the sample words;
identifying the successfully matched words and obtaining their emotion categories;
taking the emotion category that occurs most often, and outputting it as a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that an auditor can audit the comment using the auditor terminal.
By adopting this technical scheme, the segmented words are matched against the sample words, each of which is labeled with an emotion category. The emotion categories of the matched words are collected, and the category that occurs most often is taken as the second recognition result of the comment. The first and second recognition results are then compared; if they differ, the comment is reviewed manually. Cross-checking the two results makes the emotion judgment more accurate, and manual review of disagreements refines it further, avoiding the poor user experience caused by frequent wrongful deletions.
In a preferred example, the present invention may be further configured such that sending the comment to the auditor terminal comprises:
sending the comment to the auditor terminal so that the comment is audited manually at the auditor terminal;
if the comment is judged to be an inflammatory comment, acquiring the words that the auditor has marked at the auditor terminal as having the inflammatory emotion category;
supplementing these words into the recognition model, and removing the comment.
By adopting this technical scheme, when the first recognition result is inconsistent with the second, the auditor judges the comment at the auditor terminal, which improves the accuracy of the emotion judgment. If inflammatory words appear in the comment, the auditor extracts them and supplements them into the recognition model, which is then trained further, making its recognition more accurate and reducing the probability of recognition errors.
In a preferred example, the present invention may be further configured such that the semantic monitoring of the text and the picture characters comprises:
performing word segmentation on the comment;
matching all the segmented words against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a flooding comment.
By adopting this technical scheme, the comment is matched against the historical comments in the database. If the number of its words that match exceeds a preset value, the comment is similar to a previously published comment; it is judged to be a flooding comment and deleted, preventing users from maliciously flooding the news content.
In a preferred example, the present invention may be further configured such that identifying the elements in the picture comprises: preparing a sufficient number of training pictures covering the various elements, and labeling each training picture with its category;
training a recognition model with the training pictures so that the recognition model outputs the corresponding category for each training picture;
inputting the picture into the recognition model, and recognizing the categories of the elements contained in the picture;
and identifying the relationships among the elements, judging whether they are illegal elements, and if so, removing the picture.
By adopting this technical scheme, when a comment contains pictures, the server trains the recognition model on a large number of pictures to improve its accuracy, so that the categories of all elements contained in a picture are recognized. If the elements include politically provocative elements, the picture is deleted to avoid a bad public-opinion influence. Likewise, if a combination of elements in the picture has a politically provocative effect, the picture may be deleted so that it does not appear in the comment area and cause a bad public-opinion influence.
In a preferred example, the present invention may be further configured such that obtaining the user condition of the user who posted the comment and further judging the comment according to the user condition comprise:
obtaining the user condition of the user who posted the comment, wherein the user condition comprises the comments recently posted by the user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often among them;
and if that category is sensitive comment, flooding comment or inflammatory comment, judging the comment to be an illegal comment and removing it.
By adopting this technical scheme, when a comment cannot be judged, its category is inferred from the categories of the comments the user posted recently: for example, if most of the user's recent comments are inflammatory, the comment is also judged to be inflammatory. This further judgment strengthens comment auditing and reduces the passing rate of improper comments.
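The user-history judgment is essentially a majority vote over the user's recent monitoring results. A minimal sketch, assuming the results are stored as category strings; the function name and category strings are hypothetical, and how many comments count as "recent" is left to the caller because the text does not fix it.

```python
from collections import Counter
from typing import List

# Categories treated as illegal, per the description above.
ILLEGAL_CATEGORIES = {"sensitive", "flooding", "inflammatory"}


def judge_by_user_history(recent_results: List[str]) -> bool:
    """Return True if an undetermined comment should be treated as illegal.

    recent_results holds the monitoring results (category names) of the
    user's recent comments.
    """
    if not recent_results:
        return False  # assumption: no history means no further evidence
    # most_common(1) picks the most frequent category; on a tie, Counter
    # keeps the category that was encountered first.
    top_category, _ = Counter(recent_results).most_common(1)[0]
    return top_category in ILLEGAL_CATEGORIES
```

For example, a user whose last three results were `["inflammatory", "inflammatory", "normal"]` would have an undetermined comment treated as illegal.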
In a second aspect, the news comment auditing system provided by the application adopts the following technical scheme:
A news comment auditing system comprises:
an acquisition device, configured to acquire a comment initiated by a user side and identify the text and pictures in the comment;
a recognition device, configured to extract the characters and elements in each picture, recognize and judge whether the elements contain violation elements, and if so, remove the picture;
a monitoring device, configured to perform semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a flooding comment or an inflammatory comment, judge the comment to be an illegal comment, remove the comment and record it in the database;
and a judging device, configured to acquire, if the comment cannot be judged to be an illegal comment, the user condition of the user who posted the comment, and further judge the comment according to the user condition to determine whether it is illegal.
By adopting this technical scheme, after a user comments on news content, the server acquires the comment, identifies its text part and picture part, and identifies and judges the elements contained in each picture. If a picture contains illegal elements, such as politically oriented, pornographic or violent elements, it is judged to be an illegal picture and deleted, so that it does not have a bad influence in the news comments. The server then extracts the text in the pictures and the text of the comment and performs semantic monitoring on it. If the comment is a sensitive comment, a flooding comment or an inflammatory comment, it is deleted and recorded in the database, so that a similar comment appearing later can be deleted directly without being judged again; illegal comments are thus kept out of the comment area of the news content.
In a third aspect, the news comment auditing device provided by the application adopts the following technical scheme:
A news comment auditing device comprises:
an acquisition module, configured to acquire a comment initiated by a user side and identify the text and pictures in the comment;
a recognition module, configured to extract the characters and elements in each picture, recognize and judge whether the elements contain violation elements, and if so, remove the picture;
a monitoring module, configured to perform semantic monitoring on the text and the picture characters, and if the monitoring result is a sensitive comment, a flooding comment or an inflammatory comment, judge the comment to be an illegal comment, remove the comment and record it in the database;
and a judging module, configured to acquire, if the comment cannot be judged to be an illegal comment, the user condition of the user who posted the comment, and further judge the comment according to the user condition to determine whether it is illegal.
By adopting this technical scheme, after a user comments on news content, the server acquires the comment, identifies its text part and picture part, and identifies and judges the elements contained in each picture. If a picture contains illegal elements, such as politically oriented, pornographic or violent elements, it is judged to be an illegal picture and deleted, so that it does not have a bad influence in the news comments. The server then extracts the text in the pictures and the text of the comment and performs semantic monitoring on it. If the comment is a sensitive comment, a flooding comment or an inflammatory comment, it is deleted and recorded in the database, so that a similar comment appearing later can be deleted directly without being judged again; illegal comments are thus kept out of the comment area of the news content.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that can be loaded by a processor to execute any one of the above news comment auditing methods.
In summary, the present application includes at least one of the following beneficial technical effects:
1. In this scheme, the server acquires the comment initiated by the user, performs semantic analysis on it and identifies the pictures in it. Illegal pictures are deleted, and the comment is removed if its text is illegal. This strengthens comment review and reduces illegal comments. Because a comment is displayed only after it has been reviewed, illegal comments no longer have to be deleted after already appearing in the comment area, which reduces their negative influence;
2. In this scheme, the server analyzes the emotion category of a comment in two different ways and compares the two results, further improving the accuracy of the emotion analysis;
3. In this scheme, if the emotion category of a comment cannot be judged accurately, the emotion categories of the user's recent comments are analyzed and the comment is judged according to the most frequent category among them, further increasing auditing strength.
Drawings
Fig. 1 is a schematic flow chart in the first embodiment of the present application.
Fig. 2 is a block diagram of the second embodiment of the present application.
Fig. 3 is a schematic diagram of a system in a third embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to figures 1-3.
This embodiment only explains the present invention and does not limit it. After reading this specification, those skilled in the art may modify this embodiment as needed without inventive contribution, but all such modifications are protected by patent law within the scope of the claims of the present invention.
The first embodiment is as follows:
A news comment auditing method, referring to Fig. 1, includes:
101. Obtaining a comment initiated by a user side, and identifying the text and pictures in the comment.
Specifically, after a user initiates a comment at the user side, the server obtains the comment, which comprises at least a text part. The server identifies the text part of the comment and, if the comment contains pictures, also acquires the picture part. Preferably, a minimum length is set for comments issued at the user side, namely more than 5 characters, to reduce the number of meaningless comments. The user side may be a mobile phone terminal or a computer terminal.
102. Extracting characters and elements in the picture, identifying and judging whether the elements contain violation elements, and if so, removing the picture.
Specifically, the server extracts the characters and elements in the picture. The picture is first preprocessed, mainly by graying, binarization, noise removal and skew correction. The picture is then segmented: the skew-corrected characters are projected onto the Y axis and the values in each row are accumulated to obtain a histogram along the Y axis, which is used to divide the image into several small images. Feature vectors extracted from the characters scanned in each part are subjected to coarse template classification and fine template matching against a feature template library to recognize the characters in the picture. The server can then analyze the emotion category of these characters and decide whether the picture should be deleted.
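The Y-axis projection used for segmentation can be sketched in a few lines. This assumes the picture has already been grayed, binarized, denoised and skew-corrected into a grid of 0/1 pixels (1 = ink); `row_projection`, `split_text_lines` and the `min_ink` parameter are hypothetical names, and the template classification and matching stage is not shown.

```python
from typing import List, Tuple


def row_projection(binary: List[List[int]]) -> List[int]:
    """Project a binarized image (1 = ink, 0 = background) onto the Y axis."""
    return [sum(row) for row in binary]


def split_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) spans whose projection >= min_ink.

    Rows with (almost) no ink separate one band of text from the next.
    """
    hist = row_projection(binary)
    spans: List[Tuple[int, int]] = []
    start = None
    for y, value in enumerate(hist):
        if value >= min_ink and start is None:
            start = y  # entering a band of text
        elif value < min_ink and start is not None:
            spans.append((start, y))  # leaving the band
            start = None
    if start is not None:
        spans.append((start, len(hist)))  # band runs to the bottom edge
    return spans
```

Each returned span is one horizontal strip that can then be cut into individual character images for matching.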
Further, preparing sufficient training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model with the training pictures so that the recognition model outputs the corresponding category for each training picture;
inputting the picture into the recognition model, and recognizing the categories of the elements contained in the picture;
and identifying the relationships among the elements, judging whether the elements are illegal elements, and if so, removing the picture.
Specifically, the server trains the recognition model extensively. Training pictures containing various elements are prepared in advance; the elements may be elements with a politically provocative factor, or sub-elements that may combine to form such a factor. For example, if a picture contains a torch and a national flag and the positions of the two elements coincide, the combination is determined to be an element with a politically provocative factor. The elements may further be pornographic elements, violent elements, two-dimensional (QR) code elements, and the like.
All training samples are labeled with categories, such as national flag, pornography, violence, two-dimensional code and torch. Each category has at least 1000 training pictures. Further, if a new violation element appears in a comment picture, that picture can be added as a training picture and the recognition model trained further to improve its accuracy.
The pictures in the comment are input into the recognition model to identify their elements, specifically using the YOLO algorithm, which divides the picture into a grid and predicts class probabilities and bounding boxes for each grid cell. For example, a 100x100 image may be divided into a 7x7 grid; for each grid cell, the network predicts a bounding box and the probability that it corresponds to each element ("national flag", "pornography", "violence", "two-dimensional code", "torch", etc.).
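The grid idea can be illustrated with two small helpers. In the original YOLO formulation, the grid cell containing an object's centre is responsible for detecting it, and a cell's class-specific score is the conditional class probability multiplied by the box confidence. The label names, the 0.5 threshold and the helper names below are assumptions for illustration; a real detector would produce these numbers from a trained network.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical label order for the element categories named in the text.
LABELS = ("national_flag", "pornography", "violence", "qr_code", "torch")


def responsible_cell(cx: float, cy: float, width: int, height: int, s: int = 7) -> Tuple[int, int]:
    """Return (row, col) of the S x S grid cell that contains the box centre."""
    col = min(int(cx / (width / s)), s - 1)  # clamp centres on the right edge
    row = min(int(cy / (height / s)), s - 1)  # clamp centres on the bottom edge
    return row, col


def cell_detection(
    class_probs: Sequence[float], box_confidence: float, threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Class-specific score = P(class | object) * box confidence (YOLOv1 style)."""
    scores = [p * box_confidence for p in class_probs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (LABELS[best], scores[best]) if scores[best] >= threshold else None
```

With a 100x100 picture and a 7x7 grid, each cell covers roughly 14.3 pixels, so an object centred at (50, 10) is assigned to row 0, column 3.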
Picture elements are further judged according to the positional relationship between elements, namely whether they overlap: if the positions of the "torch" element and the "national flag" element overlap, the elements in the picture are judged to be illegal and the picture is removed. Judging combinations of elements in addition to individual elements further strengthens picture auditing and prevents such pictures from appearing in the comment area and causing a bad public-opinion influence.
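The overlap rule can be expressed as a check over detected boxes. This sketch assumes detections arrive as `(label, (x1, y1, x2, y2))` tuples with x1 < x2 and y1 < y2; which labels are forbidden on their own and which pairs are forbidden only when they overlap are configuration assumptions, with only the torch and national flag pair taken from the text. Boxes that merely touch at an edge are treated as not overlapping.

```python
from typing import FrozenSet, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box]

# Hypothetical configuration: labels illegal on their own, and label pairs
# that are illegal only when their boxes overlap.
SINGLE_VIOLATIONS: Set[str] = {"pornography", "violence", "qr_code"}
FORBIDDEN_PAIRS: Set[FrozenSet[str]] = {frozenset({"torch", "national_flag"})}


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def picture_is_violating(detections: List[Detection]) -> bool:
    """Apply the single-element rule, then the overlapping-combination rule."""
    if any(label in SINGLE_VIOLATIONS for label, _ in detections):
        return True
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            (la, ba), (lb, bb) = detections[i], detections[j]
            if frozenset({la, lb}) in FORBIDDEN_PAIRS and boxes_overlap(ba, bb):
                return True
    return False
```

Keeping the pairs in a set of `frozenset`s makes the rule order-independent, so ("torch", "national_flag") and ("national_flag", "torch") match the same entry.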
103. Performing semantic monitoring on the text and the picture characters; if the monitoring result is a sensitive comment, a flooding comment or an inflammatory comment, judging the comment to be an illegal comment, removing the comment, and recording it in the database.
Specifically, the characters in the picture and the text of the comment are extracted, and semantic monitoring is performed on them. Semantic monitoring comprises emotion analysis, similarity analysis and sensitive-word monitoring of the text. Monitoring mainly targets sensitive words and red-flagged words: sensitive words include the names of national leaders, leaders of important national institutions, principal leaders of provinces, cities and counties, leaders of institutions, certain sensitive events and the like, while red-flagged words are words with a sensitive political tendency, a violent tendency or unhealthy connotations, or uncivil words.
Specifically, a monitoring word list is established and continually supplemented; monitoring words may be extracted from large amounts of news, novel and magazine text data. The text of a comment is segmented into words and matched against the monitoring word list. If a word from the list appears, the comment is judged to be a sensitive comment, removed and recorded in the database, so that similar comments appearing later are judged illegal directly. After the illegal comment is recorded, the user who posted it is also marked, indicating that the user has posted one illegal comment.
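The patent does not name a word segmenter. The sketch below swaps in forward maximum matching, a classic dictionary-based method for Chinese text, and then looks up each token in the monitoring word list; `fmm_segment`, `find_monitored_words` and the sample vocabularies are hypothetical.

```python
from typing import List, Set


def fmm_segment(text: str, vocab: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest vocab word."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def find_monitored_words(text: str, monitor_words: Set[str], vocab: Set[str]) -> List[str]:
    """Return the monitoring words found in the segmented comment text."""
    # Monitoring words are added to the dictionary so they segment as units.
    tokens = fmm_segment(text, vocab | monitor_words)
    return [t for t in tokens if t in monitor_words]
```

A non-empty result would mark the comment as sensitive. Forward maximum matching can mis-split ambiguous strings, so a production system would more likely rely on a statistical segmenter; the lookup step stays the same either way.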
Further, preparing a preset number of sample words for each emotion category, and labeling the emotion category of each sample word;
training a recognition model with the sample words;
performing word segmentation on the text and the picture characters;
and inputting the segmented text and picture characters into the recognition model for recognition, and outputting a first recognition result.
Specifically, the server acquires a preset number of sample words for each emotion category, for example one thousand per category. The emotion categories may be positive, negative, inflammatory and ordinary comments. Sample words can be supplemented continually, and recognition accuracy is generally higher with more sample words. All sample words are labeled with their categories, and the recognition model is trained on them continually so that its recognition keeps improving. The server segments the text and picture characters of the comment, inputs the segmented words into the recognition model, recognizes their category, and outputs a first recognition result, namely the emotion category of the comment's text and picture characters.
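The patent does not specify the recognition model. As one possible stand-in, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on labeled sample words and predicts the emotion category of a segmented comment; the class name, category names and toy English sample words are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class NaiveBayesEmotion:
    """Multinomial naive Bayes over labeled sample words (Laplace smoothing)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayesEmotion":
        # samples: (sample_word, emotion_category) pairs
        self.class_counts: Counter = Counter(c for _, c in samples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for word, category in samples:
            self.word_counts[category][word] += 1
        self.vocab = {w for w, _ in samples}
        return self

    def predict(self, tokens: Iterable[str]) -> str:
        """Return the most probable category (the first recognition result)."""
        known = [t for t in tokens if t in self.vocab]  # unknown words ignored
        total = sum(self.class_counts.values())
        best, best_score = "", -math.inf
        for category, n in self.class_counts.items():
            denom = sum(self.word_counts[category].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for t in known:
                score += math.log((self.word_counts[category][t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = category, score
        return best
```

Training on single labeled words mirrors the description of "sample words"; a model trained on whole labeled comments would use the same interface.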
The format of the picture is not limited.
Further, matching each segmented word against the sample words;
identifying the successfully matched words and obtaining their emotion categories;
taking the emotion category that occurs most often, and outputting it as a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that an auditor can audit the comment using the auditor terminal.
Specifically, after the text and picture characters of the comment are segmented, all segmented words are matched against the sample words, the emotion category of each matched word is taken, and the category that occurs most often is selected as the second recognition result, a preliminary judgment of the emotion category of the comment's text and picture characters. The first recognition result is then compared with the second. If they are consistent, the shared result is taken as the emotion category of the comment's text and picture characters; if they are inconsistent, the comment is judged again, namely it is sent to the auditor terminal for manual review by the auditor. The auditor terminal may be a computer terminal or a mobile phone terminal.
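The second recognition result and the comparison step reduce to a lookup, a majority vote and an equality check. In this sketch `lexicon` maps each sample word to its labeled category; treating a comment with no matched words as a disagreement (and therefore as a case for manual review) is an assumption the text does not settle, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

MANUAL_REVIEW = "manual_review"


def second_recognition(tokens: List[str], lexicon: Dict[str, str]) -> Optional[str]:
    """Majority emotion category among tokens that match a labeled sample word."""
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return None  # nothing matched, so there is no second opinion
    return Counter(matched).most_common(1)[0][0]


def reconcile(first: str, second: Optional[str]) -> str:
    """Keep the shared category if both results agree, else route to an auditor."""
    return first if first == second else MANUAL_REVIEW
```

Only disagreements reach the auditor terminal, so manual effort is spent where the two methods conflict.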
Further, sending the comment to the auditor terminal so that the auditor can manually audit the comment;
if the comment is judged to be an over-excited comment, acquiring the words that the auditor marked at the auditor terminal with the emotion category over-excited;
supplementing these words into the recognition model and removing the comment.
Specifically, when the first recognition result is inconsistent with the second recognition result, the server sends the comment to the auditor terminal, and the auditor uses the terminal to audit the comment manually. If the comment is judged to be a positive, negative or ordinary comment, it is displayed in the comment area of the news content. If it is judged to be an over-excited comment, the words with the over-excited emotion category are extracted from the comment and supplemented into the recognition model so that the model can be retrained on them. This further improves the recognition accuracy of the model, reduces the workload of the auditor terminal, and makes the server's auditing more intelligent.
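The feedback loop from the auditor terminal can be sketched as below. This is a minimal sketch assuming the auditor returns a category plus a list of flagged words, and that "supplementing the words into the recognition model" means adding them to the labeled sample words used for retraining. `AuditorVerdict`, `apply_verdict` and `PUBLISHABLE` are hypothetical names.

```python
from dataclasses import dataclass, field

# Categories the text says are displayed in the comment area after manual audit.
PUBLISHABLE = {"positive", "negative", "ordinary"}


@dataclass
class AuditorVerdict:
    """Hypothetical record returned by the auditor terminal."""
    category: str                                       # e.g. "positive", "over-excited"
    flagged_words: list = field(default_factory=list)   # words marked over-excited


def apply_verdict(verdict, lexicon, comment_area, comment_text):
    """Publish or remove a manually audited comment and feed back flagged words."""
    if verdict.category in PUBLISHABLE:
        comment_area.append(comment_text)
        return "published"
    if verdict.category == "over-excited":
        for word in verdict.flagged_words:
            # Add to the sample words; the model is later retrained from lexicon.items()
            lexicon[word] = "over-excited"
    return "removed"
```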
Further, performing word segmentation on the comment;
matching all the segmented words against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a flooding comment.
Specifically, the comment is segmented into words. Each user has a corresponding account, and the historical comments of each account are recorded in the database. The segmented words of a new comment are matched against these historical comments. If the number of matches exceeds the preset value (for example, the comment is segmented into 10 words and 80% of them, that is 8 words, also appear in at least one of the historical comments), the comment is judged to be a flooding comment and is removed. This prevents users from maliciously spamming the news content and improves the reading experience of other users.
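The 80% rule in the example above can be sketched as follows, assuming the overlap is measured against one historical comment at a time (the text does not say whether matches may be pooled across several historical comments). `is_flooding` is a hypothetical name and `threshold=0.8` mirrors the example value.

```python
def is_flooding(tokens, history, threshold=0.8):
    """Return True if at least `threshold` of the comment's words occur in
    some single historical comment of the same account.

    tokens: segmented words of the new comment
    history: list of segmented historical comments (lists of words)
    """
    if not tokens:
        return False
    for past in history:
        past_words = set(past)
        hits = sum(1 for t in tokens if t in past_words)
        if hits / len(tokens) >= threshold:
            return True
    return False
```

With a 10-word comment, 8 words found in one earlier comment give a ratio of 0.8 and trigger the rule, while 7 words do not.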
104. If the comment cannot be judged to be a violating comment, acquiring the user condition of the user who posted the comment, and further judging the comment according to the user condition to determine whether it is a violating comment.
Specifically, if the comment cannot be judged to be a violating comment, the user condition of the posting user is obtained. The user condition specifically includes the user's recent comments, registration date and community activity. If the user's recent comments are violating comments, the current comment is also judged to be violating, because a user who has recently been in an over-excited or violent mood is unlikely to change attitude suddenly and post positive or normal comments. The user's account may further be banned from commenting, with the ban duration set according to the category of the user's recent comments. If most of the recent comments are flooding comments, the user is judged to be maliciously spamming and is banned for 1 year; if the recent comments are over-excited comments, the user is considered to be in a poor state of mind and is leniently banned for 1 week to give them time to calm down; if the recent comments are sensitive comments, the user is considered to hold abnormal political views and the account is banned permanently, avoiding the adverse public-opinion impact of the user's comments. Furthermore, the IP address of the user's computer is identified from the IP address used at registration, and if the user applied for the account from a computer, all accounts applied for from that IP address are banned at the same time, preventing the user from creating multiple accounts to post violating comments on news content.
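The ban rules above can be expressed as a small policy table. This is a minimal sketch: "1 year" is approximated as 365 days, a duration of `None` stands for a permanent ban, and registrations are assumed to be stored as (IP address, registered-from-computer) pairs. `BAN_POLICY`, `ban_decision` and `accounts_to_ban` are hypothetical names.

```python
from collections import Counter
from datetime import timedelta

# Ban durations described in the text; None means a permanent ban.
BAN_POLICY = {
    "flooding": timedelta(days=365),
    "over-excited": timedelta(weeks=1),
    "sensitive": None,
}


def ban_decision(recent_categories):
    """Return (ban, duration) from the dominant category of recent comments."""
    if not recent_categories:
        return (False, None)
    dominant = Counter(recent_categories).most_common(1)[0][0]
    if dominant not in BAN_POLICY:
        return (False, None)
    return (True, BAN_POLICY[dominant])


def accounts_to_ban(user_id, registrations):
    """registrations: user_id -> (registration_ip, registered_from_computer).

    If the account was registered from a computer, every account registered
    from the same IP address is banned together with it.
    """
    ip, from_computer = registrations[user_id]
    if not from_computer:
        return {user_id}
    return {uid for uid, (other_ip, _) in registrations.items() if other_ip == ip}
```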
Further, obtaining the user condition of the user who posted the comment, wherein the user condition includes the comments recently posted by the user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often in the monitoring results;
and if the most frequent category is sensitive comment, flooding comment or over-excited comment, judging the comment to be a violating comment and removing it.
Specifically, the user condition is obtained, which may be the comments recently posted by the user. The user's recent comments are recorded in the database; they are retrieved from the database and their monitoring results are extracted. Each monitoring result is one of over-excited comment, positive comment, negative comment, ordinary comment, flooding comment or sensitive comment. The monitoring result that occurs most often among the recent comments is selected, and if that category is over-excited comment, flooding comment or sensitive comment, the current comment is judged to be a violating comment and is removed. This further strengthens the auditing and prevents violating comments from appearing in the comment area, which may be the display area below the news content in a computer-side or mobile-phone-side interface.
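The majority rule over the user's recent monitoring results can be sketched as follows. The text does not define how many comments count as "recent", so the `window` parameter (20 here) is a hypothetical choice, as is the function name `violates_by_history`.

```python
from collections import Counter

VIOLATING_CATEGORIES = {"sensitive", "flooding", "over-excited"}


def violates_by_history(monitoring_results, window=20):
    """Judge an otherwise undecidable comment from the poster's recent history.

    monitoring_results: stored categories of the user's comments, oldest first.
    window: hypothetical number of most recent comments to consider.
    """
    recent = monitoring_results[-window:]
    if not recent:
        return False
    dominant, _ = Counter(recent).most_common(1)[0]
    return dominant in VIOLATING_CATEGORIES
```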
The implementation principle of this embodiment is as follows:
after a user comments on news content, the server acquires the comment, identifies the text part and the picture part of the comment, identifies the elements contained in the picture, and judges those elements. If the picture contains violating elements, such as elements with a political orientation or pornographic or violent elements, the picture is judged to be a violating picture and is deleted, avoiding the adverse influence of such pictures appearing in news comments;
the text in the pictures and the text of the comment are extracted and semantically monitored. If the text of the comment is found to be a sensitive comment, a flooding comment or an over-excited comment, the comment is deleted and recorded in the database, so that when a similar comment appears next time it does not need to be judged again and can be deleted directly, preventing violating comments from appearing in the comment area of the news content;
if a comment appears that the server cannot judge, the comment is judged according to the user who posted it: if most of the user's recent comments are violating comments, the comment is judged to be violating and is kept out of the comment area of the news content, avoiding adverse public-opinion impact. This further strengthens the intensity and efficiency of comment auditing and reduces the pass rate of violating comments.
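The order of checks in this principle, including the database of removed comments that lets repeats be deleted without re-judging, can be sketched as an orchestrator. This is a minimal sketch under stated assumptions: the three checks are injected as hypothetical callables, "similar" comments are approximated by an exact match after normalizing case and whitespace (narrower than the patent's wording), and OCR of picture text is left out. `fingerprint` and `CommentAuditor` are hypothetical names.

```python
import hashlib

VIOLATING = {"sensitive", "flooding", "over-excited"}


def fingerprint(text):
    # Normalize case and whitespace so trivially reformatted repeats still match.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CommentAuditor:
    """Hypothetical orchestrator; the three checks are injected callables."""

    def __init__(self, classify_text, picture_is_violating, violates_by_user):
        self.classify_text = classify_text                # text -> category or None
        self.picture_is_violating = picture_is_violating  # picture -> bool
        self.violates_by_user = violates_by_user          # user_id -> bool
        self.violation_db = set()                         # fingerprints of removed comments

    def audit(self, user_id, text, pictures):
        """Return (status, pictures_kept)."""
        fp = fingerprint(text)
        if fp in self.violation_db:
            return ("removed", [])  # seen before: delete without re-judging
        kept = [p for p in pictures if not self.picture_is_violating(p)]
        category = self.classify_text(text)
        if category in VIOLATING:
            self.violation_db.add(fp)
            return ("removed", [])
        if category is None and self.violates_by_user(user_id):
            return ("removed", [])  # undecidable text: fall back to user history
        return ("published", kept)
```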
The second embodiment:
a news comment auditing system, referring to fig. 2, comprising:
the obtaining device 201, which obtains the comment initiated by the user side and identifies the text and the picture in the comment.
The recognition device 202, which extracts the text and elements in the picture, recognizes and judges whether the elements include violating elements, and if so, removes the picture.
Further, preparing sufficient training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model with the training pictures so that the recognition model outputs the corresponding category for each training picture;
inputting the picture into the recognition model, and recognizing the categories of the elements contained in the picture;
and identifying the relationship among the elements, judging whether they constitute violating elements, and removing the picture if so.
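Claim 1 specifies that the picture is divided into grid cells and that class probabilities and bounding boxes are predicted per cell, which matches the YOLO family of detectors, although the patent names no specific model. The sketch below decodes YOLOv1-style outputs into labeled boxes, assuming each cell holds box tuples `(x, y, w, h, confidence)` and one shared list of class probabilities. `decode_grid`, its parameters and the output format are hypothetical, and non-maximum suppression is omitted for brevity.

```python
def decode_grid(predictions, class_names, img_w, img_h, threshold=0.5):
    """Turn S x S grid predictions into labeled boxes in pixel coordinates.

    predictions[row][col] is a dict with:
      "boxes": list of (x, y, w, h, confidence); x, y are the box centre
               offset inside the cell in [0, 1], w, h are fractions of the image;
      "class_probs": one probability per entry in class_names.
    Returns a list of (label, score, (x1, y1, x2, y2)).
    """
    s = len(predictions)
    detections = []
    for row in range(s):
        for col in range(s):
            cell = predictions[row][col]
            for x, y, w, h, conf in cell["boxes"]:
                # Class-specific score = box confidence x conditional class probability
                scores = [conf * p for p in cell["class_probs"]]
                best = max(range(len(scores)), key=scores.__getitem__)
                if scores[best] < threshold:
                    continue
                cx = (col + x) / s * img_w
                cy = (row + y) / s * img_h
                bw, bh = w * img_w, h * img_h
                detections.append(
                    (class_names[best], scores[best],
                     (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2))
                )
    return detections
```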
The monitoring device 203 performs semantic monitoring on the text and the picture text; if a monitoring result is sensitive comment, flooding comment or over-excited comment, it judges the comment to be a violating comment, removes the comment, and records it in the database.
Further, preparing a preset number of sample words for each category of emotion words, and labeling the emotion category of each sample word;
training a recognition model with the sample words;
performing word segmentation on the text and the picture text;
and inputting the segmented text and picture text into the recognition model for recognition, and outputting a first recognition result.
Further, matching each segmented word against the sample words;
identifying the successfully matched words and obtaining their emotion categories;
acquiring the emotion category that occurs most often, and outputting a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that the auditor can audit the comment using the auditor terminal.
Further, sending the comment to the auditor terminal so that the auditor can manually audit the comment;
if the comment is judged to be an over-excited comment, acquiring the words that the auditor marked at the auditor terminal with the emotion category over-excited;
supplementing these words into the recognition model and removing the comment.
Further, performing word segmentation on the comment;
matching all the segmented words against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a flooding comment.
If the comment cannot be judged to be a violating comment, the judgment device 204 acquires the user condition of the user who posted the comment, and further judges the comment according to the user condition to determine whether it is a violating comment.
Further, obtaining the user condition of the user who posted the comment, wherein the user condition includes the comments recently posted by the user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often in the monitoring results;
and if the most frequent category is sensitive comment, flooding comment or over-excited comment, judging the comment to be a violating comment and removing it.
The third embodiment:
a news comment auditing system, referring to fig. 3, comprising:
the obtaining module 301 obtains a comment initiated by the user side and identifies the text and the picture in the comment.
The identification module 302 extracts the text and elements in the picture, identifies and judges whether the elements include violating elements, and if so, removes the picture.
Further, preparing sufficient training pictures containing various elements, and labeling the training pictures with categories;
training a recognition model with the training pictures so that the recognition model outputs the corresponding category for each training picture;
inputting the picture into the recognition model, and recognizing the categories of the elements contained in the picture;
and identifying the relationship among the elements, judging whether they constitute violating elements, and removing the picture if so.
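Claim 1 judges "whether the elements are overlapped according to the position relation of each element" and decides on that basis whether the picture contains violating elements. One way to read this is that some elements are violating on their own while certain pairs become violating only when their boxes overlap. The sketch below implements that reading; the rule tables `violating_labels` and `violating_pairs`, the placeholder labels, and the function names are all hypothetical.

```python
def boxes_overlap(a, b):
    """True if rectangles a and b, given as (x1, y1, x2, y2), share positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def contains_violation(detections, violating_labels, violating_pairs):
    """Decide whether detected elements make a picture violating.

    detections: list of (label, score, box) tuples.
    violating_labels: labels that are violating on their own.
    violating_pairs: set of frozensets of two labels that are violating only
                     when their boxes overlap (hypothetical rule table).
    """
    if any(label in violating_labels for label, _, _ in detections):
        return True
    for i, (label_a, _, box_a) in enumerate(detections):
        for label_b, _, box_b in detections[i + 1:]:
            if frozenset((label_a, label_b)) in violating_pairs and boxes_overlap(box_a, box_b):
                return True
    return False
```

Boxes that merely touch along an edge are not treated as overlapping, which keeps adjacent but separate elements from triggering a pair rule.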
The monitoring module 303 performs semantic monitoring on the text and the picture text; if a monitoring result is sensitive comment, flooding comment or over-excited comment, it judges the comment to be a violating comment, removes the comment, and records it in the database.
Further, preparing a preset number of sample words for each category of emotion words, and labeling the emotion category of each sample word;
training a recognition model with the sample words;
performing word segmentation on the text and the picture text;
and inputting the segmented text and picture text into the recognition model for recognition, and outputting a first recognition result.
Further, matching each segmented word against the sample words;
identifying the successfully matched words and obtaining their emotion categories;
acquiring the emotion category that occurs most often, and outputting a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that the auditor can audit the comment using the auditor terminal.
Further, sending the comment to the auditor terminal so that the auditor can manually audit the comment;
if the comment is judged to be an over-excited comment, acquiring the words that the auditor marked at the auditor terminal with the emotion category over-excited;
supplementing these words into the recognition model and removing the comment.
Further, performing word segmentation on the comment;
matching all the segmented words against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a flooding comment.
If the comment cannot be judged to be a violating comment, the judgment module 304 acquires the user condition of the user who posted the comment, and further judges the comment according to the user condition to determine whether it is a violating comment.
Further, obtaining the user condition of the user who posted the comment, wherein the user condition includes the comments recently posted by the user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often in the monitoring results;
and if the most frequent category is sensitive comment, flooding comment or over-excited comment, judging the comment to be a violating comment and removing it.
It should be noted that when the news comment auditing device and system provided by the above embodiments execute the news comment auditing method, the division into the functional modules described above is only an example. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the news comment auditing method, system and device provided by the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiment and are not repeated here.
It will be appreciated that the memory in the embodiments of this application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
The volatile memory may be random access memory (RAM), which serves as an external cache. Many types of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The processor mentioned in any of the above may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs of the news comment auditing method. The processing module and the storage module may be decoupled, disposed on different physical devices, and connected in a wired or wireless manner to implement their respective functions, so as to support the system chip in implementing the various functions of the foregoing embodiments. Alternatively, the processing module and the storage module may be integrated on the same device.
The functions, if implemented as software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application in essence, or the part that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned computer-readable storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description is only a preferred embodiment of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (9)

1. A news comment auditing method, characterized by comprising the following steps:
obtaining comments initiated by a user side, and identifying texts and pictures in the comments;
extracting text and elements from the picture, identifying and judging whether the elements contain violating elements, and if so, removing the picture;
performing semantic monitoring on the text and the picture text, and if the monitoring result is a sensitive comment, a flooding comment or an over-excited comment, judging the comment to be a violating comment, removing the comment, and recording the comment into a database;
if the comment cannot be judged to be a violating comment, acquiring the user condition of the user who posted the comment, and further judging the comment according to the user condition to determine whether the comment is violating;
wherein extracting text and elements from the picture, identifying and judging whether the elements contain violating elements, and if so, removing the picture comprises the following steps:
training a recognition model with training pictures so that the recognition model outputs a corresponding category for each training picture;
inputting a picture in a comment into the recognition model, and recognizing the elements in the picture;
dividing the picture into grid cells, and predicting class probabilities and bounding boxes for each grid cell;
predicting, for each grid cell, the probability that a bounding box corresponds to each element;
judging whether the elements overlap according to the positional relationship of the elements, and judging from the result whether the picture contains violating elements;
and if so, removing the picture.
2. The method of claim 1, wherein performing semantic monitoring on the text and the picture text comprises:
preparing a preset number of sample words for each category of emotion words, and labeling the emotion category of each sample word;
training a recognition model with the sample words;
performing word segmentation on the text and the picture text;
and inputting the segmented text and picture text into the recognition model for recognition, and outputting a first recognition result.
3. The method of claim 2, wherein performing semantic monitoring on the text and the picture text further comprises:
matching each segmented word against the sample words;
identifying the successfully matched words, and acquiring their emotion categories;
acquiring the emotion category that occurs most frequently, and outputting a second recognition result;
and if the first recognition result is inconsistent with the second recognition result, sending the comment to an auditor terminal so that an auditor can use the auditor terminal to audit the comment.
4. The method of claim 3, wherein sending the comment to the auditor terminal comprises:
sending the comment to the auditor terminal so that the auditor can manually audit the comment;
if the comment is judged to be an over-excited comment, acquiring the words marked at the auditor terminal with the emotion category over-excited;
supplementing these words into the recognition model and removing the comment.
5. The method of claim 1, wherein performing semantic monitoring on the text and the picture text comprises:
performing word segmentation on the comment;
matching all segmented words against the historical comments in the database;
and if the number of matches exceeds a preset value, judging the comment to be a flooding comment.
6. The method of claim 1, wherein acquiring the user condition of the user who posted the comment and further judging the comment according to the user condition comprises:
obtaining the user condition of the user who posted the comment, wherein the user condition includes the comments recently posted by the user;
extracting the monitoring results of the user's recent comments, and selecting the category that occurs most often in the monitoring results;
and if the most frequent category is sensitive comment, flooding comment or over-excited comment, judging the comment to be a violating comment and removing the comment.
7. A news comment auditing system, comprising:
an acquisition device, used for acquiring comments initiated by a user side and identifying texts and pictures in the comments;
a recognition device, used for extracting text and elements from the picture, recognizing and judging whether the elements contain violating elements, and if so, removing the picture;
a monitoring device, used for performing semantic monitoring on the text and the picture text, and if a monitoring result is a sensitive comment, a flooding comment or an over-excited comment, judging the comment to be a violating comment, removing the comment, and recording the comment into a database;
a judging device, used for acquiring the user condition of the user who posted the comment if the comment cannot be judged to be a violating comment, and further judging the comment according to the user condition to determine whether it is violating;
wherein extracting text and elements from the picture, recognizing and judging whether the elements contain violating elements, and if so, removing the picture comprises the following steps:
training a recognition model with training pictures so that the recognition model outputs a corresponding category for each training picture;
inputting a picture in a comment into the recognition model, and recognizing the elements in the picture;
dividing the picture into grid cells, and predicting class probabilities and bounding boxes for each grid cell;
predicting, for each grid cell, the probability that a bounding box corresponds to each element;
judging whether the elements overlap according to the positional relationship of the elements, and judging from the result whether the picture contains violating elements;
and if so, removing the picture.
8. A news comment auditing apparatus, comprising:
an acquisition module, used for acquiring comments initiated by a user side and identifying texts and pictures in the comments;
a recognition module, used for extracting text and elements from the picture, recognizing and judging whether the elements contain violating elements, and if so, removing the picture;
a monitoring module, used for performing semantic monitoring on the text and the picture text, and if a monitoring result is a sensitive comment, a flooding comment or an over-excited comment, judging the comment to be a violating comment, removing the comment, and recording the comment into a database;
a judging module, used for acquiring the user condition of the user who posted the comment if the comment cannot be judged to be a violating comment, and further judging the comment according to the user condition to determine whether it is violating;
wherein extracting text and elements from the picture, recognizing and judging whether the elements contain violating elements, and if so, removing the picture comprises the following steps:
training a recognition model with training pictures so that the recognition model outputs a corresponding category for each training picture;
inputting a picture in a comment into the recognition model, and recognizing the elements in the picture;
dividing the picture into grid cells, and predicting class probabilities and bounding boxes for each grid cell;
predicting, for each grid cell, the probability that a bounding box corresponds to each element;
judging whether the elements overlap according to the positional relationship of the elements, and judging from the result whether the picture contains violating elements;
and if so, removing the picture.
9. A computer-readable storage medium, characterized in that a computer program is stored which can be loaded by a processor and which executes a method according to any one of claims 1 to 6.
CN202011305016.5A 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium Active CN112231484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011305016.5A CN112231484B (en) 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011305016.5A CN112231484B (en) 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN112231484A (en) 2021-01-15
CN112231484B (en) 2022-11-08

Family

ID=74123839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011305016.5A Active CN112231484B (en) 2020-11-19 2020-11-19 News comment auditing method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN112231484B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010708B (en) * 2021-03-11 2023-08-25 上海麦糖信息科技有限公司 Method and system for auditing illegal friend circle content and illegal chat content
CN113132368B (en) * 2021-04-12 2022-11-04 海南晨风科技有限公司 Chat data auditing method and device and computer equipment
CN113592465A (en) * 2021-09-29 2021-11-02 飞狐信息技术(天津)有限公司 Method and device for shunting to-be-audited content, server and computer storage medium
CN115641063B (en) * 2022-08-10 2023-06-30 中国民用航空飞行学院 Intelligent auditing system for aviation information original data of middle and small airports
CN116204748A (en) * 2022-12-28 2023-06-02 河北省气象服务中心(河北省气象影视中心) Data processing method
CN116822496B (en) * 2023-06-02 2024-04-19 厦门她趣信息技术有限公司 Social information violation detection method, system and storage medium
CN117556146B (en) * 2024-01-10 2024-03-22 石家庄邮电职业技术学院 Network data information processing system, method, equipment and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104915673A (en) * 2014-03-11 2015-09-16 株式会社理光 Object classification method and system based on bag of visual word model
CN107807966A (en) * 2017-10-13 2018-03-16 深圳市迅雷网络技术有限公司 A kind of sensitive information screen method and service end
CN109977403A (en) * 2019-03-18 2019-07-05 北京金堤科技有限公司 The recognition methods of malice comment information and device
KR20200084506A (en) * 2019-01-03 2020-07-13 조규상 Information Display Ranking Decision System and Method
CN111522940A (en) * 2020-04-08 2020-08-11 百度在线网络技术(北京)有限公司 Method and device for processing comment information

Also Published As

Publication number Publication date
CN112231484A (en) 2021-01-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant