CN114565800A - Method for detecting illegal picture and picture detection engine - Google Patents
- Publication number
- CN114565800A (application number CN202210452325.8A)
- Authority
- CN
- China
- Prior art keywords
- picture
- detection
- user
- black
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a method for detecting illegal pictures and a picture detection engine. The picture detection engine comprises a data interface module, a black-and-white list filtering module, a picture preprocessing module, a text recognition detection module, a theme detection module, a violation decision module, a user detection module, an engine database and an engine management module. The illegal picture detection method identifies and detects illegal pictures by means of these modules.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method for detecting illegal pictures and a picture detection engine.
Background
As the number of network users grows, auditing and governing the content that users publish on internet platforms becomes increasingly difficult. Information or content that violates the law or the platform's regulations must be discovered and handled in time, to avoid adverse social impact or harm to the platform's normal operation. Internet platforms therefore need an efficient and accurate method for reviewing user-generated content.
In recent years, pictures have become one of the main forms in which users of internet platforms publish information, and the demand for picture violation detection keeps growing. Current methods for inspecting user-generated picture content include manual review, deep learning, picture clustering and recognition of text in pictures.
However, pictures cover rich themes and varied content, so picture violations come in many types. Traditional illegal picture detection methods work well for a single picture theme or violation type, but often produce missed or false detections on complex picture themes and content. Manual detection consumes a large amount of labor, and delays in manual detection often lead to adverse social impact.
Disclosure of Invention
In view of the above technical limitations, the invention provides a method for detecting illegal pictures and a picture detection engine.
To achieve this purpose, the invention adopts the following technical scheme:
An embodiment of the invention provides a method for detecting illegal pictures and a picture detection engine.
The picture detection engine comprises a data interface module, a black-and-white list filtering module, a picture preprocessing module, a text recognition detection module, a theme detection module, a violation decision module, a user detection module, an engine database and an engine management module.
The data interface module is used for acquiring the picture request data published by a user, acquiring user information data from an external database, and outputting the picture compliance inspection result. The black-and-white list filtering module is used for black-and-white list filtering of users, IPs and pictures. The picture preprocessing module is used for reading the picture data published by the user, converting the picture format, cropping and rotating the picture, and classifying the picture according to its content. The text recognition detection module is used for extracting the text content of pictures that contain text and performing text violation detection. The theme detection module is used for performing violation detection on pictures according to the associated theme types in the picture request data published by the user. The user detection module is used for calculating the user risk probability from user behavior data.
The engine database stores the data on which the picture detection engine depends, and comprises a violation text database, an associated subject picture database and a black-and-white list database. The violation text database stores violation text keywords; the associated subject picture database stores violating pictures of the associated subjects together with their subject labels; and the black-and-white list database stores the user id black-and-white list, the IP black-and-white list and the picture black-and-white list.
The violation decision module is used for judging whether a picture violates the rules according to the results of the black-and-white list filtering module, the text recognition detection module, the theme detection module and the user detection module. The engine management module is used for optimizing the key parameters of the picture detection engine and the engine database.
The illegal picture detection method comprises the following steps:
Step S1: the data interface module obtains the picture data published by the user, including user data, picture data and associated subject data.
Step S2: the black-and-white list filtering module performs black-and-white list filtering on the picture data published by the user and inputs the result into the violation decision module, which executes a first violation judgment operation to obtain a first violation judgment result. If the first violation judgment result indicates that a black-and-white list is hit, the first violation judgment result is output through the data interface module.
Step S3: if the first violation judgment result indicates that no black-and-white list is hit, the picture data published by the user is input into the picture preprocessing module for picture preprocessing, yielding a picture preprocessing result; meanwhile, the user data in the picture data published by the user is input into the user detection module for user detection, yielding a user detection result.
The picture preprocessing result comprises the processed picture data and a picture classification result; the user detection result comprises a user risk probability value.
Step S4: the next operation depends on the picture classification result in the picture preprocessing result. If the picture is classified as containing text, the picture preprocessing result is input into the text recognition detection module for text detection, yielding a text violation detection result. If the picture is classified as a non-text picture, the picture preprocessing result is input into the theme detection module for theme violation detection, yielding a theme violation detection result.
Step S5: the violation decision module makes a violation decision according to the user detection result, the text violation detection result and the subject violation detection result to obtain a violation judgment result, which is output by the data interface module.
Compared with the prior art, the invention has clear advantages and beneficial effects. With the above technical scheme, the method for detecting illegal pictures and the picture detection engine provided by the invention achieve considerable technical progress and practicability, have wide industrial value, and have at least the following advantages:
Picture filtering and picture enhancement during picture preprocessing improve the signal-to-noise ratio of the picture, which improves the efficiency and accuracy of picture violation detection. Picture classification detection determines the picture content so that different detection models are applied to different subject content, which improves the accuracy of picture violation detection.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical means of the present invention more clearly understood, the present invention may be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more clearly understood, the following preferred embodiments are described in detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a diagram of a picture detection engine for detecting illegal pictures according to an embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description will be made on a method for detecting an illegal picture and a picture detection engine according to the present invention with reference to the accompanying drawings and preferred embodiments.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
The terms used in the embodiments of the present invention are explained as follows:
Picture filtering: suppressing picture noise while preserving as much picture detail as possible.
Picture enhancement: enhancing the useful information in a picture, thereby improving picture interpretation and recognition.
OCR: optical character recognition, a process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text by a character recognition method.
Implementations of the present invention are described below using the foregoing terms:
An embodiment of the invention provides a method for detecting illegal pictures and a picture detection engine.
Referring to Fig. 1, the picture detection engine includes a data interface module, a black-and-white list filtering module, a picture preprocessing module, a text recognition detection module, a theme detection module, a violation decision module, a user detection module, an engine database and an engine management module.
The data interface module is used for acquiring the picture request data published by a user, acquiring user information data from an external database, and outputting the picture compliance inspection result. The black-and-white list filtering module is used for black-and-white list filtering of users, IPs and pictures. The picture preprocessing module is used for reading the picture data published by the user, converting the picture format, cropping and rotating the picture, and classifying the picture according to its content. The text recognition detection module is used for extracting the text content of pictures that contain text and performing text violation detection. The theme detection module is used for performing violation detection on pictures according to the associated theme types in the picture request data published by the user. The user detection module is used for calculating the user risk probability from user behavior data.
The engine database stores the data on which the picture detection engine depends, and comprises a violation text database, an associated subject picture database and a black-and-white list database. The violation text database stores violation text keywords; the associated subject picture database stores violating pictures of the associated subjects together with their subject labels; and the black-and-white list database stores the user id black-and-white list, the IP black-and-white list and the picture black-and-white list.
The violation decision module is used for judging whether a picture violates the rules according to the results of the black-and-white list filtering module, the text recognition detection module, the theme detection module and the user detection module. The engine management module is used for optimizing the key parameters of the picture detection engine and the engine database.
The illegal picture detection method comprises the following steps:
Step S1: the data interface module obtains the picture data published by the user, including user data, picture data and associated subject data.
Step S2: the black-and-white list filtering module performs black-and-white list filtering on the picture data published by the user and inputs the result into the violation decision module, which executes a first violation judgment operation to obtain a first violation judgment result. If the first violation judgment result indicates that a black-and-white list is hit, the first violation judgment result is output through the data interface module.
Step S3: if the first violation judgment result indicates that no black-and-white list is hit, the picture data published by the user is input into the picture preprocessing module for picture preprocessing, yielding a picture preprocessing result; meanwhile, the user data in the picture data published by the user is input into the user detection module for user detection, yielding a user detection result.
The picture preprocessing result comprises the processed picture data and a picture classification result; the user detection result comprises a user risk probability value.
Step S4: the next operation depends on the picture classification result in the picture preprocessing result. If the picture is classified as containing text, the picture preprocessing result is input into the text recognition detection module for text detection, yielding a text violation detection result. If the picture is classified as a non-text picture, the picture preprocessing result is input into the theme detection module for theme violation detection, yielding a theme violation detection result.
Step S5: the violation decision module makes a violation decision according to the user detection result, the text violation detection result and the subject violation detection result to obtain a violation judgment result, which is output by the data interface module.
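The control flow of steps S2 to S5 can be sketched as follows. This is a minimal illustration, not the patented implementation: every module is passed in as a hypothetical callable, the function name `detect_picture` and the returned dictionary layout are assumptions, and the final decision rule is left to the caller because the text does not specify it at this point.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def detect_picture(
    request: Dict[str, Any],
    blackwhite_filter: Callable[[Dict[str, Any]], str],
    preprocess: Callable[[Dict[str, Any]], Tuple[Any, List[str]]],
    user_risk: Callable[[Dict[str, Any]], float],
    text_detect: Callable[[Any], str],
    topic_detect: Callable[[Any, List[str]], float],
    decide: Callable[[float, Optional[str], Optional[float]], str],
) -> Dict[str, str]:
    # S2: black-and-white list filtering; "0" or "1" means a list was hit.
    first_result = blackwhite_filter(request)
    if first_result in ("0", "1"):
        return {"stage": "list", "result": first_result}

    # S3: picture preprocessing and user detection (run one after the other here).
    processed, classes = preprocess(request)
    risk = user_risk(request)

    # S4: route by the picture classification result.
    text_result: Optional[str] = None
    topic_result: Optional[float] = None
    if classes and classes[0] == "T":
        text_result = text_detect(processed)
    else:
        topic_result = topic_detect(processed, classes)

    # S5: the violation decision combines the available results.
    return {"stage": "decision", "result": decide(risk, text_result, topic_result)}
```

Passing the modules as callables keeps the sketch independent of how each module is actually built.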
As an embodiment, the picture data published by the user in step S1 is JSON of the form {"user": {"id": "", "ip": ""}, "image": {"data": "", "type": "", "size": ""}, "tag": []}. "user" refers to the user data, in the form of a dictionary, where "id" is the user id and "ip" is the user login IP. "image" refers to the picture data, in the form of a dictionary, where "data" is the picture encoding string, "type" is the picture format and "size" is the picture resolution information. "tag" refers to the subject labels associated with the picture, in the form of a list of strings. The picture formats include jpg, jpeg, png, bmp and svg; the encoding string is the string in which the picture data is stored.
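A request in this form could be parsed and checked roughly as follows. The class name `PictureRequest`, the function `parse_request` and the choice to treat "size" as an opaque string are assumptions made for illustration; only the field names and the list of picture formats come from the description above.

```python
import json
from dataclasses import dataclass
from typing import List

# Picture formats listed in the description.
ALLOWED_TYPES = {"jpg", "jpeg", "png", "bmp", "svg"}


@dataclass
class PictureRequest:
    user_id: str
    user_ip: str
    image_data: str  # picture encoding string
    image_type: str  # picture format
    image_size: str  # resolution information, kept as an opaque string here
    tags: List[str]  # associated subject labels


def parse_request(raw: str) -> PictureRequest:
    """Parse the JSON payload and reject unsupported picture formats."""
    payload = json.loads(raw)
    image = payload["image"]
    image_type = image["type"].lower()
    if image_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported picture format: {image_type}")
    return PictureRequest(
        user_id=payload["user"]["id"],
        user_ip=payload["user"]["ip"],
        image_data=image["data"],
        image_type=image_type,
        image_size=image["size"],
        tags=list(payload.get("tag", [])),
    )
```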
As an embodiment, the black and white list filtering in step S2 includes user id black and white list filtering, user ip black and white list filtering, and picture black and white list filtering.
User id black-and-white list filtering is implemented as follows: the user id is used as the key to query the user id black-and-white list in the black-and-white list database, and the query result is output. The result is one of "0", "1" and "2": "0" indicates that the user id is on the white list, "1" indicates that it is on the black list, and "2" indicates that the query returned no result.
User IP black-and-white list filtering is implemented as follows: the user IP is used as the key to query the IP black-and-white list in the black-and-white list database, and the query result is output. The result is one of "0", "1" and "2": "0" indicates that the user IP is on the white list, "1" indicates that it is on the black list, and "2" indicates that the query returned no result.
Picture black-and-white list filtering is implemented as follows: the picture is converted into a grayscale image and hashed to obtain a picture key code; the picture key code is used as the key to query the picture black-and-white list in the black-and-white list database, and the query result is output. The result is one of "0", "1" and "2": "0" indicates that the picture is on the white list, "1" indicates that it is on the black list, and "2" indicates that the query returned no result. The hash operation uses the MD5 algorithm. The picture black-and-white list stores black-and-white list picture key codes and black-and-white list identifiers, where each key code is obtained by hashing the picture after grayscale conversion.
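The picture key code lookup can be illustrated with the standard library alone. The sketch assumes the picture is already decoded into rows of (R, G, B) tuples, uses integer BT.601 luma weights for the grayscale conversion, and prefixes the dimensions before hashing; none of these details are stated in the description, and the names `picture_key_code` and `lookup_picture` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def to_grayscale(pixels: List[List[Pixel]]) -> List[List[int]]:
    # Assumed weighting (BT.601 luma); the description only says "gray-scale".
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row] for row in pixels]


def picture_key_code(pixels: List[List[Pixel]]) -> str:
    """MD5 digest of the grayscale pixel values, as a hex string."""
    gray = to_grayscale(pixels)
    height = len(gray)
    width = len(gray[0]) if gray else 0
    digest = hashlib.md5()
    # Include the dimensions so that differently shaped pictures do not collide.
    digest.update(f"{width}x{height}:".encode("ascii"))
    for row in gray:
        digest.update(bytes(row))
    return digest.hexdigest()


def lookup_picture(pixels: List[List[Pixel]], picture_list: Dict[str, str]) -> str:
    # picture_list maps key code -> "0" (white list) or "1" (black list).
    return picture_list.get(picture_key_code(pixels), "2")
```

Because MD5 changes completely when any input byte changes, a lookup of this kind only matches pictures whose grayscale data is identical to a listed picture.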
As an example, the first violation judgment result in step S2 is a character string that characterizes whether a black-and-white list is hit. It is one of "0", "1" and "2": "0" indicates that a list is hit and the result is the white list, "1" indicates that a list is hit and the result is the black list, and "2" indicates that no list is hit.
The first violation judgment operation is performed according to the following rule:
If the user id, user IP and picture black-and-white list filtering results include a "0" and no "1", the first violation judgment result is "0"; if the user id, user IP and picture black-and-white list filtering results include a "1", the first violation judgment result is "1"; and if the three filtering results are all "2", the first violation judgment result is "2".
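The rule can be written as a small function. The sketch assumes that a single "1" among the three filtering results is enough to produce "1", which is the reading that makes the three rules cover every combination; the function name is hypothetical.

```python
def first_violation_judgment(id_result: str, ip_result: str, picture_result: str) -> str:
    """Combine the three list-filtering results ("0", "1" or "2") into one."""
    results = (id_result, ip_result, picture_result)
    if "1" in results:
        # Assumption: any black list hit takes precedence.
        return "1"
    if "0" in results:
        # At least one white list hit and no black list hit.
        return "0"
    # All three lookups returned no result.
    return "2"
```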
As an embodiment, the picture preprocessing operation in step S3 includes picture filtering, picture enhancement and picture classification detection. Specifically: the input picture is read according to its encoding and its color space is converted into RGB to obtain first picture data; picture filtering and picture enhancement are applied to the first picture data to obtain second picture data; and picture classification detection is performed on the second picture data to obtain picture classification data.
As an embodiment, the picture filtering is implemented by the following algorithm:
(1) The input picture is converted to grayscale, and a three-dimensional matrix is obtained according to the following mapping: [formula not reproduced in this text]
(2) From the three-dimensional matrix, the dimensionality-increased matrix and a weight matrix are obtained as follows: [formula not reproduced in this text]
(3) The filtered image is obtained: [formula not reproduced in this text]
Here interp() is an interpolation function, and g is the linearization of the spatial proximity factor and the gray similarity factor, which are calculated as follows: [formula not reproduced in this text]
Here p = (i, j) is the central pixel and q is a pixel in the neighborhood of p. The remaining symbols denote, respectively: the gray values of pixels p and q; the spatial distance between p and q; the gray-level distance between p and q; and the standard deviations of the spatial distance and of the gray-level distance for the Gaussian function.
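Since the formulas are not available here, the following sketch shows only the standard brute-force bilateral filter, which combines a Gaussian spatial proximity factor with a Gaussian gray similarity factor in the way the symbol definitions suggest. It is not the linearized, interpolation-based variant described in steps (1) to (3), and the parameter names `radius`, `sigma_s` and `sigma_r` are assumptions.

```python
import math
from typing import List


def bilateral_filter(
    gray: List[List[float]],
    radius: int = 2,
    sigma_s: float = 2.0,   # assumed name: spatial distance standard deviation
    sigma_r: float = 25.0,  # assumed name: gray-level distance standard deviation
) -> List[List[float]]:
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            center = gray[i][j]
            weighted_sum = 0.0
            weight_total = 0.0
            # Visit the neighborhood of p = (i, j), clipped at the borders.
            for k in range(max(0, i - radius), min(height, i + radius + 1)):
                for l in range(max(0, j - radius), min(width, j + radius + 1)):
                    spatial_sq = (i - k) ** 2 + (j - l) ** 2
                    gray_sq = (center - gray[k][l]) ** 2
                    # Product of the spatial and gray similarity Gaussians.
                    weight = math.exp(
                        -spatial_sq / (2 * sigma_s ** 2) - gray_sq / (2 * sigma_r ** 2)
                    )
                    weighted_sum += weight * gray[k][l]
                    weight_total += weight
            # weight_total >= 1 because the center pixel always has weight 1.
            out[i][j] = weighted_sum / weight_total
    return out
```

Neighbors with very different gray values receive almost no weight, so noise within flat regions is smoothed while sharp edges are largely preserved.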
As an embodiment, the picture enhancement is implemented by the following algorithm:
The pixel at position (i, j) of the picture is transformed as follows to obtain the processed pixel at that position: [formula not reproduced in this text]
Here depth represents the enhancement intensity of the picture; generally, depth = 2 is used for medium enhancement and depth = 2.5 for strong enhancement.
As an example, the picture classification detection is performed by:
(1) performing picture feature extraction on the second picture data to obtain first picture feature data, and inputting the trained first picture classification model to obtain a first picture classification result; the first picture classification model is used for distinguishing whether the picture contains text or not; the first picture classification result is 'T' or 'N-T', the 'T' represents that the picture contains the text, and the 'N-T' represents that the picture does not contain the text;
(2) when the first picture classification result is "T", finishing picture classification detection and outputting a list containing the first picture classification result; when the first picture classification result is "N-T", inputting the first picture feature data into a second picture classification model to obtain a second picture classification result, finishing picture classification detection, and merging and outputting the first picture classification result and the second picture classification result. The second picture classification model is used for identifying the theme tags related to the detected picture, and the second picture classification result is a list of picture theme tag strings.
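The two-stage classification can be sketched as below. The two models are passed in as hypothetical callables (`contains_text` and `predict_topics`), and the example tag string "people" is illustrative only, because the actual output identifiers of the second model are defined in Table 1.

```python
from typing import Callable, List, Sequence


def classify_picture(
    features: Sequence[float],
    contains_text: Callable[[Sequence[float]], bool],
    predict_topics: Callable[[Sequence[float]], List[str]],
) -> List[str]:
    """Return ["T"] for text pictures, or ["N-T", <theme tags...>] otherwise."""
    if contains_text(features):
        # The first model found text: classification ends here.
        return ["T"]
    # No text: run the second model and merge both results into one list.
    return ["N-T"] + list(predict_topics(features))
```

Both models receive the same feature data, so feature extraction runs once per picture.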
The image feature extraction adopts an HOG algorithm, namely a histogram of oriented gradients algorithm, which is a mature technical means and is not described herein any more.
The first picture classification model is obtained by the following method: acquiring first model original data including picture data and a label of whether the picture contains a text or not in a manual screening mode; splitting the first model original data into a first model training set and a first model testing set; and training a first picture classification model through a first model training set by adopting a Logistic Regression algorithm (Logistic Regression), evaluating and optimizing by depending on a first model testing set, and outputting the first picture classification model meeting the requirements of recall rate and accuracy.
The Logistic Regression algorithm (Logistic Regression) is a mature technical means, and those skilled in the art can smoothly implement the above description, which is not described herein again.
The second image classification model is a convolutional neural network classifier, and the implementation method is a mature technical means, so that the detailed description is omitted; the classification result and the output identification of the second image classification model are shown in table 1:
TABLE 1 output identification corresponding to each classification result of the second picture classification model
As an example, the user detection operation in step S3 is performed by:
the user detection module performs feature extraction on the input user information and the equipment environment information to obtain user feature data, and inputs the user feature data to the trained user analysis model to obtain a user risk probability value.
The user detection result is a user risk probability value that represents whether the user who sent the picture publishing request poses a malicious publishing risk: 0 represents no violation, 1 represents violation, and values in between represent the likelihood of violation.
The user analysis model is obtained by the following method: carrying out data cleaning and feature extraction on an original user operation data set to obtain a user analysis model data set; splitting the user analysis model data set into a user analysis model training set and a user analysis model testing set; training a user analysis model by using a machine learning algorithm depending on a user analysis model training set, and evaluating the user analysis model by using a user analysis model test set; and adjusting parameters to continuously train the model until the recall rate and the accuracy rate meet preset threshold values, and outputting the user analysis model.
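The evaluation step of this loop can be illustrated as follows. The sketch assumes binary labels (1 for violation, 0 for no violation), reads "accuracy rate" as overall accuracy, and uses placeholder threshold values; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def recall_and_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> Tuple[float, float]:
    """Recall on the violation class (label 1) and overall accuracy."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return recall, accuracy


def meets_thresholds(
    labels: Sequence[int],
    predictions: Sequence[int],
    min_recall: float = 0.9,    # placeholder threshold
    min_accuracy: float = 0.9,  # placeholder threshold
) -> bool:
    recall, accuracy = recall_and_accuracy(labels, predictions)
    return recall >= min_recall and accuracy >= min_accuracy
```

Training would continue with adjusted parameters while `meets_thresholds` returns False on the test set.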
The original user operation data set is obtained directly from a user database external to the picture detection engine, and includes, but is not limited to, the following data fields: operation object, operation type, operation time, login IP address during the operation, violation identifier, violation type label and violation time.
It is understood that the machine learning algorithms that may be used to train the user analysis model include logistic regression, decision trees, genetic algorithms, support vector machines (SVM), the K-means algorithm, random forests and naive Bayes. The program design differs between algorithms, but all are mature technical means that a person skilled in the art can implement from the description of the above embodiments, so details are not repeated here.
As an embodiment, the text recognition detection module performs text detection by:
(1) performing text recognition and extraction on an input picture to obtain text content to be detected;
(2) matching the text to be detected against preset violation text rules using regular expressions; if a match succeeds, the text detection result is "violation"; if nothing matches, performing word segmentation on the text to be detected and removing safe words to obtain a keyword list;
(3) querying the violation text database using the strings in the keyword list as keywords; if a keyword hits the violation text database, the text detection result is "violation"; if no keyword hits the violation text database, the text detection result is "safe".
The text violation detection result is "0" or "1": "0" indicates "safe" and "1" indicates "violation".
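Steps (2) and (3) can be sketched as below, starting from text that OCR has already extracted. The violation text database is stood in for by an in-memory set, and whitespace splitting stands in for word segmentation; both are assumptions, and Chinese text would need a real segmenter. The function and parameter names are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Pattern, Set


def detect_text_violation(
    text: str,
    violation_patterns: Iterable[Pattern[str]],
    violation_keywords: Set[str],
    safe_words: Set[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> str:
    """Return "1" for violation and "0" for safe."""
    # Step (2): regular expression matching against the preset rules.
    for pattern in violation_patterns:
        if pattern.search(text):
            return "1"
    # Still step (2): word segmentation, then removal of safe words.
    keywords = [word for word in tokenize(text) if word not in safe_words]
    # Step (3): look up each remaining keyword in the violation text store.
    if any(word in violation_keywords for word in keywords):
        return "1"
    return "0"
```

Running the regular expressions first lets obvious rule matches return before the keyword lookup is needed.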
The text recognition and extraction can be realized by using an OCR algorithm adopting a "CNN + BLSTM + CTC" architecture, which is a mature technical means, and a person skilled in the art can completely and smoothly realize the algorithm according to the description of the above embodiment, and details are not described herein.
As an example, the topic detection module implements topic violation detection by:
(1) matching different detection models according to picture classification results obtained by picture preprocessing, inputting picture data into corresponding detection models for detection to obtain violation detection results of the corresponding models, and assembling the violation detection results into a violation detection result list; wherein elements in the violation detection result list are duplets including corresponding absolute risk factors and violation detection results;
(2) calculating a topic violation risk probability value from the violation detection result list, according to the following formula:
where M is the set of detection models corresponding to the classification result of the input picture, and the two remaining quantities in the formula are the absolute risk factor of each detection model and the detection result of the corresponding detection model; the detection models and risk factors corresponding to each picture classification result are shown in Table 2.
Table 2 detection model, output result, and absolute risk factor corresponding to each classification result
Wherein the output result comprises "0", "1"; a "0" is used to characterize the picture as not violating the rule, and a "1" is used to characterize the picture as violating the rule.
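Step (1) of the topic detection, matching models to the classification result and assembling the two-tuple list, can be sketched as below. The registry contents, the two toy models, and the risk factor values are hypothetical and are not taken from Table 2; the sketch deliberately stops before step (2), since the aggregation formula is not reproduced in this text.

```python
# Hypothetical detection models: each returns "1" (violation) or "0" (no violation).
def person_model(picture):
    return "1" if picture.get("nudity") else "0"


def logo_model(picture):
    return "1" if picture.get("banned_logo") else "0"


# Hypothetical registry: topic label -> list of (detection model, absolute risk factor).
# The labels and risk factors are made up for illustration.
MODEL_REGISTRY = {
    "person": [(person_model, 0.9)],
    "logo": [(logo_model, 0.6)],
}


def build_result_list(picture, topic_labels):
    """Run every model matched by the topic labels and collect
    (absolute risk factor, detection result) two-tuples."""
    results = []
    for label in topic_labels:
        for model, risk_factor in MODEL_REGISTRY.get(label, []):
            results.append((risk_factor, model(picture)))
    return results
```

Labels with no registered model simply contribute no entries, so an unrecognized topic yields an empty list rather than an error.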
As an embodiment, the first topic detection model is obtained by:
obtaining, by manual screening, first topic original data related to the topics of people and human body parts, wherein the first topic original data comprises picture data and a label indicating whether each picture violates the rules; splitting the first topic original data into a first topic model training set and a first topic model test set; and training a first topic detection model on the first topic model training set using the logistic regression algorithm, evaluating and optimizing it on the first topic model test set, and outputting the first topic detection model once it meets the recall and accuracy requirements.
The second, third, fourth, and fifth topic detection models are trained in a similar way to the first topic detection model; it is only necessary to replace the original training picture set with pictures of the corresponding topic, which is not repeated herein.
As an example, the flag filter is implemented by:
(1) cutting the picture data into a plurality of image blocks using the normalized cut algorithm (Normalized-Cut);
(2) calculating the picture similarity between each image block and each picture labeled with the topic 'Logo' in the associated topic picture database, and assembling all the similarity results into a similarity value list; the picture similarity is calculated using a Hamming distance similarity method;
(3) if the maximum value in the similarity value list is greater than the similarity threshold, the flag filter outputs "1"; otherwise it outputs "0".
Preferably, the detection result is more accurate when the similarity threshold is set to 70%.
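The similarity comparison in steps (2) and (3) can be sketched as follows. The text does not say what representation the Hamming distance is computed over, so this sketch assumes each image block and each 'Logo' picture has already been reduced to a fixed-length bit fingerprint held in an integer; the function names and the 64-bit default are hypothetical.

```python
def hamming_similarity(a, b, bits=64):
    """Similarity in [0, 1]: one minus the fraction of differing bits."""
    mask = (1 << bits) - 1
    distance = bin((a ^ b) & mask).count("1")
    return 1.0 - distance / bits


def flag_filter(block_hashes, logo_hashes, threshold=0.70, bits=64):
    """Return "1" if any block is more similar than the threshold
    to any logo fingerprint, otherwise "0"."""
    similarities = [
        hamming_similarity(block, logo, bits)
        for block in block_hashes
        for logo in logo_hashes
    ]
    # An empty list means there was nothing to compare, so no logo was found.
    if similarities and max(similarities) > threshold:
        return "1"
    return "0"
```

The default threshold of 0.70 mirrors the preferred 70% value; it uses a strict comparison, so a similarity of exactly 70% does not trigger the filter.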
As an example, the violation decision module in step S5 makes the violation decision by the following rule:
if the first violation judgment result output by the black-and-white list filtering module indicates a blacklist hit, the violation decision result is '1'; if it indicates a whitelist hit, the violation decision result is '0';
if the first violation judgment result indicates that the black-and-white list is not hit, the violation decision is made according to the detection result of the text recognition detection module or the topic detection module: when the text violation detection result of the text recognition detection module is '1', the violation decision result is '1'; when the text violation detection result is '0', the violation decision result is '1' if the topic violation risk probability value of the topic detection module is greater than the preset topic risk probability threshold, and '0' otherwise; and when there is no text violation detection result, the violation decision result is '1' if the topic violation risk probability value is greater than the preset topic risk probability threshold, and '0' otherwise.
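The decision rule above reduces to a short cascade, sketched below. The argument encoding (`"black"`, `"white"`, or `None` for the list result; `None` for a missing text result or topic risk) and the threshold value used in the usage example are assumptions made for illustration.

```python
def violation_decision(list_result, text_result, topic_risk, topic_threshold):
    """Return "1" (violation) or "0" (no violation).

    list_result: "black", "white", or None when neither list is hit.
    text_result: "1", "0", or None when no text detection was run.
    topic_risk:  topic violation risk probability, or None if unavailable.
    """
    # Black-and-white list hits decide immediately.
    if list_result == "black":
        return "1"
    if list_result == "white":
        return "0"
    # A text violation decides on its own.
    if text_result == "1":
        return "1"
    # Text result is "0" or absent: fall back to the topic risk threshold.
    if topic_risk is not None and topic_risk > topic_threshold:
        return "1"
    return "0"
```

For example, with a hypothetical threshold of 0.5, `violation_decision(None, "0", 0.8, 0.5)` yields `"1"` because the topic risk exceeds the threshold even though the text check passed.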
The picture detection engine also comprises an engine management module, which supports key parameter optimization of the picture detection engine.
Key parameter optimization means that operation and maintenance personnel of the picture detection engine add, modify, and delete data in the illegal text database, the associated topic picture database, and the black-and-white list database according to business needs, through a database operation interface provided by the engine management module.
The present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein, which may be non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like).
Finally, it is noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "include", "including", or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for illegal picture detection,
the illegal picture detection method comprises the following steps:
step S1, the data interface module obtains the user issued picture data, including user data, picture data, and associated subject data;
step S2, the black-and-white list filtering module filters the black-and-white list of the picture data issued by the user, and inputs the corresponding result into the violation decision module to execute a first violation judgment operation to obtain a first violation judgment result; if the first violation judgment result represents that the black-and-white list is hit, outputting a first violation judgment result through a data interface module;
step S3, if the first violation judgment result represents that the black-and-white list is not hit, inputting the picture data issued by the user into a picture preprocessing module to perform picture preprocessing operation, and obtaining a picture preprocessing result; meanwhile, inputting the user data in the picture data issued by the user into a user detection module for user detection operation to obtain a user detection result;
the picture preprocessing result comprises processed picture data and a picture classification result; the user detection result comprises a user risk probability value;
step S4, according to the picture classification result in the picture preprocessing result, performing operation: if the picture classification result is a picture containing text, inputting the picture preprocessing result into a text recognition detection module for text detection to obtain a text violation detection result; if the picture classification result is a non-text picture, inputting the picture preprocessing result into a theme detection module to perform theme violation detection to obtain a theme violation detection result;
step S5, the violation decision module makes violation decisions according to the user detection result, the text violation detection result and the subject violation detection result to obtain violation judgment results, and the violation judgment results are output by the data interface module;
in step S2, the black and white list filtering includes user id black and white list filtering, user ip black and white list filtering, and picture black and white list filtering.
2. The method of claim 1,
the text detection is realized by depending on an illegal text database, and the illegal text database stores illegal text keywords;
the topic violation detection is realized by depending on an associated topic picture database, and the associated topic picture database stores violation pictures and topic labels of associated topics;
the black and white list filtering depends on a black and white list database, and the black and white list database is used for storing a user id black and white list, an ip black and white list and a picture black and white list.
3. The method of claim 1,
the black and white list filtering of the user id is realized in the following way: querying the user id black-and-white list in the black-and-white list database with the user id as the keyword, and outputting the corresponding query result, wherein the result is '0', '1', or '2': '0' indicates that the user id is a white list id, '1' indicates that the user id is a black list id, and '2' indicates that there is no query result;
the black and white list filtering of the user ip is realized in the following way: querying the ip black-and-white list in the black-and-white list database with the user ip as the keyword, and outputting the corresponding query result, wherein the result is '0', '1', or '2': '0' indicates that the user ip is a white list ip, '1' indicates that the user ip is a black list ip, and '2' indicates that there is no query result;
the black and white list filtering of the picture is realized in the following way: converting the picture into a gray-scale image and performing a hash operation to obtain a picture key code, querying the picture black-and-white list in the black-and-white list database with the picture key code as the keyword, and outputting the corresponding query result, wherein the result is '0', '1', or '2': '0' indicates that the picture is a white list picture, '1' indicates that the picture is a black list picture, and '2' indicates that there is no query result; wherein the hash operation adopts the MD5 algorithm; the picture black-and-white list stores black-and-white list picture key codes and black-and-white list identifiers, and the black-and-white list picture key codes are obtained by performing the hash operation after gray-scale conversion.
4. The method of claim 1,
the picture preprocessing operation in step S3 includes picture filtering, picture enhancement, and picture classification detection, specifically: reading an input picture according to its picture coding mode and converting its color space into the RGB space to obtain first picture data; performing picture filtering and picture enhancement processing on the first picture data to obtain second picture data; and performing picture classification detection on the second picture data to obtain picture classification data.
5. The method of claim 4,
the picture filtering is realized by the following algorithm: converting an input picture into a gray-scale image and obtaining a dimensional-increased three-dimensional matrix according to a preset mapping mode; obtaining an increased dimension matrix IX and a weight matrix EX for the three-dimensional matrix according to a preset transformation mode; obtaining a filtered image through spatial interpolation;
the picture enhancement is realized by an algorithm:
the pixel at position (i, j) of the picture is transformed as follows to obtain the pixel at the processed position:
where depth represents the picture enhancement strength, with depth = 2 for mid-range enhancement and depth = 2.5 for enhancement.
6. The method of claim 4,
the picture classification detection is carried out in the following way:
performing picture feature extraction on the second picture data to obtain first picture feature data, and inputting the trained first picture classification model to obtain a first picture classification result; the first picture classification model is used for distinguishing whether the picture contains text or not; the first picture classification result is 'T' or 'N-T', the 'T' represents that the picture contains the text, and the 'N-T' represents that the picture does not contain the text;
when the first picture classification result is 'T', finishing picture classification detection and outputting a list containing the first picture classification result; when the first picture classification result is 'N-T', inputting the first picture feature data into a second picture classification model to obtain a second picture classification result, finishing picture classification detection, and merging and outputting the first picture classification result and the second picture classification result; the second picture classification model is used for identifying the topic labels related to the detected picture, and the second picture classification result is a list containing picture topic label character strings.
7. The method of claim 6,
the first picture classification model is obtained by the following method: acquiring first model original data including picture data and a label of whether the picture contains a text or not in a manual screening mode; splitting the first model original data into a first model training set and a first model testing set; training a first picture classification model through a first model training set by adopting a logistic regression algorithm, evaluating and optimizing by means of a first model testing set, and outputting the first picture classification model meeting the requirements of recall rate and accuracy;
the second image classification model is a convolutional neural network classifier.
8. The method of claim 1,
the user detection operation in step S3 is performed by:
the user detection module performs feature extraction on the input user information and the equipment environment information to obtain user feature data, and inputs the user feature data to the trained user analysis model to obtain a user risk probability value.
9. The method of claim 1,
the theme detection module realizes theme violation detection in the following way:
matching different detection models according to the picture classification result obtained by picture preprocessing, inputting picture data into the corresponding detection model for detection to obtain violation detection results of the corresponding model, and splicing the violation detection results into a violation detection result list;
and calculating a topic violation risk probability value according to the violation detection result list, wherein the method comprises the following steps:
10. A picture detection engine for violation picture detection,
the picture detection engine comprises a data interface module, a black and white list filtering module, a picture preprocessing module, a text recognition detection module, a theme detection module, a violation decision module, a user detection module, an engine database and an engine management module;
the data interface module is used for acquiring user issued picture request data, acquiring user information data from an external database and outputting a picture compliance inspection result; the black and white list filtering module is used for filtering black and white lists of users, ips and pictures; the picture preprocessing module is used for reading picture data issued by a user, carrying out picture format conversion, carrying out picture cutting and rotation conversion, and classifying the pictures according to content; the text recognition detection module is used for extracting the text content of pictures containing text and carrying out text violation detection; the theme detection module is used for carrying out violation detection on the pictures according to the associated theme types in the picture request data issued by the user; the user detection module is used for calculating user risk probability according to user behavior data; the engine database is used for storing data depended on by the picture detection engine and comprises a violation text database, an associated subject picture database and a black and white list database; the violation decision module is used for judging whether the picture is in violation according to the results of the black-and-white list filtering module, the text recognition detection module, the theme detection module and the user detection module; the engine management module is used for supporting the optimization of key parameters of the picture detection engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210452325.8A CN114565800B (en) | 2022-04-24 | 2022-04-24 | Method for detecting illegal picture and picture detection engine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114565800A (en) | 2022-05-31 |
CN114565800B (en) | 2022-07-29 |
Family
ID=81721316
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210452325.8A Active CN114565800B (en) | 2022-04-24 | 2022-04-24 | Method for detecting illegal picture and picture detection engine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114565800B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN114565800B (en) | 2022-07-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||