CN108734184B - Method and device for analyzing sensitive image - Google Patents


Publication number
CN108734184B
Authority
CN
China
Prior art keywords
sample
sensitive
training
images
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710248908.8A
Other languages
Chinese (zh)
Other versions
CN108734184A (en)
Inventor
杨现
常江龙
Current Assignee
Suning.Com Co., Ltd.
Original Assignee
Suning.Com Co., Ltd.
Priority date
Application filed by Suning.Com Co., Ltd.
Priority to CN201710248908.8A
Publication of CN108734184A
Application granted
Publication of CN108734184B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0641 Shopping interfaces
    • G06Q30/0643 Graphical representation of items or shoppers

Abstract

Embodiments of the invention disclose a method and a device for analyzing sensitive images. They relate to the technical field of image recognition and can raise the automation level of advertisement-image recognition and detection while reducing manual review cost. The method comprises the following steps: clustering the sample images in a training sample set; training a recognition model for each resulting class through a convolutional neural network according to the clustered sample images; and using the trained recognition models to identify the corresponding classes of sensitive pictures in a picture library to be inspected. The method is suitable for sensitive-picture recognition on online platforms.

Description

Method and device for analyzing sensitive image
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for analyzing a sensitive image.
Background
With the development of internet technology and the construction of online trading platforms, online marketing platforms and other network platforms, operators and shops of every size place vast numbers of internet advertisements on these platforms around the clock. To standardize the publication of internet advertisements and protect the legal rights and interests of consumers, the new Advertising Law promulgated in 2015 explicitly specifies that internet advertising activities must also comply with its provisions.
In current practice, network platforms monitor internet advertising mainly by detecting sensitive images in order to flag and warn about potentially illegal advertisements. Most existing sensitive-image recognition methods target pornographic images: the corresponding detection and analysis techniques were developed chiefly for monitoring and identifying obscene material under the Public Security Administration Punishment Law and the Criminal Law, and the detection focuses on sensitive organs. For example, image features with fixed colors, shapes and textures are designed by hand, and suspected sensitive images are obtained by matching against these manually defined features.
However, the existing methods have low recognition accuracy and often misreport ordinary advertisement and publicity images of poultry products, underwear, sporting goods, family-planning products and similar goods as sensitive images. Such false positives are resolved mainly through complaints from the affected party or manual processing by monitoring staff, which cannot keep up with the real-time monitoring of the massive volume of internet advertisements placed on network platforms today, and in particular cannot meet an e-commerce platform's need to filter and monitor huge numbers of product publicity images. A highly automated detection method is therefore needed to keep labor costs under control.
Disclosure of Invention
An embodiment of the invention provides a method for analyzing sensitive images that can raise the automation level of advertisement-picture recognition and detection and reduce manual review cost.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present invention provides a method, including:
clustering the sample images in an extracted training sample set according to the sensitive type corresponding to each sample;
training a recognition model for each class through a convolutional neural network according to the clustered sample images;
and identifying the corresponding classes of sensitive pictures in a picture library to be inspected by using the trained recognition models.
With reference to the first aspect, in a first possible implementation manner of the first aspect, clustering the sample images in the training sample set according to the sensitive type corresponding to each sample includes:
extracting the sensitive features of each sample image in the training sample set through a preset neural network model, wherein the preset neural network model has been trained on ImageNet;
and clustering sample images whose sensitive features are sufficiently similar under the test rule into the same sample subset through a preset clustering algorithm.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the method further includes:
in a subset of samples:
sorting the sample images in the subset from nearest to farthest by their distance from the cluster center, and selecting the top-ranked images up to a designated count as positive samples;
training a model classifier with the obtained positive samples;
and scoring the sample images in the sample subset with the trained model classifier, then removing the sample images whose scores fall below a preset threshold.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner, the method further includes:
training a deep residual network with a designated number of layers, at least 50, using an extracted pre-training data set;
and correcting the sample subsets through the trained deep residual network.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the method further includes:
after the corresponding classes of sensitive pictures have been identified in the picture library to be inspected and a recognition result obtained, extracting hard examples from the recognition result;
and updating the parameters of the corresponding recognition models according to the hard examples.
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, extracting hard examples from the recognition result includes:
obtaining the score value of each attribute of the sensitive picture, the score values being calculated by the recognition model;
sorting the attributes of the sensitive picture in descending order of score value;
and summing the score values of the top-ranked attributes up to a designated count; when the sum exceeds a preset confidence threshold, judging the picture a hard example.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the method further includes:
acquiring candidate images from the provider service platform according to a preset service rule and updating the picture library with the acquired candidate images;
and/or extracting the training sample set from the sample library designated by a preset test rule.
In a second aspect, an embodiment of the present invention provides an apparatus, including:
a clustering module for clustering the sample images in an extracted training sample set according to the sensitive type corresponding to each sample;
a training module for training a recognition model for each class through a convolutional neural network according to the clustered sample images;
and an analysis module for identifying the corresponding classes of sensitive pictures in a picture library to be inspected by using the trained recognition models.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the clustering module is specifically configured to extract the sensitive features of each sample image in the training sample set through a preset neural network model, and to cluster sample images whose sensitive features are sufficiently similar under the test rule into the same sample subset through a preset clustering algorithm; the preset neural network model has been trained on ImageNet.
With reference to the second aspect, in a second possible implementation manner of the second aspect, the apparatus further includes a filtering module configured, within a sample subset, to sort the sample images from nearest to farthest by their distance from the cluster center and select the top-ranked images up to a designated count as positive samples; to train a model classifier with the obtained positive samples; and to score the sample images in the subset with the trained model classifier, removing those whose scores fall below a preset threshold;
and a correction module configured to train a deep residual network with a designated number of layers, at least 50, using an extracted pre-training data set, and to correct the sample subsets through the trained deep residual network.
With reference to the second aspect, in a third possible implementation manner of the second aspect, the apparatus further includes:
an updating module for identifying the corresponding classes of sensitive pictures in the picture library to be inspected and, after a recognition result is obtained, acquiring the score value of each attribute of a sensitive picture, the score values being calculated by the recognition model; sorting the attributes of the sensitive picture in descending order of score value;
summing the score values of the top-ranked attributes up to a designated count and judging the picture a hard example when the sum exceeds a preset confidence threshold; and updating the parameters of the corresponding recognition models according to the hard examples.
In the method and device for analyzing sensitive images provided by embodiments of the invention, the sample images in the training sample set are clustered, a recognition model is trained for each class through a convolutional neural network according to the clustered sample images, and the trained recognition models are then used to identify the corresponding classes of sensitive pictures in the picture library to be inspected, thereby determining whether pictures uploaded by merchants of the e-commerce service platform belong to any of those classes. This realizes automatic detection and scanning of pictures uploaded to the e-commerce service platform by merchants, raises the automation level of advertisement-picture recognition and detection, and reduces manual review cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a possible system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method provided by an embodiment of the present invention;
FIGS. 3 and 4 are schematic diagrams of embodiments provided by embodiments of the present invention;
fig. 5, 6 and 7 are schematic structural diagrams of apparatuses provided by embodiments of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements or to elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only, serve to explain the present invention, and are not to be construed as limiting it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Embodiments of the present invention may be implemented in a system environment as shown in fig. 1, which specifically includes an e-commerce service platform, a monitoring server and a database.
the monitoring server can collect candidate images from the e-commerce service platform according to preset service rules, and the collected candidate images are used for updating the picture library in the database. Specifically, the picture library includes advertisement pictures published on the e-commerce service platform, or pictures published on a commodity detail page, a browsing page, and other pages for presentation to consumers. Various pictures can be collected from the e-commerce service platform in real time (for example, according to a certain update period, such as 10 minutes, 1 hour, etc.) by the monitoring server to serve as the candidate images, and the candidate images are imported into the picture library in the database.
The monitoring server may be a standalone server device, such as a rack, blade, tower or cabinet server, or other hardware with strong computing power such as a workstation or mainframe; it may also be a server cluster consisting of multiple server devices.
The database mainly stores the picture library. It may be a Redis database or another type of distributed database, a relational database or the like, and physically may be a data server together with its attached storage devices, or a database server cluster composed of multiple data servers and storage servers.
The e-commerce service platform may be one currently operating online: a platform system for online transactions and commodity sales comprising various service subsystems. At the hardware level it consists of a series of intercommunicating server clusters; its construction and architecture standards follow the techniques commonly used by today's large online shopping platforms and are not described further in this embodiment.
An embodiment of the present invention provides a method for analyzing a sensitive image, as shown in fig. 2, including:
and S1, clustering the sample images in the training sample set according to the sensitive type corresponding to each sample in the extracted training sample set.
The monitoring server may also establish a test rule base in advance, containing preset test rules that can be defined by technicians and entered into the monitoring server. For example, technicians design test templates, each containing the test rules for a specific application scenario, a corresponding training sample set, and the algorithm models required for testing. The monitoring server then extracts the training sample set from the sample library designated by the preset test rule, and the test rule can be invoked automatically (or by a technician) according to the current test environment.
In this embodiment, a sensitive type may be understood as follows: pictures from different business directions depict different kinds of commodities or articles, and a distinct sensitive type can be preset for each kind. For example, for three commodity categories in different business directions, namely poultry meat, underwear and family-planning products, the kinds of goods depicted in their advertisement pictures differ; sensitive types 1, 2 and 3 can be defined so that poultry-meat advertisement pictures cluster to sensitive type 1, underwear advertisement pictures to sensitive type 2, and family-planning advertisement pictures to sensitive type 3. The advertisement pictures collected by the monitoring server from the e-commerce service platform are thus classified and distinguished by sensitive type. During recognition and filtering, the monitoring server feeds the URL (Uniform Resource Locator) of the picture to be processed into the sensitive-image recognition model, which outputs the attribute classification result of that picture.
S2, training a recognition model for each class through a convolutional neural network according to the clustered sample images.
Training a recognition model for each class through a convolutional neural network may be understood as training a recognition and classification model based on a convolutional neural network. Commonly used convolutional neural network techniques can be adopted, a recognition model for sensitive pictures can be built for the specific business scenario, and separate recognition models can be constructed for the different sensitive types.
In this embodiment, when training the recognition model for each sensitive type, different recognition algorithms and reference data for matching can be configured per type. For example, business knowledge rules can be extracted from the service subsystems of the e-commerce service platform to express characteristics unique to the commodities of each business direction. The business knowledge rules for poultry-meat commodities include the kinds of animal limbs and viscera together with their typical illustrations, color information and contour information; for sensitive type 1, the matching reference data can therefore include typical illustrations and contours of animal limbs and viscera, and during recognition an image that conforms to this reference data is judged non-sensitive. As another example, the business knowledge rules for underwear commodities include the shape, color and surface material of common underwear (for instance, which regions of a picture are cloth and which are a model's skin can be judged roughly from color and sheen); for sensitive type 2, the matching reference data can include shape, color and surface material, and during recognition an image that conforms to this reference data and contains no sensitive organs (sensitive organs can be detected by traditional means, such as the obscene-picture identification methods used by public security agencies) is judged non-sensitive.
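As an illustration of configuring per-type reference data, the sketch below encodes the business-knowledge idea in Python. The `REFERENCE_DATA` table, the tag names and the `judge_non_sensitive` function are hypothetical stand-ins invented for this example; the patent does not specify a data format.

```python
# Hypothetical per-type reference data, loosely following the examples
# in the text (animal parts for type 1, underwear cues for type 2).
REFERENCE_DATA = {
    1: {"expected_tags": {"animal_limb", "viscera"}},            # poultry meat
    2: {"expected_tags": {"underwear", "cloth", "model_skin"}},  # underwear
}

def judge_non_sensitive(sensitive_type: int, image_tags: set) -> bool:
    """Judge an image non-sensitive when its detected tags match the
    business-knowledge reference data of its sensitive type. (The real
    flow for type 2 additionally requires that no sensitive organs are
    detected; that check is omitted in this sketch.)"""
    ref = REFERENCE_DATA.get(sensitive_type)
    if ref is None:
        return False  # unknown type: no rule matches
    return bool(ref["expected_tags"] & image_tags)
```

A real system would derive the tags from detectors rather than supply them by hand; this sketch only shows the per-type lookup-and-match structure.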
S3, using the trained recognition models to identify the corresponding classes of sensitive pictures in the picture library to be inspected.
The picture library to be inspected contains pictures uploaded by merchants of the e-commerce service platform. By executing the procedure of this embodiment, the monitoring server identifies whether those pictures belong to any of the sensitive classes.
Under the new Advertising Law, monitoring and management of advertisement pictures published on e-commerce service platforms must be strengthened. Checking them through user feedback and human inspection, however, is inefficient, risky and labor-intensive, so automatic identification by the monitoring server of this embodiment is needed to warn the platform operators and merchants in time.
This embodiment provides a concrete method for selecting the training samples of the filtering algorithm and for filtering during recognition of sensitive pictures in e-commerce business. Unlike traditional approaches that manually design color, shape and texture features, it adopts a convolutional neural network, reducing the labor cost of hand-crafted features. The sample images in the training sample set are clustered, a recognition model is trained for each class through the convolutional neural network, and the trained models then identify the corresponding classes of sensitive pictures in the picture library to be inspected, determining whether pictures uploaded by merchants belong to those classes. This realizes automatic detection and scanning of merchant-uploaded pictures, raises the automation level of advertisement-picture recognition and detection, and reduces manual review cost.
Specifically, clustering the sample images in the training sample set according to the sensitive type corresponding to each sample includes:
extracting the sensitive features of each sample image in the training sample set through a preset neural network model, and clustering sample images whose sensitive features are sufficiently similar under the test rule into the same sample subset through a preset clustering algorithm.
The preset neural network model has been trained on ImageNet (a database used for image-recognition training in computer vision). For example, a neural network model trained on ImageNet extracts the sensitive features of the candidate images, and a preset clustering algorithm then merges images with similar sensitive features into subsets.
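A minimal sketch of the clustering step, assuming the sensitive features have already been extracted by the ImageNet-pretrained network (random vectors stand in for real features below). The plain k-means routine and its farthest-point seeding are illustrative choices, since the patent does not name a specific clustering algorithm:

```python
import numpy as np

def cluster_features(features: np.ndarray, k: int, iters: int = 50):
    """Group extracted feature vectors into k sample subsets with a
    plain k-means pass (greedy farthest-point seeding for determinism)."""
    centers = [features[0]]
    for _ in range(k - 1):  # seed the next center far from existing ones
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign every sample to its nearest center, then re-estimate centers.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated random blobs stand in for CNN features of two types.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (20, 8)), rng.normal(5.0, 0.1, (20, 8))])
labels, _ = cluster_features(feats, k=2)
```

Each label then identifies one sample subset, which the later filtering and correction steps operate on.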
A specific way of filtering, screening and sorting each sample subset follows, taking one sample subset as an example. Within a sample subset:
the sample images are sorted from nearest to farthest by their distance from the cluster center, and the top-ranked images up to a designated count are selected as positive samples. The obtained positive samples are used to train a model classifier. The trained model classifier then scores the sample images in the subset, and those whose scores fall below a preset threshold are removed. For example, in each candidate set in the subset, the 100 samples closest to the cluster center serve as positive samples to train a model classifier (a one-class SVM can be adopted); the classifier then scores each image in the subset, and out-of-class images with low scores are eliminated according to the classification result.
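The ranking-and-filtering step might be sketched as follows. As a dependency-free stand-in for the one-class SVM classifier mentioned above, the score here is simply the negated distance to the mean of the selected positives; the function name and parameters are illustrative:

```python
import numpy as np

def filter_subset(features, center, n_positive=100, keep_ratio=0.9):
    """Rank a subset's images by distance to the cluster center, treat the
    n_positive nearest as positive samples, score every image against a
    model built from those positives, and keep only the high scorers.
    Assumption: the 'trained classifier' is approximated by distance to
    the positives' mean, standing in for the one-class SVM in the text."""
    near_to_far = np.argsort(np.linalg.norm(features - center, axis=1))
    positives = features[near_to_far[:n_positive]]
    mu = positives.mean(axis=0)
    scores = -np.linalg.norm(features - mu, axis=1)   # higher = more in-class
    threshold = np.quantile(scores, 1.0 - keep_ratio) # preset score threshold
    return scores >= threshold                        # mask of images to keep
```

Run on a subset containing a few out-of-class images, the low-scoring outliers fall below the threshold and are dropped, mirroring the elimination step above.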
Optionally, this embodiment further provides a specific way of correcting the sample subset, including:
A deep residual network with a designated number of layers, at least 50, is trained with an extracted pre-training data set, and the trained network then corrects the sample subsets. For example, the 1000-class classification data set used in the open ImageNet competition pre-trains a 50-layer deep residual network, and the parameters of the pre-trained model are fine-tuned with the filtered, screened and sorted sample subsets; this avoids the overfitting caused by scarce sensitive-image training data as well as the cumbersome feature-extraction steps of traditional recognition algorithms.
For example, fig. 4 shows the design of a specific unit structure of the deep residual network. Let a hidden layer of the deep network compute H(x), and define the residual mapping F(x) = H(x) - x. If a stack of nonlinear layers can approximate a complex function, it can equally approximate this residual, so the hidden layer can be written as H(x) = F(x) + x. This yields a residual structural unit whose output is obtained by adding, element-wise, the output of a cascade of convolution layers to the unit's input (the convolution layers must therefore preserve the input dimensions) and then activating with ReLU. Cascading such units yields the deep residual network.
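The residual unit H(x) = ReLU(F(x) + x) can be demonstrated with a toy forward pass. Dense layers replace the convolutions for brevity, and the zero weights are chosen only so the expected output is obvious; this is a sketch of the structure, not the patent's actual network:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, w1, w2):
    """Forward pass of one residual unit: the residual branch F is a small
    stack of layers (dense here instead of convolutions), and the output is
    ReLU(F(x) + x): the element-wise sum with the identity shortcut,
    followed by activation."""
    f = relu(x @ w1) @ w2
    assert f.shape == x.shape  # shortcut addition requires matching dims
    return relu(f + x)

# With zero weights the residual branch contributes nothing, so the unit
# passes a non-negative input straight through: H(x) = ReLU(0 + x) = x.
x = np.ones((1, 4))
y = residual_unit(x, np.zeros((4, 4)), np.zeros((4, 4)))
```

The in-function shape assertion makes the dimension constraint of the identity shortcut explicit.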
Further, this embodiment provides a way of further optimizing the monitored recognition result, specifically including:
after the corresponding classes of sensitive pictures have been identified in the picture library to be inspected and a recognition result obtained, hard examples are extracted from the result, and the parameters of the corresponding recognition models are updated according to them. The hard examples thus optimize the parameters of the convolutional neural network and strengthen the recognition capability of the algorithm model.
Extracting the hard samples from the recognition result comprises:
and acquiring the score value of each attribute in the sensitive picture, wherein the score value of each attribute in the sensitive picture is calculated through the identification model.
Sorting the acquired attributes of the sensitive picture in descending order of score value. Here, an attribute of a sensitive picture can be understood as information associated with its image data, for example its name, source website address, date, resolution, size, and classification label; such related information is usually attached to the image data as attribute information.
Summing the score values of the top-ranked specified number of attributes, and judging the picture to be a hard sample when the sum is greater than a preset confidence threshold. For example: a sensitive picture has 10 attributes such as name, source website address, date, resolution, size, and classification label, each scored by the recognition model. The 3 highest-scoring attributes are the source website address (score 0.4), the name (score 0.3), and the classification label (score 0.1). With a confidence threshold of 0.7, the sum 0.4 + 0.3 + 0.1 = 0.8 is greater than 0.7, so the sensitive picture is judged to be a hard sample.
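The hard-sample test in this worked example can be sketched in a few lines of plain Python (the helper name and the exact attribute names are ours; the scores and threshold follow the example):

```python
# Sketch of hard-sample extraction: sum the top-k attribute scores
# produced by the recognition model and compare against a confidence
# threshold. Values follow the worked example in the text.
def is_hard_sample(attribute_scores, k=3, confidence_threshold=0.7):
    """attribute_scores: mapping of attribute name -> model score value."""
    top_k = sorted(attribute_scores.values(), reverse=True)[:k]
    return sum(top_k) > confidence_threshold

scores = {
    "source website address": 0.4,
    "name": 0.3,
    "classification label": 0.1,
    "date": 0.05,
    "resolution": 0.02,
}
hard = is_hard_sample(scores)  # 0.4 + 0.3 + 0.1 = 0.8 > 0.7 -> hard sample
```

Pictures flagged this way are then fed back into the training data to update the recognition models.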
The recognition capability of the algorithm can thus be enhanced in the manner of a Progressive CNN: the detected hard samples are added back into the convolutional neural network training data, which strengthens the recognition model's ability to distinguish hard-to-classify illegal pictures and amplifies the effect of hard samples in the training set.
Through this embodiment, automated detection and scanning of obscene or otherwise illegal pictures appearing in the main images, detail images, and order-review images that merchants upload to the e-commerce service platform is realized. In particular, the level of intelligence in advertisement-picture management is raised, manual audit costs are reduced, and ultimately the platform's management risk is reduced as well.
Unlike the traditional simple dichotomy that divides images only into sensitive and non-sensitive, the method provided by the invention divides illegal images into multiple classes (sensitive types) according to the scenes that may occur on an e-commerce platform, which makes the recognition of special classes of images more targeted while also improving recognition accuracy. For example: in actual tests, about 2,000,000 images were newly uploaded every day to the merchant management platform of an e-commerce service platform; verifying these manually is enormously costly, requiring about 100 man-hours. After this embodiment was adopted, the number of pictures requiring further manual verification fell to within 500 per day, reducing labor cost by a factor of about 4000, while the reduced manual participation also reduced the risk of operator error.
An embodiment of the present invention further provides a device for analyzing a sensitive image as shown in fig. 5, where the device may specifically operate on a monitoring server as shown in fig. 1, and the device includes:
the clustering module is used for clustering the sample images in the training sample set according to the sensitive types corresponding to the samples in the extracted training sample set;
the training module is used for training various corresponding recognition models through a convolutional neural network according to the clustered sample images;
and the analysis module is used for identifying various corresponding sensitive pictures from the picture library to be detected by utilizing the identification model obtained by training.
The clustering module is specifically configured to extract the sensitive features of each sample image in the training sample set through a preset neural network model, and to cluster sample images whose sensitive-feature similarity satisfies the test rule into the same sample subset through a preset clustering algorithm; the preset neural network model is trained on ImageNet.
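The clustering step can be sketched as follows (a simple k-means stands in for the unspecified "preset clustering algorithm", and random vectors stand in for the sensitive features extracted by the ImageNet-pretrained network; all names and sizes are illustrative):

```python
# Sketch of the clustering module: feature vectors are grouped into
# sample subsets by a small hand-rolled k-means.
import numpy as np

def kmeans(features, k, iters=20):
    # deterministic initialization: evenly spaced samples as centers
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each sample to its nearest cluster center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned samples
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated blobs standing in for two sensitive types
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 0.1, (10, 8)),
                   rng.normal(5, 0.1, (10, 8))])
labels, centers = kmeans(feats, k=2)
```

Each resulting label value corresponds to one sample subset, which the later modules filter and correct.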
Further, as shown in fig. 6, the device also includes a filtering module configured to, within each sample subset: sort the sample images in the subset from near to far according to their distance from the cluster center, and select the top-ranked specified number of sample images as positive samples; train a model classifier with the obtained positive samples; and then perform classification calculation on the sample images in the subset through the trained model classifier, removing the sample images whose calculated values fall below a preset threshold;
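The filtering module's logic can be sketched as below (NumPy; the distance-decay scoring rule is an illustrative stand-in for the unspecified model classifier, and all names and thresholds are assumptions):

```python
# Sketch of the filtering module for one sample subset: sort samples by
# distance to the cluster center, take the nearest ones as positives,
# fit a toy one-class "classifier" on them, then drop subset samples
# whose classifier score falls below a threshold.
import numpy as np

def filter_subset(subset, center, n_positive=5, score_threshold=0.3):
    dists = np.linalg.norm(subset - center, axis=1)
    order = np.argsort(dists)               # near -> far
    positives = subset[order[:n_positive]]  # top-ranked as positive samples

    # "Train" the classifier: radius covering the positive samples.
    radius = np.linalg.norm(positives - center, axis=1).max() + 1e-9

    # Classify every subset sample: score decays with distance.
    scores = np.exp(-dists / radius)
    kept = subset[scores >= score_threshold]
    return kept, scores

rng = np.random.default_rng(2)
center = np.zeros(4)
subset = np.vstack([rng.normal(0, 0.2, (8, 4)),   # close to the center
                    rng.normal(3, 0.2, (2, 4))])  # likely mislabeled outliers
kept, scores = filter_subset(subset, center)
```

Samples far from the cluster center score near zero and are removed, which is the purification effect the filtering module aims for.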
The device further includes a correction module configured to train a deep residual network with a specified number of layers, the specified number being greater than or equal to 50, with the extracted pre-training data set, and to correct the sample subsets through the trained deep residual network.
Further, as shown in fig. 7, an updating module is configured to, after the various types of sensitive pictures corresponding to the picture library to be detected have been identified and a recognition result obtained, acquire the score value of each attribute of the sensitive picture, the score values being calculated by the recognition model, and sort the acquired attributes in descending order of score value;
and acquiring the added value of the score values of the attributes of the first designated digit, and judging the attribute as the difficult sample when the added value is greater than a preset confidence threshold value. (ii) a And updating parameters of the corresponding various recognition models according to the difficult sample.
With the device for analyzing sensitive images provided by this embodiment of the invention, the sample images in the training sample set are clustered, the corresponding types of recognition models are trained through a convolutional neural network on the clustered sample images, and the trained recognition models are then used to identify the corresponding types of sensitive pictures in the picture library to be detected, that is, to identify whether pictures uploaded by merchants of the e-commerce service platform belong to the corresponding sensitive types. Automated detection and scanning of the pictures merchants upload to the e-commerce service platform is thus realized, the automation level of advertisement-picture recognition and detection is raised, and manual audit costs are reduced.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method of analyzing a sensitive image, comprising:
in the extracted training sample set, clustering sample images in the training sample set according to the sensitive type corresponding to each sample;
training various corresponding recognition models through a convolutional neural network according to the clustered sample images;
recognizing various corresponding sensitive pictures from a picture library to be detected by using a recognition model obtained by training;
the clustering of the sample images in the training sample set according to the sensitive types corresponding to the samples comprises: extracting the sensitive features of each sample image from the training sample set through a preset neural network model, wherein the preset neural network model is trained through imagenet; clustering sample images with the similarity of the sensitive characteristics meeting the test rule to the same sample subset through a preset clustering algorithm;
further comprising: training a depth residual error network with a specified number of layers by using the extracted pre-training data set, wherein the specified number of layers is more than or equal to 50; correcting the sample subset through a depth residual error network obtained through training;
the method comprises the steps that a certain hidden layer in a depth network is H (x) -x → F (x), the hidden layer is represented as H (x) -F (x) + x, the output of a residual error unit is added between the output and input elements of a plurality of convolution layer cascades, and then the residual error unit is activated by ReLU, and the obtained structure is cascaded to obtain the depth residual error network.
2. The method of claim 1, further comprising:
in a subset of samples:
according to the distance between the sample images and the clustering center, sequencing the sample images in the subset from near to far, and selecting the sample image sequenced at the front designated digit as a positive sample;
training a model classifier by using the obtained positive sample;
and carrying out classification calculation on the sample images in the sample subset through the trained model classifier, and removing the sample images with the calculated values lower than a preset threshold.
3. The method of claim 1, further comprising:
identifying various sensitive pictures corresponding to the picture library to be detected, and extracting difficult samples from the identification result after obtaining the identification result;
and updating parameters of the corresponding various types of recognition models according to the difficult samples.
4. The method of claim 3, wherein the extracting of the hard case sample from the recognition result comprises:
obtaining score values of all attributes in the sensitive picture, wherein the score values of all attributes in the sensitive picture are obtained through calculation of the identification model; sorting the attributes of the acquired sensitive pictures according to the sequence of the score values from large to small;
and acquiring the added value of the score values of the attributes of the first designated digit, and judging the attribute as the difficult sample when the added value is greater than a preset confidence threshold value.
5. The method of claim 1, further comprising:
acquiring candidate images from a provider service platform according to a preset service rule, and updating the picture library by using the acquired candidate images;
and/or extracting the training sample set from a sample library pointed by the test rule according to a preset test rule.
6. An apparatus for analyzing a sensitive image, comprising:
the clustering module is used for clustering the sample images in the training sample set according to the sensitive types corresponding to the samples in the extracted training sample set;
the training module is used for training various corresponding recognition models through a convolutional neural network according to the clustered sample images;
the analysis module is used for identifying various corresponding sensitive pictures from the picture library to be detected by utilizing the identification model obtained by training;
the clustering module is specifically used for extracting the sensitive features of each sample image from the training sample set through a preset neural network model; clustering sample images with the similarity degree of the sensitive characteristics meeting the test rule to the same sample subset through a preset clustering algorithm; wherein the preset neural network model is trained through imagenet;
further comprising: a filtering module to, in a subset of samples: sorting the sample images in the subset from near to far according to the distance from the clustering center, and selecting the sample image which is sorted at the front designated digit as a positive sample; training a model classifier by using the obtained positive sample; then, carrying out classification calculation on the sample images in the sample subset through the trained model classifier, and removing the sample images with the calculated values lower than a preset threshold; further comprising: the correction module is used for training a depth residual error network with a specified number of layers by using the extracted pre-training data set, wherein the specified number of layers is more than or equal to 50; correcting the sample subset through a depth residual error network obtained by training;
the method comprises the steps that a certain hidden layer in a depth network is H (x) -x → F (x), the hidden layer is represented as H (x) -F (x) + x, the output of a residual error unit is added between the output and input elements of a plurality of convolution layer cascades, and then the residual error unit is activated by ReLU, and the obtained structure is cascaded to obtain the depth residual error network.
7. The apparatus of claim 6, further comprising:
the updating module is used for identifying various sensitive pictures corresponding to the picture library to be detected and acquiring score values of various attributes in the sensitive pictures after identification results are obtained, wherein the score values of various attributes in the sensitive pictures are obtained through calculation of the identification model; sequencing the attributes of the acquired sensitive pictures according to the sequence of the score values from big to small; acquiring the added value of the score values of the attributes of the previously ordered specified digits, and judging the attribute as a difficult sample when the added value is greater than a preset confidence threshold; and updating parameters of the corresponding various identification models according to the difficult sample.
CN201710248908.8A 2017-04-17 2017-04-17 Method and device for analyzing sensitive image Active CN108734184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710248908.8A CN108734184B (en) 2017-04-17 2017-04-17 Method and device for analyzing sensitive image


Publications (2)

Publication Number Publication Date
CN108734184A CN108734184A (en) 2018-11-02
CN108734184B true CN108734184B (en) 2022-06-07

Family

ID=63923944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710248908.8A Active CN108734184B (en) 2017-04-17 2017-04-17 Method and device for analyzing sensitive image

Country Status (1)

Country Link
CN (1) CN108734184B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144399B (en) * 2018-11-06 2024-03-05 富士通株式会社 Apparatus and method for processing image
CN109919170B (en) * 2018-11-29 2023-12-05 创新先进技术有限公司 Change evaluation method, change evaluation device, electronic device and computer-readable storage medium
CN109831699B (en) * 2018-12-28 2021-07-20 广州华多网络科技有限公司 Image auditing processing method and device, electronic equipment and storage medium
CN109829069B (en) * 2018-12-28 2021-03-12 广州华多网络科技有限公司 Image auditing processing method and device, electronic equipment and storage medium
CN110110982A (en) * 2019-04-26 2019-08-09 特赞(上海)信息科技有限公司 The checking method and device of intention material
CN110222846B (en) * 2019-05-13 2021-07-20 中国科学院计算技术研究所 Information security method and information security system for internet terminal
CN110210356A (en) * 2019-05-24 2019-09-06 厦门美柚信息科技有限公司 A kind of picture discrimination method, apparatus and system
CN110456955B (en) * 2019-08-01 2022-03-29 腾讯科技(深圳)有限公司 Exposed clothing detection method, device, system, equipment and storage medium
CN111311316B (en) * 2020-02-03 2023-05-23 支付宝(杭州)信息技术有限公司 Method and device for depicting merchant portrait, electronic equipment, verification method and system
CN111626778A (en) * 2020-05-25 2020-09-04 陶乐仪 Advertisement pushing system and method
CN111726648A (en) * 2020-06-28 2020-09-29 百度在线网络技术(北京)有限公司 Method, device and equipment for detecting image data and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060095202A1 (en) * 2004-11-01 2006-05-04 Hitachi, Ltd. Method of delivering difference map data
CN103679132A (en) * 2013-07-15 2014-03-26 北京工业大学 A sensitive image identification method and a system
CN106228185A (en) * 2016-07-20 2016-12-14 武汉盈力科技有限公司 A kind of general image classifying and identifying system based on neutral net and method


Also Published As

Publication number Publication date
CN108734184A (en) 2018-11-02

Similar Documents

Publication Publication Date Title
CN108734184B (en) Method and device for analyzing sensitive image
CN107239891B (en) Bidding auditing method based on big data
CN105426356A (en) Target information identification method and apparatus
CN116188475B (en) Intelligent control method, system and medium for automatic optical detection of appearance defects
CN113761259A (en) Image processing method and device and computer equipment
Drew et al. Automatic identification of replicated criminal websites using combined clustering
CN112258254B (en) Internet advertisement risk monitoring method and system based on big data architecture
Trappey et al. An intelligent content-based image retrieval methodology using transfer learning for digital IP protection
CN110457992A (en) Pedestrian based on Bayes's optimisation technique recognition methods, device and system again
CN110968664A (en) Document retrieval method, device, equipment and medium
CN111402014A (en) Capsule network-based E-commerce defective product prediction method
CN113220875A (en) Internet information classification method and system based on industry label and electronic equipment
CN111245815B (en) Data processing method and device, storage medium and electronic equipment
CN111200607B (en) Online user behavior analysis method based on multilayer LSTM
CN112131477A (en) Library book recommendation system and method based on user portrait
CN112445985A (en) Similar population acquisition method based on browsing behavior optimization
CN113837836A (en) Model recommendation method, device, equipment and storage medium
CN111353803B (en) Advertiser classification method and device and computing equipment
CN113642329A (en) Method and device for establishing term recognition model and method and device for recognizing terms
CN111125351A (en) Business condition briefing generation method and device, electronic equipment and storage medium
CN108564422A (en) A kind of system based on matrimony vine data analysis
Liu Fruit Traceability and Quality Inspection System Based on Blockchain and Computer Vision
Meizenty et al. Rice Quality Detection Based on Digital Image Using Classification Method
CN112445992A (en) Information processing method and device
Santoso et al. Identification of Hoax News in the Using Community TF-RF and C5. 0 Tree Decision Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000, 1-5 story, Jinshan building, 8 Shanxi Road, Nanjing, Jiangsu.

Applicant after: SUNING.COM Co.,Ltd.

Address before: 210042 Suning Headquarters, No. 1 Suning Avenue, Xuanwu District, Nanjing City, Jiangsu Province

Applicant before: SUNING COMMERCE GROUP Co.,Ltd.

GR01 Patent grant