CN103679132A

CN103679132A - A sensitive image identification method and a system

Info

Publication number: CN103679132A
Application number: CN201310301729.8A
Authority: CN
Inventors: 刘毅; 肖创柏; 段娟; 卞春晓
Original assignee: Beijing University of Technology
Current assignee: Nanjing Multimodal Intelligent Technology Co., Ltd.
Priority date: 2013-07-15
Filing date: 2013-07-15
Publication date: 2014-03-26
Anticipated expiration: 2033-07-15
Also published as: CN103679132B

Abstract

The invention discloses a sensitive image identification method and system, belonging to the technical field of image identification. The method comprises the following steps: step 1, grid-based feature extraction fused with skin-color detection, from which the original bag-of-words vector of each image is obtained through a bag-of-words model; step 2, image feature optimization, in which a random forest is used to obtain a dimension-reduced, optimized vector representation; step 3, identification model training, in which a one-class support vector machine is used to train a one-class classifier in the optimized vector space; and step 4, image identification: if the preprocessing of step 1 finds no skin-color pixels in an image, the image is directly judged to be normal; otherwise its optimized feature representation is computed and passed to the trained one-class classification model to obtain the identification result. The invention applies a one-class classification algorithm to the sensitive image identification problem, fuses several techniques in the processing pipeline, and optimizes the features, thereby improving the accuracy and efficiency of sensitive image identification.

Description

A sensitive image identification method and system

Technical field

The present invention relates to image identification methods, and in particular to a sensitive image identification method and system based on a one-class classification algorithm that fuses skin-color detection with a bag-of-words image representation.

Background art

Traditional sensitive image detection methods make wide use of skin-color detection and pattern recognition. The first popular class of methods designs a skin-color detection algorithm, computes the proportion of the image area occupied by skin-color regions, and decides by a threshold whether the image is sensitive. Because of background color, lighting conditions and image quality, such methods are very prone to false positives and misses. Most sensitive image detection algorithms therefore combine skin-color detection with pattern recognition, using skin-color detection as an auxiliary step. The general steps of these methods are: 1) extract image features; 2) train a classification model; 3) identify images. Until now, nearly all methods based on pattern recognition and machine learning have treated sensitive image identification as a two-class classification problem: images are divided into sensitive and normal images, and a classification algorithm learns a two-class model from training sets of both. In such a model the set of sensitive image types is fairly complete and covers most kinds of sensitive image, but in the real Internet environment the types of normal image are vast in number, and no training set of normal images can cover most of them. Treating sensitive image identification as a two-class problem therefore leads to imbalanced data, which limits the generalization ability of the model.

In sensitive image identification based on pattern recognition and machine learning, feature extraction and the vector representation of the image are among the key steps, and effective features can markedly improve the overall performance of the trained model. Many image feature extraction methods exist; widely used ones include color features, texture features, contour features and local features. Among local feature extraction methods, the SIFT algorithm is the most widely applied because of its outstanding performance, but its drawback is that computing keypoints is time-consuming. In machine vision, an effective vector representation of images is the bag-of-words model (Bag of Words Model, BoW); research has shown that it represents images effectively, and it has been widely and successfully used in object recognition and image classification.

Sensitive image identification methods based on the bag-of-words model already exist, but they all rely on two-class classification algorithms and give little consideration to skin-color information; their accuracy and time complexity are unsatisfactory, and their generalization ability needs improvement.

Summary of the invention

In view of this, the invention provides a fused sensitive image identification method and system intended to detect sensitive images more effectively. To achieve this aim, the sensitive image identification method of the invention is characterized by the following steps:

Step (1). The following images are input to the computer:

a first sensitive image subset PornSet1 containing $m_1$ sensitive images, where a sensitive image means an image capable of arousing a viewer's interest (the same applies below);

a second sensitive image subset PornSet2 containing $m_2$ sensitive images;

a normal image subset NormalSet containing $m_3$ normal images;

where $m_1$, $m_2$ and $m_3$ are finite positive integers.

Step (2). Each sensitive image in the first sensitive image subset PornSet1 is divided into an M × N grid, where M and N are finite positive integers and each cell is 16 × 16 pixels. Skin-color detection is carried out in each cell as follows:

Step (2.1). A pixel in a cell is regarded as a skin pixel if its values r, g, b in RGB color space, each in the range [0, 255], satisfy both of the following conditions:

$$r > 90 \text{ and } g > 38 \text{ and } b > 18 \text{ and } |r - g| > 12,$$

$$\max\{r, g, b\} - \min\{r, g, b\} > 12 \text{ and } r > g \text{ and } r > b.$$

Step (2.2). If the proportion of skin pixels among all pixels of a cell is greater than or equal to the threshold $s_g = 0.3$, the cell is determined to be a suspected sensitive sub-region.
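
The skin rule of step (2.1) and the cell threshold of step (2.2) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `is_skin` and `suspected_cells`, the list-of-rows image layout, and the decision to ignore partial cells at the image border are all assumptions made for the example.

```python
def is_skin(r, g, b):
    """RGB skin rule from step (2.1); channel values are in [0, 255]."""
    return (r > 90 and g > 38 and b > 18 and abs(r - g) > 12
            and max(r, g, b) - min(r, g, b) > 12 and r > g and r > b)


def suspected_cells(image, cell=16, s_g=0.3):
    """Return (row, col) indices of grid cells whose skin ratio is >= s_g.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Partial cells at the right/bottom border are ignored (an assumption).
    """
    rows, cols = len(image) // cell, len(image[0]) // cell
    hits = []
    for i in range(rows):
        for j in range(cols):
            # Count the skin pixels inside cell (i, j).
            skin = sum(
                is_skin(*image[y][x])
                for y in range(i * cell, (i + 1) * cell)
                for x in range(j * cell, (j + 1) * cell)
            )
            if skin / (cell * cell) >= s_g:
                hits.append((i, j))
    return hits
```

An image for which `suspected_cells` returns an empty list is the case that step (10.1) later classifies as normal without further processing.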

Step (3). For each cell of each image in PornSet1 that has been judged a suspected sensitive sub-region, the scale-invariant feature transform (SIFT) descriptor is applied to the 16 × 16 pixel cell around its center point to generate a 128-dimensional feature vector. After this feature extraction, all suspected sensitive sub-regions of one image yield a set of R feature vectors $F = \{F_1, F_2, \ldots, F_r, \ldots, F_R\}$. The set of feature vectors from all suspected sensitive sub-regions of all images in PornSet1 is denoted PornFeatureSet1.

Step (4). The original BoW feature vector of each image in the first sensitive image subset PornSet1 is computed as follows.

Step (4.1). The K-means clustering algorithm is run on the feature vector set PornFeatureSet1 as follows, yielding a one-class visual dictionary that reflects the features common to PornSet1:

Step (4.1.1). Define:

N, the total number of feature vectors in PornFeatureSet1;

C, the number of clusters, a preset quantity with C = 200;

c, the index of a cluster, c = 1, 2, ..., C;

$S_{v_c}$, the $v_c$-th feature vector in the cluster $S_c$ with index c, where $S_c$ contains $V_c$ feature vectors and $v_c = 1, 2, \ldots, V_c$;

$\mu_c$, the center of the cluster $S_c$ with index c, i.e. the mean of all vectors in $S_c$.

The clustering criterion function is:

$$J(c) = \sum_{c=1}^{C} \sum_{v_c=1}^{V_c} \| S_{v_c} - \mu_c \|^2,$$

Step (4.1.2). The cluster assignments are changed and $\mu_c$ is recomputed repeatedly until the convergence measure (given by a formula not reproduced in this text) is less than or equal to the preset convergence threshold T, 0 < T < 1, here $T = 1 \times 10^{-4}$. On completion, C cluster centers are obtained in total, and together they form the one-class visual dictionary, written as the feature vector set $D = \{D_1, D_2, \ldots, D_c, \ldots, D_C\}$ containing the C cluster-center vectors.
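
The clustering of step (4.1) can be sketched with a plain Lloyd-style K-means in standard-library Python. This is an illustrative sketch only. The stopping rule used here (the change in the criterion J between two passes is at most T) is an assumption, because the convergence formula is not available in the text; the random initialization, the `seed` and `max_iter` parameters and the function name `kmeans` are likewise hypothetical.

```python
import random


def kmeans(vectors, C, T=1e-4, seed=0, max_iter=100):
    """Lloyd's K-means; returns the C cluster centres (the visual dictionary)."""
    rng = random.Random(seed)
    # Assumed initialisation: C distinct input vectors chosen at random.
    centres = [list(v) for v in rng.sample(vectors, C)]
    prev_j = None
    for _ in range(max_iter):
        clusters = [[] for _ in range(C)]
        j = 0.0  # criterion J: sum of squared distances to the nearest centre
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, mu)) for mu in centres]
            c = dists.index(min(dists))
            clusters[c].append(v)
            j += dists[c]
        # Recompute each centre as the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed stopping rule: J changed by at most T since the last pass.
        if prev_j is not None and abs(prev_j - j) <= T:
            break
        prev_j = j
    return centres
```

With the values stated in the text, the call would be `kmeans(PornFeatureSet1, C=200, T=1e-4)` on 128-dimensional SIFT vectors; a production system would more likely use an optimized library implementation.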

Step (4.2). The feature vector finally produced for an image by the bag-of-words model is denoted B. It records how often each vector of D occurs in F and is called the original BoW feature vector, i.e. a distribution histogram, written $B = \{B_1, B_2, \ldots, B_c, \ldots, B_C\}$. The value of each component $B_c$ of B is determined as follows.

Step (4.3). Compute the Euclidean distance $d(F_r, D_c)$ between each feature vector $F_r$ in F and each feature vector in the one-class visual dictionary D. Writing x for $F_r$ and y for $D_c$,

$$d(x, y) = \left[ \sum_{i=1}^{P} (x_i - y_i)^2 \right]^{1/2},$$

where P is the dimension of the feature vectors, i.e. 128.

Before the computation, every component of B is initialized to 0. If the vector in D nearest to $F_r$ in Euclidean distance is $D_c$, the component $B_c$ of B is increased to $B'_c = B_c + 1$, so that $B'_c$ counts how often the visual word $D_c$ of dictionary D occurs in F.

This process is called minimum-distance mapping. Once all R features in F have been processed in a loop, every component of the C-dimensional original BoW vector is known and the vector B of the image is complete. Every image in PornSet1 is processed according to step (4.2), giving the set of original BoW feature vectors of all its images, PornBowFeatureSet1.
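
The minimum-distance mapping of steps (4.2) and (4.3) amounts to a nearest-neighbour histogram. The sketch below illustrates it with tiny two-dimensional vectors; the names `euclidean` and `bow_histogram` are chosen for the example and do not come from the source.

```python
import math


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bow_histogram(features, dictionary):
    """Map each local feature to its nearest visual word and count the hits."""
    B = [0] * len(dictionary)  # every component starts at 0
    for f in features:
        nearest = min(range(len(dictionary)),
                      key=lambda c: euclidean(f, dictionary[c]))
        B[nearest] += 1  # B'_c = B_c + 1
    return B
```

By construction the components of the returned vector sum to R, the number of local features of the image.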

Step (5). Local features are extracted from the normal image subset NormalSet by the method of steps (2) to (3), giving its feature vector set, denoted NormalFeatureSet. Following step (4.2), and using the method described in step 4.4, the feature set of each image in NormalSet is then mapped by minimum distance onto the one-class visual dictionary D obtained in step (4.1.2), giving the set of BoW feature vectors of all images of the normal image subset, NormalBowFeatureSet.

Step (6). A random forest algorithm is used to run classification tests on PornBowFeatureSet1 and NormalBowFeatureSet obtained in step (5), in order to determine how much each component $B_c$ of the original BoW feature vector B, corresponding to the visual word $D_c$ of the one-class visual dictionary D, contributes to the classification result. The steps are as follows:

Step (6.1). The training sets of the random forest are generated as follows:

Step (6.1.1). Samples amounting to 50% of PornBowFeatureSet1 and of NormalBowFeatureSet are each drawn at random with replacement and together form a training set TrainSet. A feature vector in TrainSet is denoted $B_t$, the original BoW feature vector of an image in TrainSet. A classification tree is generated from TrainSet; an arbitrary vector is chosen as the root node of each classification tree, and each tree is a binary classification tree. A leaf node of the tree holds one component of $B_t$, called the attribute value of the leaf node.

Step (6.1.2). The random draw above is repeated ntree times, where ntree is a finite positive integer set to 500. In each draw, the samples that are not drawn are called out-of-bag data, denoted $\bar{B}^{(\tau)}$. Each draw generates one classification tree, indexed by $\tau = 1, 2, \ldots, ntree$; all the classification trees together form the random forest.
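
One draw of step (6.1), together with its out-of-bag data, can be sketched as follows. The helper `bootstrap` is hypothetical and works on sample indices of a single set; in the scheme described above it would be applied once to PornBowFeatureSet1 and once to NormalBowFeatureSet for each of the ntree trees.

```python
import random


def bootstrap(n_samples, fraction=0.5, rng=None):
    """Draw fraction * n_samples indices with replacement.

    Returns (drawn, oob): the drawn indices (duplicates possible) and the
    sorted indices that were never drawn, i.e. the out-of-bag samples.
    """
    rng = rng or random.Random()
    size = int(n_samples * fraction)
    drawn = [rng.randrange(n_samples) for _ in range(size)]
    oob = sorted(set(range(n_samples)) - set(drawn))
    return drawn, oob
```

Because only half as many indices are drawn as there are samples, at least half of each set ends up out-of-bag for any given tree, which leaves ample data for the importance estimate of step (6.3).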

Step (6.2). Variable importance is assessed as follows:

Step (6.2.1). After each random generation of TrainSet as described in step (6.1.2), the leaf nodes of the classification tree are split as follows:

For the C-dimensional original BoW feature vector $B_t$, specify a finite positive integer mtry < C, here set to 64. At each leaf node, mtry of the C attribute values are drawn at random as candidate features, and at each split of a leaf node the Gini impurity index is computed as:

$$G(f) = 1 - \sum_{l=1}^{2} f_l^2,$$

where $f_l$ is the probability that the vectors $B_t$ in TrainSet are assigned to class l over the ntree draws. The classification result of a tree takes a value in {1, 2}, where 1 and 2 denote sensitive image and normal image respectively.

During tree splitting, the attribute with the minimum Gini impurity index G(f) is selected as the attribute value of the leaf node after the split.

The final classification result of every classification tree in the random forest is obtained in this way.

Step (6.2.2). The classification result of the random forest is determined by majority vote:

Each classification tree in the random forest produces one classification result, with value 1 or 2; the value that occurs most often among the ntree results is the classification result of the random forest.
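
The two quantities used in step (6.2), the Gini impurity index and the majority vote, are small enough to show directly. The sketch below uses the class labels 1 (sensitive) and 2 (normal) from the text; the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity G = 1 - sum_l f_l^2 over the class frequencies f_l."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def majority_vote(tree_outputs):
    """Forest decision: the label returned by the largest number of trees."""
    return Counter(tree_outputs).most_common(1)[0][0]
```

A node containing only one class has impurity 0, and a node split evenly between the two classes has the two-class maximum of 0.5, which is why the split with the smallest impurity is preferred.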

Step (6.3). The importance of each component of the original BoW feature vector B is computed as follows:

Step (6.3.1). Define:

$y_u$, the actual class label of sample u in the out-of-bag data $\bar{B}^{(\tau)}$, i.e. the true class of the image corresponding to u;

$\hat{y}_u^{(\tau)}$, the class predicted for this sample by classification tree τ.

Step (6.3.2). For the out-of-bag data $\bar{B}^{(\tau)}$ of the τ-th random draw, the attribute value corresponding to $B_t$ is replaced as described below, and the replaced sample B' is classified by the classification tree, giving a new class label $\hat{y}_{u,\pi_t}^{(\tau)}$.

The attribute replacement is as follows:

A vector $X = (X_1, X_2, \ldots, X_c, \ldots, X_C)$ is drawn at random from the out-of-bag data $\bar{B}^{(\tau)}$, and the value of the attribute of $B_t$ in the same dimension is replaced by $X_c$, giving the replaced sample B'.

Step (6.3.3). The importance $VI^{(\tau)}(B_t)$ in decision tree τ of the variable $B_t$ of the C-dimensional original BoW feature vector of sample B' is computed as:

$$VI^{(\tau)}(B_t) = \frac{\sum_{u \in \bar{B}^{(\tau)}} I\left(y_u = \hat{y}_u^{(\tau)}\right) - \sum_{u \in \bar{B}^{(\tau)}} I\left(y_u = \hat{y}_{u,\pi_t}^{(\tau)}\right)}{\left|\bar{B}^{(\tau)}\right|}$$

In this formula, $I\left(y_u = \hat{y}_u^{(\tau)}\right)$ indicates the cases in which the prediction of the classification tree equals the true label before replacement, and $I\left(y_u = \hat{y}_{u,\pi_t}^{(\tau)}\right)$ the cases in which they are equal after replacement.

Step (6.3.4). The importance of variable $B_t$ is defined as the mean of $VI^{(\tau)}(B_t)$ over all classification trees in the random forest:

$$VI(B_t) = \frac{\sum_{\tau=1}^{ntree} VI^{(\tau)}(B_t)}{ntree}$$

The value $VI(B_t)$ given by this formula is the importance of the t-th dimension variable of the original BoW vector B of the image.
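
The importance measure of step (6.3) compares out-of-bag accuracy before and after the values of one dimension are scrambled. The sketch below is a hedged illustration: it replaces values by shuffling the chosen column among the out-of-bag samples, which is one common way to realize the replacement described above and is an assumption here. The tree is represented by a hypothetical `predict` callable, and samples are assumed to be Python lists.

```python
import random


def tree_importance(predict, oob_samples, oob_labels, dim, rng):
    """VI for one tree and one dimension: drop in OOB accuracy after permutation."""
    before = sum(predict(x) == y for x, y in zip(oob_samples, oob_labels))
    # Assumed replacement scheme: shuffle the values of dimension `dim`
    # among the out-of-bag samples, leaving all other dimensions intact.
    column = [x[dim] for x in oob_samples]
    rng.shuffle(column)
    permuted = [x[:dim] + [v] + x[dim + 1:] for x, v in zip(oob_samples, column)]
    after = sum(predict(x) == y for x, y in zip(permuted, oob_labels))
    return (before - after) / len(oob_samples)


def forest_importance(trees, dim, rng):
    """Mean per-tree importance over (predict, oob_samples, oob_labels) triples."""
    return sum(tree_importance(p, xs, ys, dim, rng) for p, xs, ys in trees) / len(trees)
```

A dimension the tree never consults always scores 0, because scrambling it cannot change any prediction; dimensions the tree relies on score higher the more accuracy is lost.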

Step (7). Using the method of step (6), the random forest assesses the variable importance of the original BoW feature vector B, giving the importance values of all C components. The importance values are sorted and the K most important variables are recorded, i.e. the K visual words of the one-class visual dictionary that contribute most to classification, where K is a finite positive integer, K < C, here K = 80. The index set of the important variables is written $P = \{p_1, p_2, \ldots, p_k, \ldots, p_K\}$.

Step (8). The optimized image vector representation is obtained as follows:

After the original BoW feature vector of an image in the second sensitive image subset has been obtained by step (4), only the variables whose subscripts appear in the index set P are retained, in their original order. This yields a new K-dimensional optimized image representation vector O.
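
Steps (7) and (8) reduce to selecting the K highest-ranked indices and projecting each BoW vector onto them. A minimal sketch follows; the function names are illustrative, and indices are 0-based here whereas the text counts from 1.

```python
def top_k_indices(importances, K):
    """Indices of the K largest importance values, in ascending index order."""
    ranked = sorted(range(len(importances)),
                    key=lambda c: importances[c], reverse=True)
    return sorted(ranked[:K])  # ascending order keeps the original ordering


def optimise(B, P):
    """Keep only the components of B whose index is in P, preserving order."""
    return [B[p] for p in P]
```

With the values in the text, `top_k_indices(VI, 80)` would be computed once from the 200 importance values, and `optimise` would then be applied to every training and test vector.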

Step (9). Sensitive image identification is treated as a one-class classification scenario: the second sensitive image subset PornSet2 is used to train a one-class classifier.

Step (9.1). A one-class support vector machine is chosen as the one-class classifier, with the following decision function:

$$f(O) = \operatorname{sgn}\left( \sum_{i=1}^{m_2} \alpha_i \, \Theta(O_i, O_j) - \rho \right),$$

where $\alpha_i$ are the Lagrange coefficients and ρ determines the distance of the decision surface from the origin. To obtain the decision function, the input data are the optimized image representation vectors of the second sensitive image subset PornSet2; the parameters $\alpha_i$ and ρ are computed during training by the open-source toolkit libsvm and no longer change once the classification model is fixed. Θ is the kernel function, and $\Theta(O_i, O_j)$ is an inner-product operation. The kernel between two vectors $O_i$ and $O_j$ is computed as:

$$\Theta(O_i, O_j) = e^{-\gamma \| O_i - O_j \|}, \quad \gamma \geq 0,$$

where γ = 8.

Once the parameters of the decision function f(O) have been computed, a complete, usable decision function is obtained. It consists of a set of vectors and parameters and is used directly in subsequent computation.
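
Evaluating a trained decision function of this kind needs only the stored vectors, the coefficients $\alpha_i$ and the offset ρ. The sketch below shows that evaluation with γ = 8. It uses the squared-distance form of the RBF kernel, $e^{-\gamma \|a - b\|^2}$, which is how libsvm defines its RBF kernel; this is an assumption relative to the kernel formula as printed in the text. The coefficient values in any call are hypothetical; real values would come from libsvm training.

```python
import math


def rbf(a, b, gamma=8.0):
    """RBF kernel exp(-gamma * ||a - b||^2), the libsvm form (assumed here)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decide(O, support_vectors, alphas, rho, gamma=8.0):
    """Return +1 (sensitive) if sum_i alpha_i * K(O_i, O) - rho >= 0, else -1."""
    score = sum(a * rbf(sv, O, gamma)
                for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if score >= 0 else -1
```

A vector close to the stored training vectors yields kernel values near 1 and a positive score, while a distant vector yields kernel values near 0, so the score falls to about −ρ and the sample is rejected as normal.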

Step (10). The class of an image I to be identified is obtained as follows:

Step (10.1). Feature detection is performed on image I by the method of step (2). If step (2) finds that the image contains no suspected skin-color region at all, the image is directly judged to be a normal image and skips the subsequent processing steps. Otherwise, the following steps are carried out:

Step (10.2). After the local features of image I have been extracted, giving its feature set FeatureSet_I, every vector in FeatureSet_I is mapped by minimum distance onto the one-class dictionary obtained in step (4.1.2), giving the original BoW vector B of the image.

Step (10.3). The original BoW vector B is then optimized by the method of step (8), giving the optimized image representation vector O.

Step (10.4). f(O) is used to determine the class of the test data x of image I, where x is the optimized feature vector of the image to be identified. Substituting x into f(O) and computing its value, the image corresponding to x is judged a sensitive image if f(O) ≥ 0, and a normal image if f(O) < 0.
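
The control flow of step (10), including the early exit for images without any suspected skin region, can be summarized as a short function. All five callables and the parameter names are hypothetical placeholders for the components described in the earlier steps; only the ordering and the two decision rules come from the text.

```python
def identify(image, find_suspected, extract_features, to_bow, P, decide):
    """Return 'sensitive' or 'normal' following steps (10.1) to (10.4)."""
    cells = find_suspected(image)
    if not cells:
        # Step (10.1): no suspected skin region, so skip all later steps.
        return "normal"
    B = to_bow(extract_features(image, cells))  # step (10.2)
    O = [B[p] for p in P]                       # step (10.3)
    # Step (10.4): a non-negative decision value means sensitive.
    return "sensitive" if decide(O) >= 0 else "normal"
```

The early return is where the time saving comes from: feature extraction, mapping and classification are never run for images with no skin-colored cells.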

The sensitive image identification system of the invention is characterized by comprising:

an image skin-color detection unit, for determining the suspected sensitive regions of an image;

an image feature extraction unit, for extracting the local features of the suspected sensitive regions;

an image representation vector acquisition unit, for obtaining the final optimized vector used in subsequent training and identification;

a data preparation unit, for preparing the key known data needed in the subsequent classification model training stage and identification stage;

a classification model training unit, for training the one-class identification model, the training samples being sensitive images only;

an image identification unit, which carries out the complete image identification flow by calling the functions of the above units, including feature detection, together with the known data and the decision function obtained in training.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only one embodiment of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of image classification and identification based on the visual bag-of-words model and pattern recognition methods;

Fig. 2 is a schematic diagram of the fusion of skin-color detection and image feature extraction in the invention;

Fig. 3 is a schematic diagram of obtaining the one-class visual dictionary in the invention;

Fig. 4 is a schematic diagram of mapping the feature set of an image onto the one-class visual dictionary to obtain the original visual bag-of-words vector of the image in the embodiment of the invention;

Fig. 5 is a schematic diagram of computing variable importance in the embodiment of the invention;

Fig. 6 is a schematic diagram of the feature optimization of the original image representation vector in the embodiment of the invention;

Fig. 7 is a complete diagram of the training stage and the data preparation stage of the embodiment of the invention;

Fig. 8 is a schematic diagram of the image identification stage (i.e. testing) in the embodiment of the invention;

Fig. 9 shows the ROC curves of two sensitive image identification methods in the embodiment of the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

In image identification based on machine learning and pattern recognition, an image must be represented in vector form by some algorithm for use in subsequent training and identification. Given the success of the bag-of-words model in image classification and identification, the invention uses this model to describe images as vectors. The bag-of-words model was first applied in text classification, to represent a document as a vector; the machine vision community later introduced it into image identification and classification with good results. The complete flow of an image identification method based on this model is shown in Fig. 1, where solid arrows indicate the learning of the classification model, dotted arrows the generation of the visual dictionary, and dashed arrows the image identification process.

In the visual bag-of-words model, a clustering algorithm generates the visual dictionary of the image training set. To represent an image as a vector, the distance from every local feature of the image to every visual word of the dictionary is computed, and each feature is assigned to the component of its nearest visual word. This produces a histogram of the local feature distribution, i.e. the original bag-of-words vector of the image, hereinafter the image BoW vector.

Skin-color information is an important feature of sensitive images. If it is ignored entirely and features are extracted over the whole image, a large amount of background noise is introduced, and because the extraction region covers the entire image, the feature extraction stage also takes longer. For this reason the invention fuses skin-color detection with feature extraction and extracts features only in suspected sensitive regions, which reduces noise and speeds up processing.

For the sake of run time, the skin-color detection algorithm must be very fast, excluding the great majority of irrelevant background regions at minimal time cost. The embodiment therefore adopts the rule-based skin-color definition in RGB color space proposed by Peer et al., with its thresholds re-tuned, as follows:

$$\begin{aligned} & r > 90 \text{ and } g > 38 \text{ and } b > 18 \text{ and } |r - g| > 12 \\ & \max\{r, g, b\} - \min\{r, g, b\} > 12 \text{ and } r > g \text{ and } r > b \end{aligned} \tag{1}$$

In skin-color definition (1), r, g and b are the component values of a pixel in the three channels of RGB color space, each in the range [0, 255].

In the feature extraction step the image is divided into a uniform grid with cells of 16 × 16 pixels, as shown in Fig. 2. All cells are traversed and formula (1) is applied in each to detect skin color. If the proportion of skin pixels in a cell satisfies formula (2):

$$\frac{SkinPixels}{AllPixels} \geq 0.3 \tag{2}$$

the cell is judged a suspected sensitive region. Here SkinPixels is the total number of pixels in the cell judged to be skin pixels by formula (1), and AllPixels is the number of all pixels in the cell.

Feature extraction is carried out only in suspected sensitive regions. The loose skin-color definition of formula (1) ensures that more than 98% of the excluded regions contain no skin pixels, although some non-skin regions are also judged suspected sensitive regions. In Fig. 2, the diagonally hatched cells in the right-hand image are true skin regions (suspected sensitive regions), and the black cells are wrongly detected "skin" regions (actually normal blocks).

In this way the sensitive regions can be roughly located at minimal time cost, and the local features unrelated to them are reduced, which speeds up subsequent processing.

Among popular local feature extraction methods, the SIFT algorithm has become the mainstream choice because of its outstanding performance and is very widely applied. Related research shows that in image scene classification tasks, sampling features on a regular grid gives good results.

On this basis the invention extracts the SIFT features of an image by regular-grid sampling, i.e. dense SIFT. The main reasons for this choice are: 1) the method fixes the positions of the feature points directly, which makes computation more efficient because keypoint detection is avoided; 2) the invention regards the complex and varied class of sensitive images as, in effect, one specific kind of scene image, and regular-grid feature sampling describes the whole image effectively and performs better for scene classification. In dense SIFT the image is divided into a regular grid, the center of each cell is taken as a feature point, and the SIFT descriptor is used to extract the cell's feature and generate its feature vector; computation is fast and the local features of the image are described fully and effectively. In the invention the keypoint localization step of SIFT is skipped: the cell center serves as the keypoint, and the SIFT descriptor generates the 128-dimensional feature description vector of the block.
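
The fixed keypoint positions used by dense SIFT follow directly from the grid geometry. The sketch below only generates the cell centers; computing the 128-dimensional descriptor at each center is left to a SIFT implementation and is not shown. The function name `grid_keypoints` and the choice to drop partial cells at the image border are assumptions for illustration.

```python
def grid_keypoints(width, height, cell=16):
    """Centre (x, y) of every full cell in a regular grid over the image."""
    return [
        (j * cell + cell // 2, i * cell + cell // 2)
        for i in range(height // cell)   # grid rows, top to bottom
        for j in range(width // cell)    # grid columns, left to right
    ]
```

In the method described here, only the centers of cells that passed the skin-ratio test would be handed to the descriptor, so the list would be filtered by the suspected sensitive regions first.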

The feature extraction algorithm above is run on the first sensitive image subset PornSet1 to obtain the feature set PornFeatureSet1, which contains the feature vectors extracted from all images in PornSet1.

The calculation steps of the K-means clustering algorithm are:

Here n is the number of training samples to be clustered, which in this embodiment is the number of feature vectors in PornFeatureSet1, and C is the number of clusters. Preferably C is set to 200, so the one-class codebook produced by clustering contains 200 cluster centers, called codebook visual words.

In the clustering computation above, the clustering criterion function of the K-means algorithm is:

$$J(c) = \sum_{c=1}^{C} \sum_{v_c=1}^{V_c} \| S_{v_c} - \mu_c \|^2 \tag{3}$$

where c is the index of a cluster, c = 1, 2, ..., C; $S_{v_c}$ is the $v_c$-th feature vector in the cluster $S_c$ with index c; $S_c$ contains $V_c$ feature vectors, $v_c = 1, 2, \ldots, V_c$; and $\mu_c$ is the center of cluster $S_c$, i.e. the mean of all vectors in $S_c$.

Each change of cluster assignment in the K-means algorithm produces an assignment that has not occurred before; repeated in this way, the algorithm would eventually traverse all possible assignments and so must converge. In practice, however, a convergence threshold is usually set to speed up clustering. In this embodiment the threshold is $T = 1 \times 10^{-4}$.

Once the one-class dictionary has been obtained, the feature set of an image is mapped onto it by minimum distance to give a 200-dimensional image BoW vector, as shown in Fig. 4. For example, if a vector in the image's feature set is nearest to the first vector of the codebook (dotted arrow), the first dimension of the final BoW vector is incremented by 1 (from 0 to 1). When all features have been mapped, a 200-dimensional histogram of the visual word distribution is obtained, i.e. the image BoW vector.

The minimum distance is computed with the Euclidean distance:

$$d(x, y) = \left[ \sum_{i=1}^{p} (x_i - y_i)^2 \right]^{1/2} \tag{4}$$

where $x_i$ and $y_i$ are the components of the vectors x and y whose distance is to be computed, and p is the dimension of the image SIFT feature vectors, i.e. 128.

After the prime word bag vector representation that obtains image, enter the importance appraisal procedure of image representation component of a vector, the reason of carrying out this operation is: if adopt one-class classification, a category dictionary is only obtained by the local feature cluster of sensitive image, the vision word S that the vision word in the category dictionary finally obtaining is formed by the local feature of sensitive part _pand the noise vision word S being formed by the local feature that is similar to area of skin color _nform.We expect that the feature in sensitive image is mapped to S as much as possible _pin, and feature in normal picture is mapped to S as much as possible _nin.In inventive embodiments, sensitive image and the normal picture distribution situation on a category dictionary is shown in Fig. 5, and transverse axis represents the index of the vision word in code book, and the longitudinal axis represents the probability that some vision words occur in the word bag vector representation of such image.Can see, when the local feature of sensitive image and normal picture is mapped on a class code book, all comparatively extensive in distribution range, cause the word bag vector representation of sensitive image and normal picture to lack enough discriminations, affect the accuracy of identification of following model training.

The importance of the vector components is assessed with the random forest algorithm, an ensemble classification algorithm proposed by Breiman that is widely used as a classification method because of its excellent performance.

Different training sets are formed by bagging: about 50% of the samples are randomly drawn with replacement from PornBowFeatureSet1 and from NormalBowFeatureSet respectively to form a training set TrainSet, from which one classification tree is grown; a vector in TrainSet is denoted X, X = (X_1, ..., X_j, ..., X_C); the draw is repeated ntree times, producing ntree classification trees in total; in each draw, the samples that are not drawn are called out-of-bag data, denoted \bar{B}^{(t)}, t = 1, 2, ..., ntree;

Randomly selected features are used to split the internal nodes of each classification tree: for the C-dimensional original bag-of-words feature vector, a positive integer mtry < C is specified; at each internal node, mtry features are randomly drawn from the C feature components as candidate features, and the best split among these mtry features is chosen to split the node; the value of mtry stays constant while the whole forest is grown; the ntree classification trees are grown fully without pruning; the classification result of the random forest is obtained by majority voting, i.e. the majority of the outputs of all classification trees in the forest is taken as the final result;

Let:

y_i be the true class label of sample i in the out-of-bag data,

\hat{y}_i^{(t)} be the class of this sample predicted by decision tree t.

The importance, in decision tree t, of the j-th variable X_j of the C-dimensional feature vector of a sample is calculated as follows:

VI^{(t)}(X_j) = \frac{\sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_i^{(t)}\right) - \sum_{i \in \bar{B}^{(t)}} I\left(y_i = \hat{y}_{i,\pi_j}^{(t)}\right)}{\left|\bar{B}^{(t)}\right|}

In the formula above, I(·) is the indicator function, and \hat{y}_{i,\pi_j}^{(t)} is the class label predicted again for sample i after the values of the attribute corresponding to X_j have been randomly permuted within the out-of-bag data.

The importance of variable X_j is then defined as the mean of VI^{(t)}(X_j) over all decision trees in the random forest:

VI(X_j) = \frac{\sum_{t=1}^{ntree} VI^{(t)}(X_j)}{ntree}

where VI(X_j) is the importance of the j-th variable in the original bag-of-words vector representation of the image.

In the embodiment of the invention, the number of decision trees ntree is set to 500, and the other parameter, the number of splitting variables (mtry), is set to 64.
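The permutation-based importance defined above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: each tree is represented by a hypothetical triple `(predict, oob_X, oob_y)`, where `predict` is any callable that returns a class label, and the toy "tree" is a single threshold on feature 0.

```python
import random

def tree_importance(predict, oob_X, oob_y, j, rng):
    """VI^(t)(X_j): correct OOB predictions before minus after permuting column j, over |OOB|."""
    correct = sum(1 for x, y in zip(oob_X, oob_y) if predict(x) == y)
    column = [x[j] for x in oob_X]
    rng.shuffle(column)  # random permutation of attribute j within the out-of-bag data
    permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(oob_X, column)]
    correct_perm = sum(1 for x, y in zip(permuted, oob_y) if predict(x) == y)
    return (correct - correct_perm) / len(oob_X)

def forest_importance(trees, j, rng):
    """VI(X_j): mean of the per-tree importances over all trees in the forest."""
    return sum(tree_importance(p, X, y, j, rng) for p, X, y in trees) / len(trees)

# Toy forest (hypothetical): two identical stumps that look only at feature 0.
stump = lambda x: 1 if x[0] > 0.5 else 0
oob_X = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
oob_y = [0, 1, 0, 1]
trees = [(stump, oob_X, oob_y), (stump, oob_X, oob_y)]

vi_unused = forest_importance(trees, 1, random.Random(0))  # feature 1 is never used
vi_used = forest_importance(trees, 0, random.Random(0))    # feature 0 drives the prediction
print(vi_unused, vi_used)
```

Permuting a feature that the trees never consult cannot change any prediction, so its importance is exactly zero, while a feature the trees rely on can only lose accuracy when permuted.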

To calculate the importance of each component of the image bag-of-words representation vector, features are extracted on the sensitive image training set and on part of the normal image set, and the original bag-of-words representation vectors of all these images are obtained. The random forest described above is then used to calculate the importance of the visual words in the one-class dictionary; the 80 most important variables are taken and the indices of their corresponding visual words are recorded. The set of indices is denoted P = {p_1, p_2, ..., p_i, ..., p_80}.

Once the index of important variables has been obtained, this known data can be used in subsequent model training and image identification to optimize the original image bag-of-words representation, as follows:

Suppose that, after feature extraction and mapping onto the one-class dictionary, the original bag-of-words representation vector of an image is S, with components S_j, j = 1, ..., 200.

Only those components of S whose subscript j appears in the index set P are retained, and their order in S is preserved; this gives a new 80-dimensional optimized image vector representation. A schematic diagram of this process is shown in Fig. 6.
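The selection of important components can be sketched in a few lines. The helper names `top_indices` and `optimize_vector` are assumptions for illustration, indices are 0-based here, and the toy vector has 5 dimensions with 3 retained instead of 200 with 80.

```python
def top_indices(importances, k):
    """Indices of the k most important variables, returned in ascending index order."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])

def optimize_vector(bow, index_set):
    """Keep only the components whose index is in index_set, preserving their order in bow."""
    keep = set(index_set)
    return [v for j, v in enumerate(bow) if j in keep]

# Toy data (hypothetical importance scores and bag-of-words vector).
importances = [0.02, 0.30, 0.00, 0.15, 0.07]
P = top_indices(importances, 3)
bow = [4, 1, 9, 0, 2]
optimized = optimize_vector(bow, P)
print(P, optimized)
```

Because the index set is computed once during training, applying it at identification time is a single pass over the vector, which is why this step adds almost no processing time.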

To train the one-class classification model, only sensitive images are used as the training set. In the embodiment of the invention, the sensitive image training set used for classification model training contains 1067 images and covers multiple types of sensitive images. The training algorithm is the one-class support vector machine proposed by Schölkopf, which is widely used in many technical fields and performs well.

The one-class support vector machine treats the origin as the only member of the negative class and all of the original data as members of the positive class; a kernel function maps the original data to a high-dimensional space, where a hyperplane separating the data from the origin is sought. In essence, a minimal hypersphere that contains the samples is obtained by solving a quadratic programming problem, and new data are classified according to it. The quadratic programming problem is:

\min_{\omega, \xi, \rho} \; \frac{1}{2} \omega^{T} \omega + \frac{1}{\upsilon m_2} \sum_{i=1}^{m_2} \xi_i - \rho,

s.t. \; \omega^{T} \varphi(x_i) \geq \rho - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \ldots, m_2.

In the formulas above, the symbols have the following meanings:

x_i is the optimized image representation vector of a sensitive image in the second sensitive image subset PornSet2;

ω is the normal vector of the classification hyperplane;

m_2 is the number of training samples, i.e. the number of images in the second sensitive image subset PornSet2;

ρ determines the distance of the classification surface from the origin, i.e. the intercept;

ξ_i is a slack variable used to penalize points that deviate from the hyperplane;

υ ∈ [0,1] is a predefined parameter, set to 0.25;

φ is the mapping from the original vector space to the high-dimensional vector space;

To solve the quadratic programming problem, Lagrange multipliers are introduced; after simplification, the following dual problem is obtained:

\min_{\alpha} \; \frac{1}{2} \sum_{i,j=1}^{m_2} \alpha_i \alpha_j \Theta(x_i, x_j),

s.t. \; 0 \leq \alpha_i \leq \frac{1}{\upsilon m_2}, \quad \sum_{i=1}^{m_2} \alpha_i = 1.

In the formula above, α_i are the Lagrange multipliers and Θ is the kernel function; Θ(x_i, x_j) is the inner product of φ(x_i) and φ(x_j). The kernel function of two vectors Z_α and Z_β is calculated as follows:

\Theta(Z_\alpha, Z_\beta) = e^{-\gamma \| Z_\alpha - Z_\beta \|}, \quad \gamma \geq 0,

In the formula above, the value of γ is 8;

After the dual problem is solved, the α_i and the support vectors of the classification hyperplane are obtained, which gives the following decision function:

f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m_2} \alpha_i \Theta(x_i, x) - \rho \right),

where ρ can be calculated from any one support vector x_i by the following formula:

\rho = (\omega \cdot \varphi(x_i)) = \sum_{j=1}^{m_2} \alpha_j \Theta(x_j, x_i),

With the decision function, f(x) can be used to determine the class of the data x to be tested, where x is the optimized feature vector of the image to be identified: x is substituted into f(x); if f(x) ≥ 0, the image to be identified corresponding to x is judged to be a sensitive image; if f(x) < 0, it is judged to be a normal image;
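The decision rule can be sketched as follows, assuming the α_i and support vectors are already available from a solver (no solver is shown). The kernel takes the norm exactly as printed in the text; the common Gaussian RBF kernel squares the norm, so this choice is an assumption. All names and the toy values are hypothetical.

```python
import math

def kernel(za, zb, gamma=8.0):
    """Kernel as printed in the text: exp(-gamma * ||za - zb||), with gamma = 8."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(za, zb)))
    return math.exp(-gamma * dist)

def rho_from_support_vector(sv_index, support_vectors, alphas, gamma=8.0):
    """rho = sum_j alpha_j * Theta(x_j, x_i) for the chosen support vector x_i."""
    xi = support_vectors[sv_index]
    return sum(a * kernel(xj, xi, gamma) for a, xj in zip(alphas, support_vectors))

def decide(x, support_vectors, alphas, rho, gamma=8.0):
    """Return +1 (sensitive) if sum_i alpha_i * Theta(x_i, x) - rho >= 0, else -1 (normal)."""
    score = sum(a * kernel(xi, x, gamma) for a, xi in zip(alphas, support_vectors)) - rho
    return 1 if score >= 0 else -1

# Toy model (hypothetical): two support vectors whose multipliers sum to 1.
support_vectors = [[0.0, 0.0], [0.1, 0.0]]
alphas = [0.5, 0.5]
rho = rho_from_support_vector(0, support_vectors, alphas)

label_on_boundary = decide([0.0, 0.0], support_vectors, alphas, rho)  # score is 0
label_far_away = decide([5.0, 5.0], support_vectors, alphas, rho)     # kernel values near 0
print(label_on_boundary, label_far_away)
```

A point far from every support vector has kernel values close to zero, so its score is close to -ρ and it is judged normal; a support vector used to compute ρ sits exactly on the boundary and is judged sensitive.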

The steps above make up the one-class model training and data preparation stage; the complete flow is shown in Fig. 7. To obtain the classification result of an image to be identified, the following processing steps are carried out:

The preprocessing and training stage yields the one-class dictionary, the index of important variables and the one-class classification model; in the identification stage these data and the model can be used directly.

In an embodiment of the invention, to identify an image, the image is first divided into a grid and passed to the skin color detection step; if the image contains no skin pixels at all, the system directly judges it to be a normal image.

If the image contains skin-color grid cells (suspected sensitive regions), the features of the image are extracted to obtain its feature set, which is then mapped onto the one-class codebook by minimum distance to obtain the original image bag-of-words representation vector.

Using the index of important variables obtained in preprocessing, feature optimization is applied to the original image bag-of-words representation vector, giving the dimension-reduced optimized feature vector.

The optimized feature vector is input to the trained one-class classification model, which computes whether the image is a sensitive image.

The flow chart of the sensitive image identification method of the embodiment of the invention is shown in Fig. 8, where the data in the dashed boxes are obtained in the training stage and can be used directly during identification.
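The identification flow can be outlined with a short sketch. Every stage is passed in as a callable, and all of these callables (`has_skin`, `extract_features`, `to_bow`, `classify`) as well as the dictionary-based toy images are hypothetical stand-ins for the components described above, not the embodiment's code.

```python
def identify(image, has_skin, extract_features, to_bow, index_set, classify):
    """Skin pre-check, feature extraction, BoW mapping, feature optimization, one-class decision."""
    if not has_skin(image):
        return "normal"  # no skin pixels at all: judged normal without further processing
    bow = to_bow(extract_features(image))
    keep = set(index_set)
    optimized = [v for j, v in enumerate(bow) if j in keep]
    return "sensitive" if classify(optimized) >= 0 else "normal"

# Stub components (hypothetical) so that the flow can be exercised end to end.
has_skin = lambda img: img["skin"]
extract_features = lambda img: img["features"]
to_bow = lambda feats: [len(feats), 0, 1]
classify = lambda vec: sum(vec) - 2

result_no_skin = identify({"skin": False, "features": []},
                          has_skin, extract_features, to_bow, [0, 2], classify)
result_skin = identify({"skin": True, "features": [1, 2, 3]},
                       has_skin, extract_features, to_bow, [0, 2], classify)
print(result_no_skin, result_skin)
```

The early return mirrors the shortcut described in the text: images without skin pixels never reach feature extraction or classification.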

In the image test experiment based on the method above, the embodiment of the invention was tested on 937 sensitive images and 1995 normal images. It should be noted that none of the test images took part in any training process (including clustering of the one-class dictionary, variable importance assessment and training of the one-class classification model), so the test results effectively reflect the overall performance of the algorithm in a real network environment.

To compare the inventive method with other methods, two comparison methods are set up in the embodiment: a method based on skin color detection, hereinafter Method I; and a method based on pattern recognition whose classification model is trained with a support vector machine and which treats sensitive image identification as a traditional two-class classification problem, hereinafter Method II.

Method I is a simple sensitive image identification method based on skin color detection: it applies the skin color definition rule proposed by Kovac et al. to perform skin color detection over the whole image. According to the relevant literature, an image is judged to be sensitive when skin-colored regions account for 30% or more of the whole image. This method is clearly prone to misjudgment, so it serves as the baseline algorithm among the comparison methods.
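The ratio test of Method I can be sketched as below. The text does not spell out the rule of Kovac et al., so the pixel predicate here uses the thresholds listed in the claims of this document as a stand-in, and joining the two conditions with AND is an assumption. The function names and toy pixels are hypothetical.

```python
def is_skin(r, g, b):
    """Stand-in pixel rule: thresholds taken from the claims, combined with AND (an assumption)."""
    return (r > 90 and g > 38 and b > 18 and abs(r - g) > 12
            and max(r, g, b) - min(r, g, b) > 12 and r > g and r > b)

def skin_ratio(pixels):
    """Fraction of pixels classified as skin."""
    return sum(1 for p in pixels if is_skin(*p)) / len(pixels)

def method_one(pixels, threshold=0.30):
    """Judge the image sensitive when skin pixels make up at least 30% of the image."""
    return "sensitive" if skin_ratio(pixels) >= threshold else "normal"

# Toy images (hypothetical) given as flat lists of (r, g, b) tuples.
mostly_skin = [(200, 120, 90)] * 4 + [(10, 10, 10)] * 6
no_skin = [(10, 10, 10)] * 10
print(method_one(mostly_skin), method_one(no_skin))
```

A portrait or a beach photograph would pass the same ratio test, which illustrates why the text treats this method only as a baseline.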

Method II is a method based on two-class classification, so its training set comprises a sensitive image training set and a normal image training set. For feature extraction it uses traditional local features and does not incorporate the skin-color-fused feature extraction proposed by the inventive method. Like the inventive method, it represents image features with the visual bag-of-words model.

In addition, the inventive method builds on the bag-of-words model, trains the classification model with a one-class classification algorithm, and optimizes the image features in an intermediate step. To show that this optimization effectively improves recognition accuracy within this algorithm framework, a further comparison method is set up: the feature optimization step is removed and all other steps are identical to the inventive method; this is hereinafter Method III.

For a fair evaluation of the different algorithms, parameter optimization is likewise applied to the model training steps of Method II and Method III, so that each pattern-recognition-based method works with its optimal parameters.

The evaluation indices are recognition accuracy and average processing time; in addition, ROC curves are used to assess the traditional two-class method (Method II) and the method of the invention.

The recognition accuracy (%) of the four methods on the same test set is given in Table 1:

Table 1

The accuracy (%) of the four methods on different types of sensitive images is given in Table 2:

Table 2

In Table 2, the columns of the first row denote the different types of sensitive images: PN denotes sensitive images in which the person is partially exposed; TN denotes sensitive images in which the person is fully nude; CB denotes sensitive images with chest features; CS denotes images with features of sensitive organ regions; TP denotes images with sexual behavior features; LQ denotes sensitive images of low image quality (e.g. color cast, low resolution).

As Tables 1 and 2 show, the inventive method achieves high accuracy in distinguishing both normal images and sensitive images, and its performance is more balanced. It should be pointed out that the embodiment of the invention uses only sensitive images as the training set and still obtains better results than the traditional algorithm based on two-class classification, which shows that treating sensitive image identification as a one-class problem, as the inventive method proposes, is feasible.

It can also be seen that, in a bag-of-words-based identification method that, like the present invention, treats sensitive image identification as a one-class classification task, the original image bag-of-words representation vector is not suitable for direct use in model training and identification and requires feature optimization. The feature optimization method of the embodiment of the invention greatly improves identification accuracy and is one of the important measures that ensure the effect of one-class classification.

Among the three pattern-recognition-based methods, identifying one image takes 2.163 seconds on average with Method II and 0.827 seconds on average with the embodiment of the invention. The reasons are as follows:

First, in the inventive method, an image that contains no skin pixels at all is directly identified as a normal image, and the loose skin color definition rule ensures that the overwhelming majority of the images excluded in this way really are normal images. Second, in the feature extraction stage fused with skin color detection, most of the background noise is removed; extracting features only in the sensitive regions both improves the validity of the features and reduces their number, which benefits subsequent processing. Third, once the index of important variables is available, the time taken by feature optimization is almost negligible, and this step further reduces the 200-dimensional image vector representation to an 80-dimensional optimized representation, which speeds up both model training and identification.

The ROC curves of the inventive method and Method II are shown in Fig. 9.

As Fig. 9 shows, the recognition performance of the embodiment of the invention is very close to that of the traditional method based on two-class classification, which shows that the one-class classification model trained by the embodiment on a training set containing only sensitive images achieves a good recognition effect. If the training set covers as many types of sensitive images as possible, the inventive method should perform even better.

On the other hand, because this one-class model is trained only on sensitive images, its generalization ability is in theory better than that of a classification model obtained by a two-class method from an imbalanced training set.

The steps and test results of the embodiments above show that the inventive method can identify sensitive images quickly and accurately.

Those of ordinary skill in the art will appreciate that all or part of the flows of the method embodiments above can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the methods above. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), etc.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A sensitive image identification method, characterized in that it is implemented in a computer by the following steps in sequence:

Step (1). The following images are input to the computer:

A normal image subset NormalSet containing m_3 normal images,

where m_1, m_2 and m_3 are finite positive integers;

r > 90 & g > 38 & b > 18 & |r - g| > 12,

{max{r, g, b} - min{r, g, b}} > 12 & r > g & r > b,

Step (3). For each sub-region judged to be a suspected sensitive sub-region in every image of the first sensitive image subset PornSet1, a 128-dimensional feature vector is generated with the scale-invariant feature transform (SIFT) feature descriptor in the 16 × 16 pixel grid around the center point. After this feature extraction operation has been applied to all suspected sensitive sub-regions of an image of the first sensitive image subset PornSet1, a set of R feature vectors F = {F_1, F_2, ..., F_r, ..., F_R} is obtained; the set of feature vectors of all suspected sensitive sub-regions of all images in the first sensitive image subset PornSet1 is hereinafter referred to as PornFeatureSet1;

Step (4.1.1). Let:

C be the number of clusters, a preset value, C = 200,

c be the index of the cluster, c = 1, 2, ..., c, ..., C,

be the v_c-th feature vector in the cluster S_c with index c, where cluster S_c contains V_c feature vectors,

The clustering criterion function is:

Step (4.1.2). The cluster assignments are changed and μ_c is recalculated, until

is equal to or less than the set convergence threshold T, 0 < T < 1, with T = 1 × 10^-4; upon completion, C cluster centers are obtained in total, and all cluster centers form the one-class visual dictionary; the set of feature vectors of the one-class visual dictionary is expressed as D = {D_1, D_2, ..., D_c, ..., D_C}, containing the feature vectors of the C cluster centers;

where P is the dimension of the feature vector, i.e. 128,

Step (6.3.1). Let:

the out-of-bag data

the true class label of sample u, i.e. the true class of the image corresponding to u,

the class of this sample predicted by classification tree τ,

Step (6.3.2). For the τ-th random draw, the out-of-bag data

The attribute permutation is carried out as follows:

For the out-of-bag data

In the formula above,

Wherein, the value of γ is 8;

2. A sensitive image identification system based on one-class classification, obtained according to the sensitive image identification method of claim 1, characterized in that it comprises: