CN102360435B - Undesirable image detecting method based on connotative theme analysis - Google Patents

Undesirable image detecting method based on connotative theme analysis

Info

Publication number
CN102360435B
Authority
CN
China
Prior art keywords
image
word
words
sigma
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110329875
Other languages
Chinese (zh)
Other versions
CN102360435A (en)
Inventor
田春娜
高新波
王华青
李东阳
袁博
赵林
李洁
蒲倩
王代富
季秀云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN 201110329875 priority Critical patent/CN102360435B/en
Publication of CN102360435A publication Critical patent/CN102360435A/en
Application granted granted Critical
Publication of CN102360435B publication Critical patent/CN102360435B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an undesirable image detection method based on implicit topic analysis, which mainly solves the problem that existing undesirable-information detection methods misjudge normal images because they ignore semantic information. The scheme is as follows: extract the skin region of an image with a dual Gaussian mixture model; generate a codebook of discriminative features in the skin region with a bag-of-words model, and represent each training image as a weighted word co-occurrence vector via a term frequency-inverse discriminative document frequency method; assemble all co-occurrence vectors into a co-occurrence matrix and perform LDA modeling on it to obtain the topics of the images; feed the mixed topics of the training images into a BP neural network to train an undesirable-image classifier; and obtain the topics of an image under test, input them into the classifier, and judge whether the image is undesirable, completing the detection. Experiments show that the invention distinguishes undesirable images from normal images well, and can be used to filter pornographic information in images.

Description

Bad image detection method based on implicit topic analysis
Technical Field
The invention belongs to the cross field of computer vision and pattern recognition, and particularly relates to a semantic classification method for bad images based on implicit topic analysis, which can be used for filtering pornographic information in images.
Background
With the explosive growth of the Internet in the 1990s, all kinds of information on the network have been growing and spreading rapidly. In particular, with the arrival of the 3G era and the introduction of the converged-network concept, images spread quickly through instant messaging, carried by multimedia messages, mobile-phone video streams and the like, and they contain a large amount of unhealthy content such as obscene pornography. The spread of such undesirable information harms people's physical and psychological health, so advanced methods for filtering it are of profound significance. The key to filtering undesirable information is detecting it correctly, and the detection task comprises two steps: extracting and describing the discriminative information of the image; and analyzing and classifying the image's implicit semantic topics. The following is an overview and analysis of the current state of research, its development, and its application to the detection of undesirable image content.
(1) Extraction and description of discriminative image information
Undesirable images usually expose large skin regions and private body parts, so the skin region must be detected first and then discriminative feature points extracted within it. Early detection work neglected the representativeness of features within skin-color regions and used skin color alone as the discriminating cue, as in the "Finding Naked People" method of Fleck et al. (Proceedings of the European Conference on Computer Vision, 1996, 2: 593-602). Describing skin tone requires a suitable color space. Research shows that in the YCbCr color space, on the CbCr chrominance components, human skin regions have good clustering characteristics. To account for the influence of factors such as ethnicity and illumination color cast on the skin-color range, researchers often use a Gaussian mixture model (GMM) to model the distribution of skin color.
(2) Image implicit semantic topic analysis and classification
At present, most detection methods for undesirable content do not consider the semantic information implied by an image, so natural images containing many skin-like regions, or normally exposed human skin, are misjudged as undesirable. Effectively distinguishing image properties by semantic features is therefore the key to improving such detection. To analyze image semantics, and inspired by text-based latent topic analysis, computer vision researchers use topic models, represented by the latent Dirichlet allocation (LDA) model, to represent image semantics. The LDA model builds on the bag-of-words model: the image is treated as a combination of visual words with no ordering among them. The bag-of-words model comprises three steps: feature detection, feature description and codebook generation. Features are typically detected with a difference-of-Gaussians operator; SIFT (Scale-Invariant Feature Transform) descriptors, being invariant to scale, rotation and affine changes, are commonly used to describe them; finally, K-means clustering of the training images' descriptors yields the image codebook. Concretely, salient feature points are first extracted with the difference-of-Gaussians operator, then described with SIFT, and each SIFT descriptor is mapped by vector quantization to a particular visual word of the codebook. An image can thus be viewed as composed of visual words and, according to how often each codebook word occurs in it, represented as a word co-occurrence vector. The co-occurrence vectors of many images form a co-occurrence matrix, on which LDA modeling is performed. The LDA model is a three-layer image-topic-visual-word Bayesian network: images are regarded as mixtures of several latent topics, and topics as mixtures of visual words. The model infers the mixing probabilities of the latent topics in each image, reducing each image from a high-dimensional word-frequency representation to a low-dimensional mixture of topics.
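To make the quantization step concrete, the following minimal sketch maps an image's SIFT descriptors to their nearest codebook words and counts the occurrences, yielding the word co-occurrence vector. The array names and toy sizes are illustrative assumptions, not values from the patent.

```python
import numpy as np

def cooccurrence_vector(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Vector-quantize SIFT descriptors to their nearest visual words and
    count how often each codebook word occurs in the image."""
    # Pairwise distances between descriptors (n, 128) and words (C, 128).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)                   # nearest word per descriptor
    return np.bincount(words, minlength=len(codebook))

rng = np.random.default_rng(0)
desc = rng.random((40, 128))    # toy stand-in for 40 SIFT descriptors of one image
book = rng.random((10, 128))    # toy codebook of C = 10 visual words
print(cooccurrence_vector(desc, book))             # word histogram of the image
```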
Detection of undesirable images based on topic models has only just begun; Sheng Tang et al. combined LDA with an SVM classifier in "PornProbe: An LDA-SVM Based Pornography Detection System", Proceedings of the 17th ACM International Conference on Multimedia, 2009. Image semantic description based on topic analysis is one of the most promising directions for solving the undesirable-information detection problem.
Existing methods for classifying undesirable content have three main problems:
(1) the extracted skin-color regions often contain interference such as hair, which hampers the subsequent extraction of discriminative features;
(2) the skin-region textures of different images are similar, so the discriminative features of undesirable images are often submerged in a large number of normal skin-texture features and cannot be exploited effectively in the classification task;
(3) features are disconnected from image semantics, and the lack of semantic description leads to a low correct detection rate.
Disclosure of Invention
Aiming at the above shortcomings of the prior art, the invention provides a bad image detection method based on implicit topic analysis that removes interference such as hair regions from the detected skin-color regions, increases the discriminability of image features, analyzes the semantic content of images, and improves the detection rate of bad images.
The technical idea of the invention is as follows: first extract the skin region of the image; describe it with a bag-of-words model under the cosine distance measure; weight the words of the image by the term frequency-inverse discriminative document frequency method (tf-iddf) and collect the word weights into a vector representing the co-occurrence of visual words in the image; analyze the image's latent topics with an LDA model; and detect bad images according to topic similarity. The implementation comprises the following steps:
(1) in the YCbCr color space, construct a dual Gaussian mixture model:
(1a) manually cut an image I containing a skin region;
(1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
(1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid K,\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components in the mixture, with $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
estimate the parameters ω, μ, Σ and K of the Gaussian mixture model with the expectation-maximization (EM) algorithm and the minimum description length (MDL) criterion, establishing the skin color model;
(1d) manually cut an image I containing a hair region, repeat steps (1b) to (1c), and establish a hair region model;
(1e) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model;
(2) remove the hair region within the skin color region using a Bayesian model;
(3) detect the salient feature points of image I within the skin color region using the difference-of-Gaussians operator, and remove the feature points concentrated on the edge of the skin region, obtaining the effective feature point set V′;
(4) describe the effective feature points of V′ with Scale-Invariant Feature Transform (SIFT) descriptors, representing each feature point as a 128-dimensional feature vector f;
(5) obtain the SIFT descriptors of the effective feature points of all normal and bad images in the training set through steps (1) to (4), perform K-means clustering under the cosine distance measure on all SIFT descriptors to obtain C cluster centers, and define each cluster center as a visual word, yielding the image codebook $W=\{w_1,w_2,\ldots,w_C\}$, where w denotes a visual word and C the number of visual words in the codebook;
(6) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
(7) using the codebook words obtained in step (5), compute the term frequency-inverse discriminative document frequency (tf-iddf) value of every word in the j-th image and arrange these values, in the order the words appear in the codebook, into a weighted co-occurrence vector $d_j$ representing the j-th image;
(8) form a co-occurrence matrix from the co-occurrence vectors of all training images and perform LDA modeling on it with an LDA model based on the Gibbs sampling algorithm, obtaining the mixed topic distribution θ of the training images;
(9) input the mixed topic distribution θ of the training images together with their class labels into a BP neural network, training a bad-image classifier based on the BP neural network;
(10) obtain the SIFT descriptors of the effective feature points of the image to be detected according to steps (1) to (4), represent the image as a co-occurrence vector of codebook words using the vector quantization and tf-iddf methods of steps (6) to (7), and input the co-occurrence vector into the LDA model to obtain the topic distribution θ′ of the image to be detected;
(11) input the topic distribution θ′ of the image to be detected into the BP-neural-network-based bad-image classifier and judge whether it is a bad image, completing the detection.
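Taken together, steps (1)-(11) amount to a topic-mixture-plus-classifier pipeline. The sketch below strings steps (8)-(11) together on synthetic word-count vectors, under loudly labeled substitutions: scikit-learn's LatentDirichletAllocation uses variational inference rather than Gibbs sampling, plain counts stand in for tf-iddf weights, and MLPClassifier is a generic backpropagation network standing in for the BP classifier.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Stand-in for steps (1)-(7): word-count vectors of 100 training images
# over a C = 200-word codebook (the patent would use tf-iddf weights).
X = rng.poisson(1.0, size=(100, 200))
y = rng.integers(0, 2, size=100)              # 0 = normal, 1 = bad (toy labels)

lda = LatentDirichletAllocation(n_components=20, random_state=0)
theta = lda.fit_transform(X)                  # mixed topic distributions, step (8)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(theta, y)        # classifier, step (9)

x_new = rng.poisson(1.0, size=(1, 200))       # image to be detected
theta_new = lda.transform(x_new)              # its topic distribution, step (10)
print(clf.predict(theta_new))                 # step (11): 1 means "bad image"
```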
Compared with the existing bad information image detection method, the method has the following advantages:
1) the invention uses a double Gaussian mixture model, namely a Bi-GMM skin color model, so that the skin color detection is more robust, and the accuracy of skin area extraction is improved.
2) The invention uses the word frequency-inverse discriminative document frequency method to describe the co-occurrence frequency of the words, thereby improving the discriminative performance of the obvious visual features of the image and further improving the classification rate of the image.
3) According to the invention, the latent Dirichlet allocation LDA topic model is used for representing the semantics of the image, so that the influence of the skin color-like region on the recognition result is reduced.
Experimental results show that, compared with existing methods, the proposed method extracts skin-color regions more accurately, produces more discriminative image features, and markedly improves the detection rate of bad images.
Drawings
The technical process of the present invention can be described in detail with reference to the following drawings.
FIG. 1 is a general flow chart of the present invention for bad information image detection;
FIG. 2 is a sub-flow diagram of the Bi-GMM skin tone model based on the double Gaussian mixture model of the present invention;
FIG. 3 is a schematic diagram of the existing latent Dirichlet allocation (LDA) model.
Detailed Description
Referring to fig. 1, the bad image detection method based on topic analysis of the present invention mainly includes the following two stages:
First, codebook training stage
Step 1, constructing a double-Gaussian mixture Bi-GMM model.
Referring to fig. 2, the specific implementation of this step is as follows:
1a) manually cutting an image I containing a skin area;
1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components, with $0<\omega_n<1$ and $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
1d) determine the values of the parameters of the Gaussian mixture density: the number of components K, the weights ω, the mean vectors μ and the covariance matrices Σ:
1d1) randomly initialize the number of Gaussian components K;
1d2) estimate the weights ω, mean vectors μ and covariance matrices Σ of the Gaussian mixture model under the initialized K using the expectation-maximization (EM) algorithm;
1d3) compute the distance between every pair of Gaussian components with the distance formula d(l, m), select the two closest components and merge them into one, reducing the number of components K by 1, where

$$d(l,m)=\frac{N\bar{\omega}_l}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_l|}+\frac{N\bar{\omega}_m}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_m|}$$

in which l and m denote the l-th and m-th Gaussian components of the model, N is the number of data samples, $\bar{\omega}_l$ and $\bar{\omega}_m$ are the weights of the l-th and m-th components, $\bar{\Sigma}_l$ and $\bar{\Sigma}_m$ are their covariance matrices, and $\Sigma_{(l,m)}$ is the covariance matrix of the merged component;
The two closest Gaussian components are merged repeatedly; for each new value of K the corresponding minimum description length MDL(K, θ) is computed, and the iteration terminates when K = 1. The K minimizing MDL(K, θ) over the whole iteration is selected as the optimal value, where

$$\mathrm{MDL}(K,\theta)=-\log G_x(x\mid K,\theta)+\frac{1}{2}L\log(NM)$$

with K the number of Gaussian components, θ = (ω, μ, Σ) the estimated parameters, M the dimensionality of the data samples, N the number of data samples, and $L=K\left(1+M+\frac{(M+1)M}{2}\right)-1$;
1d4) estimate the optimal values of the parameters ω, μ, Σ under the optimal K with the EM algorithm, establishing the skin color model.
1e) Manually cut an image I containing a hair region and repeat steps 1b) to 1d) to establish a hair region model;
1f) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model.
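A simplified sketch of the MDL-driven model selection of step 1: instead of the component-merging schedule described above, it refits EM for each candidate K and evaluates the MDL formula directly. The candidate range and the synthetic CbCr pixels are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_skin_gmm(cbcr: np.ndarray, k_max: int = 8) -> GaussianMixture:
    """Select K by the MDL criterion and return the fitted mixture.

    Simplification: the patent shrinks K by repeatedly merging the two
    closest components; here we simply refit EM for each candidate K."""
    n, m = cbcr.shape                          # N samples, M = 2 (Cb, Cr)
    best, best_mdl = None, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              random_state=0).fit(cbcr)
        n_params = k * (1 + m + m * (m + 1) / 2) - 1     # L in the MDL formula
        mdl = -gmm.score(cbcr) * n + 0.5 * n_params * np.log(n * m)
        if mdl < best_mdl:
            best, best_mdl = gmm, mdl
    return best

pixels = np.random.default_rng(0).normal([120, 150], 8, size=(5000, 2))
print(fit_skin_gmm(pixels).n_components)       # K selected by minimum MDL
```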
Step 2: remove the hair region within the skin-color region using a Bayesian model.
With the dual Gaussian mixture model in hand, for each pixel of the skin-color region of image I, compute by the Bayes formula the probability P1 of belonging to the skin color model and the probability P2 of belonging to the hair model; when P1 is larger the pixel is retained, otherwise it is erased, finally removing the hair region from the skin-color region. The Bayes formula is:

$$p(V_i\mid\theta,K)=\frac{p(\theta,K\mid V_i)\,p(V_i)}{p(\theta,K)}$$

where $V_i$, i = 1, 2, denotes the skin-color region and the hair region respectively, θ = (ω, μ, Σ) are the weights, mean vectors and covariance matrices of the Gaussian mixture model, and K is the number of Gaussian components.
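A hedged sketch of the pixel-wise Bayes decision of step 2 between the two cascaded mixtures: equal priors are assumed since the patent does not state them, and the toy chroma distributions are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def hair_free_mask(cbcr, skin_gmm, hair_gmm, prior_skin=0.5):
    """Keep a pixel when P(skin | x) > P(hair | x); by Bayes' rule this
    compares class likelihood times prior, the evidence cancelling out."""
    log_skin = skin_gmm.score_samples(cbcr) + np.log(prior_skin)
    log_hair = hair_gmm.score_samples(cbcr) + np.log(1.0 - prior_skin)
    return log_skin > log_hair                 # True = retained skin pixel

rng = np.random.default_rng(2)
skin = GaussianMixture(2, random_state=0).fit(rng.normal([120, 150], 8, (2000, 2)))
hair = GaussianMixture(2, random_state=0).fit(rng.normal([105, 130], 6, (2000, 2)))
print(hair_free_mask(rng.normal([119, 149], 8, (5, 2)), skin, hair))
```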
Step 3: detect salient feature points such as corners and blobs in the skin-color region of the image with the difference-of-Gaussians (DoG) operator.
3a) define the difference-of-Gaussians operator $D(x,y,\sigma)=L(x,y,k_i\sigma)-L(x,y,k_j\sigma)$, where $L(x,y,k\sigma)$ is the convolution of the image I(x, y) with the variable-scale Gaussian $G(x,y,\sigma)=\frac{1}{2\pi\sigma^2}\,e^{-(x^2+y^2)/(2\sigma^2)}$;
3b) remove non-discriminative feature points:
Since extrema of the DoG image defined in step 3a) that lie across edges have a large principal curvature across the edge but a small one along it, feature points satisfying this principal-curvature condition are found and removed, eliminating the influence of edges on feature extraction and yielding a set of salient visual feature points (blobs, corners, etc.) $\{F_1(x,y,\sigma),F_2(x,y,\sigma),\ldots,F_N(x,y,\sigma)\}$, where (x, y) are the coordinates of feature point F and σ is its scale;
Step 4: further remove, from the points detected in step 3, the feature points lying on the edge of the skin-color region: after step 3 a large number of salient feature points still sit in the boundary region where skin meets background, and such points are not discriminative for bad-information classification; removing them leaves the discriminative local feature points F(x, y, σ).
Step 5: describe the local feature points F(x, y, σ) retained in step 4 with Scale-Invariant Feature Transform (SIFT) features, representing each feature point as a 128-dimensional feature vector f.
Step 6: train the codebook:
6a) perform steps 1-5 on each image of a training set containing M normal and bad images, obtaining a feature matrix formed by the feature vectors of the M images;
6b) perform K-means clustering under the cosine distance measure on the feature vectors of the feature matrix to obtain C cluster centers, define each cluster center as a visual word, and form the image codebook $W=\{w_1,w_2,\ldots,w_C\}$ from the C visual words, where w denotes a visual word and C the number of visual words in the codebook;
6c) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
6d) count the frequency $n_{i,j}$ with which the i-th word $w_i$ occurs in the j-th image and the total frequency $\sum_{c=1}^{C}n_{c,j}$ of all words in the j-th image, and compute the term frequency of the i-th word in the j-th image as

$$tf_{i,j}=\frac{n_{i,j}}{\sum_{c=1}^{C}n_{c,j}};$$
6e) count the number $m_{1,i}$ of normal images containing word $w_i$ and the number $m_{2,i}$ of bad images containing $w_i$, and compute their log-ratio as the inverse discriminative document frequency:

$$iddf_i=\log\frac{m_{1,i}}{m_{2,i}};$$
6f) compute the $(tf\text{-}iddf)_{i,j}$ value of the i-th word $w_i$ in the j-th image:

$$(tf\text{-}iddf)_{i,j}=tf_{i,j}\times iddf_i;$$
6g) compute the tf-iddf values of all words in the j-th image, arrange them in codebook order into a weighted co-occurrence vector $d_j$ representing the j-th image, and form the co-occurrence matrix from the co-occurrence vectors of all training images;
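A minimal sketch of the tf-iddf weighting of steps 6d)-6g), operating on a word-count matrix. The +1 smoothing in the log-ratio is our addition to avoid division by zero (the patent formula is log(m1/m2)), and the toy data are illustrative.

```python
import numpy as np

def tf_iddf(counts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """counts: (n_images, C) raw word frequencies n_ij;
    labels: (n_images,) 0 = normal, 1 = bad.
    Returns the weighted co-occurrence vectors, one row per image."""
    tf = counts / counts.sum(axis=1, keepdims=True)        # tf_ij, step 6d)
    m1 = (counts[labels == 0] > 0).sum(axis=0)             # normal images w/ word i
    m2 = (counts[labels == 1] > 0).sum(axis=0)             # bad images w/ word i
    iddf = np.log((m1 + 1) / (m2 + 1))                     # step 6e), +1 smoothed
    return tf * iddf                                       # steps 6f)-6g)

rng = np.random.default_rng(3)
X = rng.poisson(2.0, size=(6, 8))                # 6 toy images, C = 8 words
y = np.array([0, 0, 0, 1, 1, 1])
print(tf_iddf(X, y).round(2))
```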
Step 7: establish the latent Dirichlet allocation (LDA) model:
7a) let $z_i=j$ denote assigning word $w_i$ to topic j; randomly initialize $z_1,z_2,\ldots,z_C$ to integers between 1 and T, i.e., randomly assign the words $w_1,w_2,\ldots,w_C$ to the T topics;
7b) for each topic j from 1 to T, compute the posterior probability $P(z_i=j\mid z_{-i},w_i)$ from the co-occurrence vectors, and choose the topic j maximizing it as the value of $z_i$, where

$$P(z_i=j\mid z_{-i},w_i)=\frac{\dfrac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}{\sum_{j=1}^{T}\dfrac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}$$

in which $z_{-i}$ denotes the assignments of all $z_k$ (k ≠ i); $n_{-i,j}^{(w_i)}$ is the tf-iddf value of words identical to $w_i$ assigned to topic j; $n_{-i,j}^{(\cdot)}$ is the sum of the tf-iddf values of all words assigned to topic j; β is empirically set to 0.01; C is the number of words; $n_{-i,j}^{(d_i)}$ is the sum of the tf-iddf values of the words in image $d_i$ assigned to topic j; $n_{-i,\cdot}^{(d_i)}$ is the sum of the tf-iddf values of all topic-assigned words in $d_i$; T is the number of topics; and α is set to 50/T;
7c) cycle i from 1 to C, assigning word $w_i$ its topic label $z_i$ via step 7b), completing the topic reassignment of every word $w_1,w_2,\ldots,w_C$;
7d) repeat steps 7b)-7c); when the posterior probabilities $P(z_i=j\mid z_{-i},w_i)$ change little, terminate the iteration, obtaining the optimal values of $z_1,z_2,\ldots,z_C$ and thereby fixing the word-topic assignments;
7e) once the word-topic assignments are fixed, estimate the topic parameters of each training image d by

$$\hat{\theta}_j^{(d)}=\frac{n_j^{(d)}+\alpha}{n_\cdot^{(d)}+T\alpha}$$

where $\hat{\theta}_j^{(d)}$ gives, for image d, the distribution of its words over the T topics, $n_j^{(d)}$ is the number of words in image d assigned to topic j, $n_\cdot^{(d)}$ is the number of all topic-assigned words in image d, and α is set to 50/T;
7f) obtaining the topic distribution of each training image d through step 7e), collect its mixed topic distribution $\theta^{(d)}=\{\theta_1^{(d)},\theta_2^{(d)},\ldots,\theta_T^{(d)}\}$, where $\theta_i^{(d)}$ is the probability that topic i occurs in image d and T is the number of topics.
The LDA model established by steps 7a) -7f) above is shown in fig. 3.
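For concreteness, here is a compact collapsed Gibbs sampler in the style of steps 7a)-7e). It follows the standard Griffiths-Steyvers formulation with integer word counts; the patent instead accumulates tf-iddf weights in the same count tables, so this is a simplified sketch, with α = 50/T and β = 0.01 as above.

```python
import numpy as np

def lda_gibbs(docs, C, T, iters=100, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over `docs`, a list of
    word-index lists (one list per image)."""
    rng = np.random.default_rng(seed)
    alpha = 50.0 / T                                   # hyperparameters, step 7b)
    nwt = np.zeros((C, T))                             # word-topic counts
    ndt = np.zeros((len(docs), T))                     # image-topic counts
    z = [rng.integers(0, T, len(d)) for d in docs]     # step 7a): random init
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            nwt[w, t] += 1
            ndt[d, t] += 1
    for _ in range(iters):                             # steps 7b)-7d)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                nwt[w, t] -= 1; ndt[d, t] -= 1         # exclude current token
                p = (nwt[w] + beta) / (nwt.sum(0) + C * beta) * (ndt[d] + alpha)
                t = rng.choice(T, p=p / p.sum())       # sample a new topic
                z[d][i] = t
                nwt[w, t] += 1; ndt[d, t] += 1
    return (ndt + alpha) / (ndt.sum(1, keepdims=True) + T * alpha)   # step 7e)

docs = [[0, 1, 2, 1], [2, 3, 3, 2], [0, 0, 1]]         # toy word indices, C = 4
print(lda_gibbs(docs, C=4, T=2).round(2))              # per-image topic mixtures
```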
Step 8: design the classifier based on the BP neural network: input the mixed topic distribution $\theta^{(d)}$ of each training image together with its class label into a BP neural network, obtaining the classifier D.
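Step 8 can be sketched with a standard backpropagation-trained multilayer perceptron; scikit-learn's MLPClassifier stands in for the BP network, and the hidden-layer size, topic count and toy data are our assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
T = 30
theta = rng.dirichlet(np.ones(T), size=200)     # mixed topic mixtures, step 7f)
labels = (theta[:, 0] > 1.0 / T).astype(int)    # toy class labels (0/1)

# Classifier D: a backprop-trained MLP on topic mixtures and class labels.
D = MLPClassifier(hidden_layer_sizes=(64,), max_iter=3000,
                  random_state=0).fit(theta, labels)
print(D.score(theta, labels))                   # training accuracy of D
```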
Second, image testing stage
Step A: input the test image into the dual Gaussian mixture model and detect the skin region by Bayes discrimination.
Step B: as in steps 3 and 4 of the training stage, detect the salient feature points within the skin region obtained in step A with the difference-of-Gaussians (DoG) operator, and further remove the non-discriminative feature points.
Step C: describe the feature points obtained in step B with Scale-Invariant Feature Transform (SIFT); each feature point is represented as a 128-dimensional feature vector.
Step D: quantize the image's SIFT descriptors to codebook words by vector quantization, and represent the test image as a word co-occurrence vector by the term frequency-inverse discriminative document frequency method (tf-iddf).
Step E: input the co-occurrence vector of the test image into the latent Dirichlet allocation (LDA) model and determine the image's topic distribution $\theta'=\{\theta'_1,\theta'_2,\ldots,\theta'_T\}$.
Step F: input the topic distribution θ′ of the image under test into the BP neural network classifier D obtained in step 8 of the codebook training stage, take the 5 topics with the highest probability for the image, and judge whether it is an undesirable image in combination with a threshold method.
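The patent does not spell out how the five most probable topics are combined with the threshold in step F; the sketch below is one plausible reading, keeping only the top-5 topic mass before scoring with classifier D from the previous sketch. The combination rule and threshold value are assumptions.

```python
import numpy as np

def detect_bad_image(theta_new: np.ndarray, clf, k: int = 5,
                     thresh: float = 0.5) -> bool:
    """Keep the k most probable topics of the test image, zero the rest,
    and threshold the classifier's probability for the 'bad' class."""
    masked = np.zeros_like(theta_new)
    top = np.argsort(theta_new)[-k:]              # the 5 largest-probability topics
    masked[top] = theta_new[top]
    p_bad = clf.predict_proba(masked[None, :])[0, 1]
    return p_bad > thresh                         # True -> undesirable image

# Reusing D and theta from the step-8 sketch:
print(detect_bad_image(theta[0], D))
```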

Claims (1)

1. A bad image detection method based on implicit topic analysis, comprising the following process:
(1) in the YCbCr color space, construct a dual Gaussian mixture model:
(1a) manually cut an image I containing a skin region;
(1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
(1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid K,\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components in the mixture, with $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
estimate the parameters ω, μ, Σ and K of the Gaussian mixture model with the expectation-maximization (EM) algorithm and the minimum description length (MDL) criterion, establishing the skin color model; the parameter estimation steps are:
(1c1) randomly initialize the number of Gaussian components K;
(1c2) estimate the weights ω, mean vectors μ and covariance matrices Σ of the Gaussian mixture model under the initialized K using the expectation-maximization (EM) algorithm;
(1c3) compute the distance between every pair of Gaussian components with the distance formula d(l, m), select the two closest components and merge them into one, reducing the number of components K by 1, where

$$d(l,m)=\frac{N\bar{\omega}_l}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_l|}+\frac{N\bar{\omega}_m}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_m|}$$

in which l and m denote the l-th and m-th Gaussian components of the model, N is the number of data samples, $\bar{\omega}_l$ and $\bar{\omega}_m$ are the weights of the l-th and m-th components, $\bar{\Sigma}_l$ and $\bar{\Sigma}_m$ are their covariance matrices, and $\Sigma_{(l,m)}$ is the covariance matrix of the merged component;
the two closest Gaussian components are merged repeatedly to obtain new values of K, and the corresponding minimum description length MDL(K, Ω) is computed; the iteration terminates when K = 1, and the K minimizing MDL(K, Ω) over the iteration is selected as the optimal value, where

$$\mathrm{MDL}(K,\Omega)=-\log G_x(x\mid K,\Omega)+\frac{1}{2}L\log(NM)$$

with K the number of Gaussian components, Ω = (ω, μ, Σ) the estimated parameters, M the dimensionality of the data samples, N the number of data samples, and $L=K\left(1+M+\frac{(M+1)M}{2}\right)-1$;
(1c4) estimate the optimal values of the parameters ω, μ, Σ under the optimal K using the EM algorithm;
(1d) manually cut an image I containing a hair region, repeat steps (1b) to (1c), and establish a hair region model;
(1e) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model;
(2) remove the hair region within the skin color region using a Bayesian model;
(3) detect the salient feature points of image I within the skin color region using the difference-of-Gaussians operator, and remove the feature points concentrated on the edge of the skin region, obtaining the effective feature point set V′;
(4) describe the effective feature points of V′ with Scale-Invariant Feature Transform (SIFT) descriptors, representing each feature point as a 128-dimensional feature vector f;
(5) obtain the SIFT descriptors of the effective feature points of all normal and bad images in the training set through steps (1) to (4), perform K-means clustering under the cosine distance measure on all SIFT descriptors to obtain C cluster centers, and define each cluster center as a visual word, yielding the image codebook $W=\{w_1,w_2,\ldots,w_C\}$, where w denotes a visual word and C the number of visual words in the codebook;
(6) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
(7) using the codebook words obtained in step (5), compute the term frequency-inverse discriminative document frequency (tf-iddf) value of every word in the j-th image and arrange these values, in the order the words appear in the codebook, into a weighted co-occurrence vector $d_j$ representing the j-th image;
the tf-iddf values of all visual words in the j-th image are computed as follows:
(7a) count the frequency $n_{i,j}$ with which the i-th word $w_i$ occurs in the j-th image and the total frequency $\sum_{c=1}^{C}n_{c,j}$ of all words in the j-th image, and compute the term frequency of the i-th word in the j-th image as

$$tf_{i,j}=\frac{n_{i,j}}{\sum_{c=1}^{C}n_{c,j}};$$
(7b) count the number $m_{1,i}$ of normal images containing word $w_i$ and the number $m_{2,i}$ of bad images containing $w_i$, and compute their log-ratio as the inverse discriminative document frequency:

$$iddf_i=\log\frac{m_{1,i}}{m_{2,i}};$$
(7c) compute the $(tf\text{-}iddf)_{i,j}$ value of the i-th word $w_i$ in the j-th image:

$$(tf\text{-}iddf)_{i,j}=tf_{i,j}\times iddf_i;$$
(8) form the co-occurrence matrix from the co-occurrence vectors of all training images and perform LDA modeling on it with an LDA model based on the Gibbs sampling algorithm, obtaining the mixed topic distribution θ of the training images; the LDA modeling steps are:
(8a) let $z_i=q$ denote assigning word $w_i$ to topic q; randomly initialize $z_1,z_2,\ldots,z_C$ to integers between 1 and T, i.e., randomly assign the words $w_1,w_2,\ldots,w_C$ to the T topics;
(8b) for each topic q from 1 to T, compute the posterior probability $P(z_i=q\mid z_{-i},w_i)$ from the co-occurrence vectors, and choose the topic q maximizing it as the value of $z_i$, where

$$P(z_i=q\mid z_{-i},w_i)=\frac{\dfrac{n_{-i,q}^{(w_i)}+\beta}{n_{-i,q}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,q}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}{\sum_{q=1}^{T}\dfrac{n_{-i,q}^{(w_i)}+\beta}{n_{-i,q}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,q}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}$$

in which $z_{-i}$ denotes the assignments of all $z_k$, k ≠ i; $n_{-i,q}^{(w_i)}$ is the tf-iddf value of words identical to $w_i$ assigned to topic q; $n_{-i,q}^{(\cdot)}$ is the sum of the tf-iddf values of all words assigned to topic q; β is empirically set to 0.01; C is the number of words; $n_{-i,q}^{(d_i)}$ is the sum of the tf-iddf values of the words in image $d_i$ assigned to topic q; $n_{-i,\cdot}^{(d_i)}$ is the sum of the tf-iddf values of all topic-assigned words in $d_i$; T is the number of topics; and α is set to 50/T;
(8c) cycle i from 1 to C, assigning word $w_i$ its topic label $z_i$ via step (8b), completing the topic reassignment of every word $w_1,w_2,\ldots,w_C$;
(8d) repeat steps (8b)-(8c); when the posterior probabilities $P(z_i=q\mid z_{-i},w_i)$ change little, terminate the iteration, obtaining the optimal values of $z_1,z_2,\ldots,z_C$ and thereby fixing the word-topic assignments;
(8e) once the word-topic assignments are fixed, estimate the topic parameters of each image d by

$$\hat{\theta}_q^{(d)}=\frac{n_q^{(d)}+\alpha}{n_\cdot^{(d)}+T\alpha}$$

where $\hat{\theta}_q^{(d)}$ gives, for image d, the distribution of its words over the T topics, $n_q^{(d)}$ is the number of words in image d assigned to topic q, $n_\cdot^{(d)}$ is the number of all topic-assigned words in image d, and α is set to 50/T;
(8f) obtaining the topic distribution θ of the images through step (8e), collect the mixed topic distribution of each training image $\theta^{(d)}=\{\theta_1^{(d)},\theta_2^{(d)},\ldots,\theta_T^{(d)}\}$, where $\theta_i^{(d)}$ is the probability that topic i occurs and T is the number of topics;
(9) input the mixed topic distribution θ of the training images together with their class labels into a BP neural network, training a bad-image classifier based on the BP neural network;
(10) obtain the SIFT descriptors of the effective feature points of the image to be detected according to steps (1) to (4), represent the image as a co-occurrence vector of codebook words using the vector quantization and tf-iddf methods of steps (6) to (7), and input the co-occurrence vector into the LDA model to obtain the topic distribution θ′ of the image to be detected;
(11) input the topic distribution θ′ of the image to be detected into the BP-neural-network-based bad-image classifier and judge whether it is a bad image, completing the detection.
CN 201110329875 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis Expired - Fee Related CN102360435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110329875 CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110329875 CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Publications (2)

Publication Number Publication Date
CN102360435A CN102360435A (en) 2012-02-22
CN102360435B true CN102360435B (en) 2013-06-12

Family

ID=45585762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110329875 Expired - Fee Related CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Country Status (1)

Country Link
CN (1) CN102360435B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440644A (en) * 2013-08-08 2013-12-11 中山大学 Multi-scale image weak edge detection method based on minimum description length

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855492B (en) * 2012-07-27 2015-02-04 中南大学 Classification method based on mineral flotation foam image
CN102968623B (en) * 2012-12-07 2015-12-23 上海电机学院 Face Detection system and method
CN103295031B (en) * 2013-04-15 2016-12-28 浙江大学 A kind of image object method of counting based on canonical risk minimization
CN103559510B (en) * 2013-11-12 2017-01-18 中国科学院自动化研究所 Method for recognizing social group behaviors through related topic model
CN103870563B (en) * 2014-03-07 2017-03-29 北京奇虎科技有限公司 It is determined that the method and apparatus of the theme distribution of given text
CN104918046B (en) * 2014-03-13 2019-11-05 中兴通讯股份有限公司 A kind of local description compression method and device
CN104134059B (en) * 2014-07-25 2017-07-14 西安电子科技大学 Keep the bad image detecting method under the mixing deformation model of colouring information
CN104318562B (en) * 2014-10-22 2018-02-23 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to determine the quality of the Internet images
CN107273863B (en) * 2017-06-21 2019-07-23 天津师范大学 A kind of scene character recognition method based on semantic stroke pond
CN107480684B (en) * 2017-08-24 2020-06-05 成都澳海川科技有限公司 Image processing method and device
CN107968951B (en) * 2017-12-06 2019-07-23 重庆智韬信息技术中心 The method that Auto-Sensing and shielding are carried out to live video
CN108960042B (en) * 2018-05-17 2021-06-08 新疆医科大学第一附属医院 Echinococcus proctostermias survival rate detection method based on visual saliency and SIFT characteristics
CN109344857B (en) * 2018-08-14 2022-05-13 重庆邂智科技有限公司 Text similarity measurement method and device, terminal and storage medium
CN109583502B (en) * 2018-11-30 2022-11-18 天津师范大学 Pedestrian re-identification method based on anti-erasure attention mechanism
CN112446228B (en) * 2019-08-27 2022-04-01 北京易真学思教育科技有限公司 Video detection method and device, electronic equipment and computer storage medium
CN110956195B (en) * 2019-10-11 2023-06-02 平安科技(深圳)有限公司 Image matching method, device, computer equipment and storage medium
CN111612102B (en) * 2020-06-05 2023-02-07 华侨大学 Satellite image data clustering method, device and equipment based on local feature selection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050014560A1 (en) * 2003-05-19 2005-01-20 Yacob Blumenthal Method and system for simulating interaction with a pictorial representation of a model
CN1323370C (en) * 2004-05-28 2007-06-27 中国科学院计算技术研究所 Method for detecting pornographic images
CN101996314B (en) * 2009-08-26 2012-11-28 厦门市美亚柏科信息股份有限公司 Content-based human body upper part sensitive image identification method and device


Also Published As

Publication number Publication date
CN102360435A (en) 2012-02-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130612

Termination date: 20181026

CF01 Termination of patent right due to non-payment of annual fee