CN102360435B - Undesirable image detecting method based on connotative theme analysis - Google Patents

Undesirable image detecting method based on connotative theme analysis

Info

Publication number
CN102360435B
Authority
CN
China
Prior art keywords
image
word
words
sigma
gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110329875
Other languages
Chinese (zh)
Other versions
CN102360435A (en)
Inventor
田春娜
高新波
王华青
李东阳
袁博
赵林
李洁
蒲倩
王代富
季秀云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN 201110329875 priority Critical patent/CN102360435B/en
Publication of CN102360435A publication Critical patent/CN102360435A/en
Application granted granted Critical
Publication of CN102360435B publication Critical patent/CN102360435B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an undesirable image detection method based on implicit topic analysis, which mainly solves the problem that existing undesirable-information detection methods misjudge normal images because they ignore semantic information. The scheme is as follows: extract the skin region of an image with a dual Gaussian mixture model; generate a codebook of discriminative features in the skin region with a bag-of-words model, and represent each training image as a weighted word co-occurrence vector via a term frequency-inverse discriminative document frequency method; assemble all co-occurrence vectors into a co-occurrence matrix and perform LDA modeling on it to obtain the topics of the images; feed the mixed topics of the training images into a BP neural network to train an undesirable-image classifier; and obtain the topics of an image under test, input them into the classifier, and judge whether the image is undesirable, completing the detection. Experiments show that the invention distinguishes undesirable images from normal images well, and can be used to filter pornographic information in images.

Description

Bad image detection method based on implicit topic analysis
Technical Field
The invention belongs to the cross field of computer vision and pattern recognition, and particularly relates to a semantic classification method for bad images based on implicit topic analysis, which can be used for filtering pornographic information in images.
Background
With the explosive growth of the Internet in the 1990s, all kinds of information on the network have been growing and spreading rapidly. In particular, with the arrival of the 3G era and the introduction of the converged-network concept, images spread quickly through instant messaging, carried by multimedia messages, mobile-phone video streams and the like, and they contain a large amount of unhealthy content such as obscene pornography. The spread of such undesirable information harms people's physical and psychological health, so advanced methods for filtering it are of profound significance. The key to filtering undesirable information is detecting it correctly, and the detection task comprises two steps: extracting and describing the discriminative information of the image; and analyzing and classifying the image's implicit semantic topics. The following is an overview and analysis of the current state of research, its development, and its application to the detection of undesirable image content.
(1) Extraction and description of discriminative image information
Undesirable images usually expose large skin regions and private body parts, so the skin region must be detected first and then discriminative feature points extracted within it. Early detection work neglected the representativeness of features within skin-color regions and used skin color alone as the discriminating cue, as in the "Finding Naked People" method of Fleck et al. (Proceedings of the European Conference on Computer Vision, 1996, 2: 593-602). Describing skin tone requires a suitable color space. Research shows that in the YCbCr color space, on the CbCr chrominance components, human skin regions have good clustering characteristics. To account for the influence of factors such as ethnicity and illumination color cast on the skin-color range, researchers often use a Gaussian mixture model (GMM) to model the distribution of skin color.
(2) Image implicit semantic topic analysis and classification
At present, most detection methods for undesirable content do not consider the semantic information implied by an image, so natural images containing many skin-like regions, or normally exposed human skin, are misjudged as undesirable. Effectively distinguishing image properties by semantic features is therefore the key to improving such detection. To analyze image semantics, and inspired by text-based latent topic analysis, computer vision researchers use topic models, represented by the latent Dirichlet allocation (LDA) model, to represent image semantics. The LDA model builds on the bag-of-words model: the image is treated as a combination of visual words with no ordering among them. The bag-of-words model comprises three steps: feature detection, feature description and codebook generation. Features are typically detected with a difference-of-Gaussians operator; SIFT (Scale-Invariant Feature Transform) descriptors, being invariant to scale, rotation and affine changes, are commonly used to describe them; finally, K-means clustering of the training images' descriptors yields the image codebook. Concretely, salient feature points are first extracted with the difference-of-Gaussians operator, then described with SIFT, and each SIFT descriptor is mapped by vector quantization to a particular visual word of the codebook. An image can thus be viewed as composed of visual words and, according to how often each codebook word occurs in it, represented as a word co-occurrence vector. The co-occurrence vectors of many images form a co-occurrence matrix, on which LDA modeling is performed. The LDA model is a three-layer image-topic-visual-word Bayesian network: images are regarded as mixtures of several latent topics, and topics as mixtures of visual words. The model infers the mixing probabilities of the latent topics in each image, reducing each image from a high-dimensional word-frequency representation to a low-dimensional mixture of topics.
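To make the quantization step concrete, the following minimal sketch maps an image's SIFT descriptors to their nearest codebook words and counts the occurrences, yielding the word co-occurrence vector. The array names and toy sizes are illustrative assumptions, not values from the patent.

```python
import numpy as np

def cooccurrence_vector(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Vector-quantize SIFT descriptors to their nearest visual words and
    count how often each codebook word occurs in the image."""
    # Pairwise distances between descriptors (n, 128) and words (C, 128).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)                   # nearest word per descriptor
    return np.bincount(words, minlength=len(codebook))

rng = np.random.default_rng(0)
desc = rng.random((40, 128))    # toy stand-in for 40 SIFT descriptors of one image
book = rng.random((10, 128))    # toy codebook of C = 10 visual words
print(cooccurrence_vector(desc, book))             # word histogram of the image
```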
Detection of undesirable images based on topic models has only just begun; Sheng Tang et al. combined LDA with an SVM classifier in "PornProbe: An LDA-SVM Based Pornography Detection System", Proceedings of the 17th ACM International Conference on Multimedia, 2009. Image semantic description based on topic analysis is one of the most promising directions for solving the undesirable-information detection problem.
Existing methods for classifying undesirable content have three main problems:
(1) the extracted skin-color regions often contain interference such as hair, which hampers the subsequent extraction of discriminative features;
(2) the skin-region textures of different images are similar, so the discriminative features of undesirable images are often submerged in a large number of normal skin-texture features and cannot be exploited effectively in the classification task;
(3) features are disconnected from image semantics, and the lack of semantic description leads to a low correct detection rate.
Disclosure of Invention
Aiming at the above shortcomings of the prior art, the invention provides a bad image detection method based on implicit topic analysis that removes interference such as hair regions from the detected skin-color regions, increases the discriminability of image features, analyzes the semantic content of images, and improves the detection rate of bad images.
The technical idea of the invention is as follows: first extract the skin region of the image; describe it with a bag-of-words model under the cosine distance measure; weight the words of the image by the term frequency-inverse discriminative document frequency method (tf-iddf) and collect the word weights into a vector representing the co-occurrence of visual words in the image; analyze the image's latent topics with an LDA model; and detect bad images according to topic similarity. The implementation comprises the following steps:
(1) in the YCbCr color space, construct a dual Gaussian mixture model:
(1a) manually cut an image I containing a skin region;
(1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
(1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid K,\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components in the mixture, with $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
estimate the parameters ω, μ, Σ and K of the Gaussian mixture model with the expectation-maximization (EM) algorithm and the minimum description length (MDL) criterion, establishing the skin color model;
(1d) manually cut an image I containing a hair region, repeat steps (1b) to (1c), and establish a hair region model;
(1e) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model;
(2) remove the hair region within the skin color region using a Bayesian model;
(3) detect the salient feature points of image I within the skin color region using the difference-of-Gaussians operator, and remove the feature points concentrated on the edge of the skin region, obtaining the effective feature point set V′;
(4) describe the effective feature points of V′ with Scale-Invariant Feature Transform (SIFT) descriptors, representing each feature point as a 128-dimensional feature vector f;
(5) obtain the SIFT descriptors of the effective feature points of all normal and bad images in the training set through steps (1) to (4), perform K-means clustering under the cosine distance measure on all SIFT descriptors to obtain C cluster centers, and define each cluster center as a visual word, yielding the image codebook $W=\{w_1,w_2,\ldots,w_C\}$, where w denotes a visual word and C the number of visual words in the codebook;
(6) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
(7) using the codebook words obtained in step (5), compute the term frequency-inverse discriminative document frequency (tf-iddf) value of every word in the j-th image and arrange these values, in the order the words appear in the codebook, into a weighted co-occurrence vector $d_j$ representing the j-th image;
(8) form a co-occurrence matrix from the co-occurrence vectors of all training images and perform LDA modeling on it with an LDA model based on the Gibbs sampling algorithm, obtaining the mixed topic distribution θ of the training images;
(9) input the mixed topic distribution θ of the training images together with their class labels into a BP neural network, training a bad-image classifier based on the BP neural network;
(10) obtain the SIFT descriptors of the effective feature points of the image to be detected according to steps (1) to (4), represent the image as a co-occurrence vector of codebook words using the vector quantization and tf-iddf methods of steps (6) to (7), and input the co-occurrence vector into the LDA model to obtain the topic distribution θ′ of the image to be detected;
(11) input the topic distribution θ′ of the image to be detected into the BP-neural-network-based bad-image classifier and judge whether it is a bad image, completing the detection.
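Taken together, steps (1)-(11) amount to a topic-mixture-plus-classifier pipeline. The sketch below strings steps (8)-(11) together on synthetic word-count vectors, under loudly labeled substitutions: scikit-learn's LatentDirichletAllocation uses variational inference rather than Gibbs sampling, plain counts stand in for tf-iddf weights, and MLPClassifier is a generic backpropagation network standing in for the BP classifier.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Stand-in for steps (1)-(7): word-count vectors of 100 training images
# over a C = 200-word codebook (the patent would use tf-iddf weights).
X = rng.poisson(1.0, size=(100, 200))
y = rng.integers(0, 2, size=100)              # 0 = normal, 1 = bad (toy labels)

lda = LatentDirichletAllocation(n_components=20, random_state=0)
theta = lda.fit_transform(X)                  # mixed topic distributions, step (8)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(theta, y)        # classifier, step (9)

x_new = rng.poisson(1.0, size=(1, 200))       # image to be detected
theta_new = lda.transform(x_new)              # its topic distribution, step (10)
print(clf.predict(theta_new))                 # step (11): 1 means "bad image"
```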
Compared with the existing bad information image detection method, the method has the following advantages:
1) the invention uses a double Gaussian mixture model, namely a Bi-GMM skin color model, so that the skin color detection is more robust, and the accuracy of skin area extraction is improved.
2) The invention uses the word frequency-inverse discriminative document frequency method to describe the co-occurrence frequency of the words, thereby improving the discriminative performance of the obvious visual features of the image and further improving the classification rate of the image.
3) According to the invention, the latent Dirichlet allocation LDA topic model is used for representing the semantics of the image, so that the influence of the skin color-like region on the recognition result is reduced.
Experimental results show that, compared with existing methods, the proposed method extracts skin-color regions more accurately, produces more discriminative image features, and markedly improves the detection rate of bad images.
Drawings
The technical process of the present invention can be described in detail with reference to the following drawings.
FIG. 1 is a general flow chart of the present invention for bad information image detection;
FIG. 2 is a sub-flow diagram of the Bi-GMM skin tone model based on the double Gaussian mixture model of the present invention;
FIG. 3 is a schematic diagram of the existing latent Dirichlet allocation (LDA) model.
Detailed Description
Referring to fig. 1, the bad image detection method based on topic analysis of the present invention mainly includes the following two stages:
First, codebook training stage
Step 1, constructing a double-Gaussian mixture Bi-GMM model.
Referring to fig. 2, the specific implementation of this step is as follows:
1a) manually cutting an image I containing a skin area;
1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components, with $0<\omega_n<1$ and $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
1d) determine the values of the parameters of the Gaussian mixture density: the number of components K, the weights ω, the mean vectors μ and the covariance matrices Σ:
1d1) randomly initialize the number of Gaussian components K;
1d2) estimate the weights ω, mean vectors μ and covariance matrices Σ of the Gaussian mixture model under the initialized K using the expectation-maximization (EM) algorithm;
1d3) compute the distance between every pair of Gaussian components with the distance formula d(l, m), select the two closest components and merge them into one, reducing the number of components K by 1, where

$$d(l,m)=\frac{N\bar{\omega}_l}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_l|}+\frac{N\bar{\omega}_m}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_m|}$$

in which l and m denote the l-th and m-th Gaussian components of the model, N is the number of data samples, $\bar{\omega}_l$ and $\bar{\omega}_m$ are the weights of the l-th and m-th components, $\bar{\Sigma}_l$ and $\bar{\Sigma}_m$ are their covariance matrices, and $\Sigma_{(l,m)}$ is the covariance matrix of the merged component;
The two closest Gaussian components are merged repeatedly; for each new value of K the corresponding minimum description length MDL(K, θ) is computed, and the iteration terminates when K = 1. The K minimizing MDL(K, θ) over the whole iteration is selected as the optimal value, where

$$\mathrm{MDL}(K,\theta)=-\log G_x(x\mid K,\theta)+\frac{1}{2}L\log(NM)$$

with K the number of Gaussian components, θ = (ω, μ, Σ) the estimated parameters, M the dimensionality of the data samples, N the number of data samples, and $L=K\left(1+M+\frac{(M+1)M}{2}\right)-1$;
1d4) estimate the optimal values of the parameters ω, μ, Σ under the optimal K with the EM algorithm, establishing the skin color model.
1e) Manually cut an image I containing a hair region and repeat steps 1b) to 1d) to establish a hair region model;
1f) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model.
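A simplified sketch of the MDL-driven model selection of step 1: instead of the component-merging schedule described above, it refits EM for each candidate K and evaluates the MDL formula directly. The candidate range and the synthetic CbCr pixels are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_skin_gmm(cbcr: np.ndarray, k_max: int = 8) -> GaussianMixture:
    """Select K by the MDL criterion and return the fitted mixture.

    Simplification: the patent shrinks K by repeatedly merging the two
    closest components; here we simply refit EM for each candidate K."""
    n, m = cbcr.shape                          # N samples, M = 2 (Cb, Cr)
    best, best_mdl = None, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              random_state=0).fit(cbcr)
        n_params = k * (1 + m + m * (m + 1) / 2) - 1     # L in the MDL formula
        mdl = -gmm.score(cbcr) * n + 0.5 * n_params * np.log(n * m)
        if mdl < best_mdl:
            best, best_mdl = gmm, mdl
    return best

pixels = np.random.default_rng(0).normal([120, 150], 8, size=(5000, 2))
print(fit_skin_gmm(pixels).n_components)       # K selected by minimum MDL
```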
Step 2: remove the hair region within the skin-color region using a Bayesian model.
With the dual Gaussian mixture model in hand, for each pixel of the skin-color region of image I, compute by the Bayes formula the probability P1 of belonging to the skin color model and the probability P2 of belonging to the hair model; when P1 is larger the pixel is retained, otherwise it is erased, finally removing the hair region from the skin-color region. The Bayes formula is:

$$p(V_i\mid\theta,K)=\frac{p(\theta,K\mid V_i)\,p(V_i)}{p(\theta,K)}$$

where $V_i$, i = 1, 2, denotes the skin-color region and the hair region respectively, θ = (ω, μ, Σ) are the weights, mean vectors and covariance matrices of the Gaussian mixture model, and K is the number of Gaussian components.
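A hedged sketch of the pixel-wise Bayes decision of step 2 between the two cascaded mixtures: equal priors are assumed since the patent does not state them, and the toy chroma distributions are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def hair_free_mask(cbcr, skin_gmm, hair_gmm, prior_skin=0.5):
    """Keep a pixel when P(skin | x) > P(hair | x); by Bayes' rule this
    compares class likelihood times prior, the evidence cancelling out."""
    log_skin = skin_gmm.score_samples(cbcr) + np.log(prior_skin)
    log_hair = hair_gmm.score_samples(cbcr) + np.log(1.0 - prior_skin)
    return log_skin > log_hair                 # True = retained skin pixel

rng = np.random.default_rng(2)
skin = GaussianMixture(2, random_state=0).fit(rng.normal([120, 150], 8, (2000, 2)))
hair = GaussianMixture(2, random_state=0).fit(rng.normal([105, 130], 6, (2000, 2)))
print(hair_free_mask(rng.normal([119, 149], 8, (5, 2)), skin, hair))
```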
Step 3: detect salient feature points such as corners and blobs in the skin-color region of the image with the difference-of-Gaussians (DoG) operator.
3a) define the difference-of-Gaussians operator $D(x,y,\sigma)=L(x,y,k_i\sigma)-L(x,y,k_j\sigma)$, where $L(x,y,k\sigma)$ is the convolution of the image I(x, y) with the variable-scale Gaussian $G(x,y,\sigma)=\frac{1}{2\pi\sigma^2}\,e^{-(x^2+y^2)/(2\sigma^2)}$;
3b) remove non-discriminative feature points:
Since extrema of the DoG image defined in step 3a) that lie across edges have a large principal curvature across the edge but a small one along it, feature points satisfying this principal-curvature condition are found and removed, eliminating the influence of edges on feature extraction and yielding a set of salient visual feature points (blobs, corners, etc.) $\{F_1(x,y,\sigma),F_2(x,y,\sigma),\ldots,F_N(x,y,\sigma)\}$, where (x, y) are the coordinates of feature point F and σ is its scale;
Step 4: further remove, from the points detected in step 3, the feature points lying on the edge of the skin-color region: after step 3 a large number of salient feature points still sit in the boundary region where skin meets background, and such points are not discriminative for bad-information classification; removing them leaves the discriminative local feature points F(x, y, σ).
Step 5: describe the local feature points F(x, y, σ) retained in step 4 with Scale-Invariant Feature Transform (SIFT) features, representing each feature point as a 128-dimensional feature vector f.
Step 6: train the codebook:
6a) perform steps 1-5 on each image of a training set containing M normal and bad images, obtaining a feature matrix formed by the feature vectors of the M images;
6b) perform K-means clustering under the cosine distance measure on the feature vectors of the feature matrix to obtain C cluster centers, define each cluster center as a visual word, and form the image codebook $W=\{w_1,w_2,\ldots,w_C\}$ from the C visual words, where w denotes a visual word and C the number of visual words in the codebook;
6c) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
6d) count the frequency $n_{i,j}$ with which the i-th word $w_i$ occurs in the j-th image and the total frequency $\sum_{c=1}^{C}n_{c,j}$ of all words in the j-th image, and compute the term frequency of the i-th word in the j-th image as

$$tf_{i,j}=\frac{n_{i,j}}{\sum_{c=1}^{C}n_{c,j}};$$
6e) count the number $m_{1,i}$ of normal images containing word $w_i$ and the number $m_{2,i}$ of bad images containing $w_i$, and compute their log-ratio as the inverse discriminative document frequency:

$$iddf_i=\log\frac{m_{1,i}}{m_{2,i}};$$
6f) compute the $(tf\text{-}iddf)_{i,j}$ value of the i-th word $w_i$ in the j-th image:

$$(tf\text{-}iddf)_{i,j}=tf_{i,j}\times iddf_i;$$
6g) compute the tf-iddf values of all words in the j-th image, arrange them in codebook order into a weighted co-occurrence vector $d_j$ representing the j-th image, and form the co-occurrence matrix from the co-occurrence vectors of all training images;
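A minimal sketch of the tf-iddf weighting of steps 6d)-6g), operating on a word-count matrix. The +1 smoothing in the log-ratio is our addition to avoid division by zero (the patent formula is log(m1/m2)), and the toy data are illustrative.

```python
import numpy as np

def tf_iddf(counts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """counts: (n_images, C) raw word frequencies n_ij;
    labels: (n_images,) 0 = normal, 1 = bad.
    Returns the weighted co-occurrence vectors, one row per image."""
    tf = counts / counts.sum(axis=1, keepdims=True)        # tf_ij, step 6d)
    m1 = (counts[labels == 0] > 0).sum(axis=0)             # normal images w/ word i
    m2 = (counts[labels == 1] > 0).sum(axis=0)             # bad images w/ word i
    iddf = np.log((m1 + 1) / (m2 + 1))                     # step 6e), +1 smoothed
    return tf * iddf                                       # steps 6f)-6g)

rng = np.random.default_rng(3)
X = rng.poisson(2.0, size=(6, 8))                # 6 toy images, C = 8 words
y = np.array([0, 0, 0, 1, 1, 1])
print(tf_iddf(X, y).round(2))
```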
Step 7: establish the latent Dirichlet allocation (LDA) model:
7a) let $z_i=j$ denote assigning word $w_i$ to topic j; randomly initialize $z_1,z_2,\ldots,z_C$ to integers between 1 and T, i.e., randomly assign the words $w_1,w_2,\ldots,w_C$ to the T topics;
7b) for each topic j from 1 to T, compute the posterior probability $P(z_i=j\mid z_{-i},w_i)$ from the co-occurrence vectors, and choose the topic j maximizing it as the value of $z_i$, where

$$P(z_i=j\mid z_{-i},w_i)=\frac{\dfrac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}{\sum_{j=1}^{T}\dfrac{n_{-i,j}^{(w_i)}+\beta}{n_{-i,j}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,j}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}$$

in which $z_{-i}$ denotes the assignments of all $z_k$ (k ≠ i); $n_{-i,j}^{(w_i)}$ is the tf-iddf value of words identical to $w_i$ assigned to topic j; $n_{-i,j}^{(\cdot)}$ is the sum of the tf-iddf values of all words assigned to topic j; β is empirically set to 0.01; C is the number of words; $n_{-i,j}^{(d_i)}$ is the sum of the tf-iddf values of the words in image $d_i$ assigned to topic j; $n_{-i,\cdot}^{(d_i)}$ is the sum of the tf-iddf values of all topic-assigned words in $d_i$; T is the number of topics; and α is set to 50/T;
7c) cycle i from 1 to C, assigning word $w_i$ its topic label $z_i$ via step 7b), completing the topic reassignment of every word $w_1,w_2,\ldots,w_C$;
7d) repeat steps 7b)-7c); when the posterior probabilities $P(z_i=j\mid z_{-i},w_i)$ change little, terminate the iteration, obtaining the optimal values of $z_1,z_2,\ldots,z_C$ and thereby fixing the word-topic assignments;
7e) once the word-topic assignments are fixed, estimate the topic parameters of each training image d by

$$\hat{\theta}_j^{(d)}=\frac{n_j^{(d)}+\alpha}{n_\cdot^{(d)}+T\alpha}$$

where $\hat{\theta}_j^{(d)}$ gives, for image d, the distribution of its words over the T topics, $n_j^{(d)}$ is the number of words in image d assigned to topic j, $n_\cdot^{(d)}$ is the number of all topic-assigned words in image d, and α is set to 50/T;
7f) obtaining the topic distribution of each training image d through step 7e), collect its mixed topic distribution $\theta^{(d)}=\{\theta_1^{(d)},\theta_2^{(d)},\ldots,\theta_T^{(d)}\}$, where $\theta_i^{(d)}$ is the probability that topic i occurs in image d and T is the number of topics.
The LDA model established by steps 7a) -7f) above is shown in fig. 3.
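For concreteness, here is a compact collapsed Gibbs sampler in the style of steps 7a)-7e). It follows the standard Griffiths-Steyvers formulation with integer word counts; the patent instead accumulates tf-iddf weights in the same count tables, so this is a simplified sketch, with α = 50/T and β = 0.01 as above.

```python
import numpy as np

def lda_gibbs(docs, C, T, iters=100, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over `docs`, a list of
    word-index lists (one list per image)."""
    rng = np.random.default_rng(seed)
    alpha = 50.0 / T                                   # hyperparameters, step 7b)
    nwt = np.zeros((C, T))                             # word-topic counts
    ndt = np.zeros((len(docs), T))                     # image-topic counts
    z = [rng.integers(0, T, len(d)) for d in docs]     # step 7a): random init
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            nwt[w, t] += 1
            ndt[d, t] += 1
    for _ in range(iters):                             # steps 7b)-7d)
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                nwt[w, t] -= 1; ndt[d, t] -= 1         # exclude current token
                p = (nwt[w] + beta) / (nwt.sum(0) + C * beta) * (ndt[d] + alpha)
                t = rng.choice(T, p=p / p.sum())       # sample a new topic
                z[d][i] = t
                nwt[w, t] += 1; ndt[d, t] += 1
    return (ndt + alpha) / (ndt.sum(1, keepdims=True) + T * alpha)   # step 7e)

docs = [[0, 1, 2, 1], [2, 3, 3, 2], [0, 0, 1]]         # toy word indices, C = 4
print(lda_gibbs(docs, C=4, T=2).round(2))              # per-image topic mixtures
```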
Step 8: design the classifier based on the BP neural network: input the mixed topic distribution $\theta^{(d)}$ of each training image together with its class label into a BP neural network, obtaining the classifier D.
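Step 8 can be sketched with a standard backpropagation-trained multilayer perceptron; scikit-learn's MLPClassifier stands in for the BP network, and the hidden-layer size, topic count and toy data are our assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
T = 30
theta = rng.dirichlet(np.ones(T), size=200)     # mixed topic mixtures, step 7f)
labels = (theta[:, 0] > 1.0 / T).astype(int)    # toy class labels (0/1)

# Classifier D: a backprop-trained MLP on topic mixtures and class labels.
D = MLPClassifier(hidden_layer_sizes=(64,), max_iter=3000,
                  random_state=0).fit(theta, labels)
print(D.score(theta, labels))                   # training accuracy of D
```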
Second, image testing stage
Step A: input the test image into the dual Gaussian mixture model and detect the skin region by Bayes discrimination.
Step B: as in steps 3 and 4 of the training stage, detect the salient feature points within the skin region obtained in step A with the difference-of-Gaussians (DoG) operator, and further remove the non-discriminative feature points.
Step C: describe the feature points obtained in step B with Scale-Invariant Feature Transform (SIFT); each feature point is represented as a 128-dimensional feature vector.
Step D: quantize the image's SIFT descriptors to codebook words by vector quantization, and represent the test image as a word co-occurrence vector by the term frequency-inverse discriminative document frequency method (tf-iddf).
Step E: input the co-occurrence vector of the test image into the latent Dirichlet allocation (LDA) model and determine the image's topic distribution $\theta'=\{\theta'_1,\theta'_2,\ldots,\theta'_T\}$.
Step F: input the topic distribution θ′ of the image under test into the BP neural network classifier D obtained in step 8 of the codebook training stage, take the 5 topics with the highest probability for the image, and judge whether it is an undesirable image in combination with a threshold method.
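The patent does not spell out how the five most probable topics are combined with the threshold in step F; the sketch below is one plausible reading, keeping only the top-5 topic mass before scoring with classifier D from the previous sketch. The combination rule and threshold value are assumptions.

```python
import numpy as np

def detect_bad_image(theta_new: np.ndarray, clf, k: int = 5,
                     thresh: float = 0.5) -> bool:
    """Keep the k most probable topics of the test image, zero the rest,
    and threshold the classifier's probability for the 'bad' class."""
    masked = np.zeros_like(theta_new)
    top = np.argsort(theta_new)[-k:]              # the 5 largest-probability topics
    masked[top] = theta_new[top]
    p_bad = clf.predict_proba(masked[None, :])[0, 1]
    return p_bad > thresh                         # True -> undesirable image

# Reusing D and theta from the step-8 sketch:
print(detect_bad_image(theta[0], D))
```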

Claims (1)

1. A bad image detection method based on implicit topic analysis, comprising the following process:
(1) in the YCbCr color space, construct a dual Gaussian mixture model:
(1a) manually cut an image I containing a skin region;
(1b) convert image I from the RGB color space to the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component and Cr the red chrominance component;
(1c) after removing the luminance component Y, establish a skin color model in the CbCr chromaticity space with a Gaussian mixture model whose probability density function is

$$G(x\mid K,\omega,\mu,\Sigma)=\sum_{n=1}^{K}\omega_n\,N_D(x\mid\mu_n,\Sigma_n)$$

where K is the number of Gaussian components; $\omega=(\omega_1,\omega_2,\ldots,\omega_K)$ are the weights of the K independent Gaussian components in the mixture, with $\sum_{n=1}^{K}\omega_n=1$; $\Sigma=(\Sigma_1,\Sigma_2,\ldots,\Sigma_K)$ and $\mu=(\mu_1,\mu_2,\ldots,\mu_K)$ are the covariance matrices and mean vectors of the respective components; and $N_D(x\mid\mu_n,\Sigma_n)=(2\pi)^{-D/2}\,|\Sigma_n|^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu_n)^{T}\Sigma_n^{-1}(x-\mu_n)\}$ is the D-dimensional normal density function of the n-th Gaussian component;
estimate the parameters ω, μ, Σ and K of the Gaussian mixture model with the expectation-maximization (EM) algorithm and the minimum description length (MDL) criterion, establishing the skin color model; the parameter estimation steps are:
(1c1) randomly initialize the number of Gaussian components K;
(1c2) estimate the weights ω, mean vectors μ and covariance matrices Σ of the Gaussian mixture model under the initialized K using the expectation-maximization (EM) algorithm;
(1c3) compute the distance between every pair of Gaussian components with the distance formula d(l, m), select the two closest components and merge them into one, reducing the number of components K by 1, where

$$d(l,m)=\frac{N\bar{\omega}_l}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_l|}+\frac{N\bar{\omega}_m}{2}\log\frac{|\Sigma_{(l,m)}|}{|\bar{\Sigma}_m|}$$

in which l and m denote the l-th and m-th Gaussian components of the model, N is the number of data samples, $\bar{\omega}_l$ and $\bar{\omega}_m$ are the weights of the l-th and m-th components, $\bar{\Sigma}_l$ and $\bar{\Sigma}_m$ are their covariance matrices, and $\Sigma_{(l,m)}$ is the covariance matrix of the merged component;
the two closest Gaussian components are merged repeatedly to obtain new values of K, and the corresponding minimum description length MDL(K, Ω) is computed; the iteration terminates when K = 1, and the K minimizing MDL(K, Ω) over the iteration is selected as the optimal value, where

$$\mathrm{MDL}(K,\Omega)=-\log G_x(x\mid K,\Omega)+\frac{1}{2}L\log(NM)$$

with K the number of Gaussian components, Ω = (ω, μ, Σ) the estimated parameters, M the dimensionality of the data samples, N the number of data samples, and $L=K\left(1+M+\frac{(M+1)M}{2}\right)-1$;
(1c4) estimate the optimal values of the parameters ω, μ, Σ under the optimal K using the EM algorithm;
(1d) manually cut an image I containing a hair region, repeat steps (1b) to (1c), and establish a hair region model;
(1e) cascade the skin color model and the hair region model, establishing the dual Gaussian mixture model;
(2) remove the hair region within the skin color region using a Bayesian model;
(3) detect the salient feature points of image I within the skin color region using the difference-of-Gaussians operator, and remove the feature points concentrated on the edge of the skin region, obtaining the effective feature point set V′;
(4) describe the effective feature points of V′ with Scale-Invariant Feature Transform (SIFT) descriptors, representing each feature point as a 128-dimensional feature vector f;
(5) obtain the SIFT descriptors of the effective feature points of all normal and bad images in the training set through steps (1) to (4), perform K-means clustering under the cosine distance measure on all SIFT descriptors to obtain C cluster centers, and define each cluster center as a visual word, yielding the image codebook $W=\{w_1,w_2,\ldots,w_C\}$, where w denotes a visual word and C the number of visual words in the codebook;
(6) for the SIFT descriptors of the effective feature points of each training image, compute the distance between each descriptor and every visual word in the codebook by vector quantization, and quantize each descriptor to its nearest codebook word;
(7) using the codebook words obtained in step (5), compute the term frequency-inverse discriminative document frequency (tf-iddf) value of every word in the j-th image and arrange these values, in the order the words appear in the codebook, into a weighted co-occurrence vector $d_j$ representing the j-th image;
the tf-iddf values of all visual words in the j-th image are computed as follows:
(7a) count the frequency $n_{i,j}$ with which the i-th word $w_i$ occurs in the j-th image and the total frequency $\sum_{c=1}^{C}n_{c,j}$ of all words in the j-th image, and compute the term frequency of the i-th word in the j-th image as

$$tf_{i,j}=\frac{n_{i,j}}{\sum_{c=1}^{C}n_{c,j}};$$
(7b) count the number $m_{1,i}$ of normal images containing word $w_i$ and the number $m_{2,i}$ of bad images containing $w_i$, and compute their log-ratio as the inverse discriminative document frequency:

$$iddf_i=\log\frac{m_{1,i}}{m_{2,i}};$$
(7c) compute the $(tf\text{-}iddf)_{i,j}$ value of the i-th word $w_i$ in the j-th image:

$$(tf\text{-}iddf)_{i,j}=tf_{i,j}\times iddf_i;$$
(8) form the co-occurrence matrix from the co-occurrence vectors of all training images and perform LDA modeling on it with an LDA model based on the Gibbs sampling algorithm, obtaining the mixed topic distribution θ of the training images; the LDA modeling steps are:
(8a) let $z_i=q$ denote assigning word $w_i$ to topic q; randomly initialize $z_1,z_2,\ldots,z_C$ to integers between 1 and T, i.e., randomly assign the words $w_1,w_2,\ldots,w_C$ to the T topics;
(8b) for each topic q from 1 to T, compute the posterior probability $P(z_i=q\mid z_{-i},w_i)$ from the co-occurrence vectors, and choose the topic q maximizing it as the value of $z_i$, where

$$P(z_i=q\mid z_{-i},w_i)=\frac{\dfrac{n_{-i,q}^{(w_i)}+\beta}{n_{-i,q}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,q}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}{\sum_{q=1}^{T}\dfrac{n_{-i,q}^{(w_i)}+\beta}{n_{-i,q}^{(\cdot)}+C\beta}\cdot\dfrac{n_{-i,q}^{(d_i)}+\alpha}{n_{-i,\cdot}^{(d_i)}+T\alpha}}$$

in which $z_{-i}$ denotes the assignments of all $z_k$, k ≠ i; $n_{-i,q}^{(w_i)}$ is the tf-iddf value of words identical to $w_i$ assigned to topic q; $n_{-i,q}^{(\cdot)}$ is the sum of the tf-iddf values of all words assigned to topic q; β is empirically set to 0.01; C is the number of words; $n_{-i,q}^{(d_i)}$ is the sum of the tf-iddf values of the words in image $d_i$ assigned to topic q; $n_{-i,\cdot}^{(d_i)}$ is the sum of the tf-iddf values of all topic-assigned words in $d_i$; T is the number of topics; and α is set to 50/T;
(8c) cycle i from 1 to C, assigning word $w_i$ its topic label $z_i$ via step (8b), completing the topic reassignment of every word $w_1,w_2,\ldots,w_C$;
(8d) repeat steps (8b)-(8c); when the posterior probabilities $P(z_i=q\mid z_{-i},w_i)$ change little, terminate the iteration, obtaining the optimal values of $z_1,z_2,\ldots,z_C$ and thereby fixing the word-topic assignments;
(8e) once the word-topic assignments are fixed, estimate the topic parameters of each image d by

$$\hat{\theta}_q^{(d)}=\frac{n_q^{(d)}+\alpha}{n_\cdot^{(d)}+T\alpha}$$

where $\hat{\theta}_q^{(d)}$ gives, for image d, the distribution of its words over the T topics, $n_q^{(d)}$ is the number of words in image d assigned to topic q, $n_\cdot^{(d)}$ is the number of all topic-assigned words in image d, and α is set to 50/T;
(8f) obtaining the topic distribution θ of the images through step (8e), collect the mixed topic distribution of each training image $\theta^{(d)}=\{\theta_1^{(d)},\theta_2^{(d)},\ldots,\theta_T^{(d)}\}$, where $\theta_i^{(d)}$ is the probability that topic i occurs and T is the number of topics;
(9) input the mixed topic distribution θ of the training images together with their class labels into a BP neural network, training a bad-image classifier based on the BP neural network;
(10) obtain the SIFT descriptors of the effective feature points of the image to be detected according to steps (1) to (4), represent the image as a co-occurrence vector of codebook words using the vector quantization and tf-iddf methods of steps (6) to (7), and input the co-occurrence vector into the LDA model to obtain the topic distribution θ′ of the image to be detected;
(11) input the topic distribution θ′ of the image to be detected into the BP-neural-network-based bad-image classifier and judge whether it is a bad image, completing the detection.
CN 201110329875 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis Expired - Fee Related CN102360435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110329875 CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110329875 CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Publications (2)

Publication Number Publication Date
CN102360435A CN102360435A (en) 2012-02-22
CN102360435B true CN102360435B (en) 2013-06-12

Family

ID=45585762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110329875 Expired - Fee Related CN102360435B (en) 2011-10-26 2011-10-26 Undesirable image detecting method based on connotative theme analysis

Country Status (1)

Country Link
CN (1) CN102360435B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440644A (en) * 2013-08-08 2013-12-11 中山大学 Multi-scale image weak edge detection method based on minimum description length

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855492B (en) * 2012-07-27 2015-02-04 中南大学 Classification method based on mineral flotation foam image
CN102968623B (en) * 2012-12-07 2015-12-23 上海电机学院 Face Detection system and method
CN103295031B (en) * 2013-04-15 2016-12-28 浙江大学 A kind of image object method of counting based on canonical risk minimization
CN103559510B (en) * 2013-11-12 2017-01-18 中国科学院自动化研究所 Method for recognizing social group behaviors through related topic model
CN103870563B (en) * 2014-03-07 2017-03-29 北京奇虎科技有限公司 It is determined that the method and apparatus of the theme distribution of given text
CN104918046B (en) * 2014-03-13 2019-11-05 中兴通讯股份有限公司 A kind of local description compression method and device
CN104134059B (en) * 2014-07-25 2017-07-14 西安电子科技大学 Keep the bad image detecting method under the mixing deformation model of colouring information
CN104318562B (en) * 2014-10-22 2018-02-23 百度在线网络技术(北京)有限公司 A kind of method and apparatus for being used to determine the quality of the Internet images
CN107273863B (en) * 2017-06-21 2019-07-23 天津师范大学 A kind of scene character recognition method based on semantic stroke pond
CN107480684B (en) * 2017-08-24 2020-06-05 成都澳海川科技有限公司 Image processing method and device
CN107968951B (en) * 2017-12-06 2019-07-23 重庆智韬信息技术中心 The method that Auto-Sensing and shielding are carried out to live video
CN108960042B (en) * 2018-05-17 2021-06-08 新疆医科大学第一附属医院 Echinococcus proctostermias survival rate detection method based on visual saliency and SIFT characteristics
CN109344857B (en) * 2018-08-14 2022-05-13 重庆邂智科技有限公司 Text similarity measurement method and device, terminal and storage medium
CN109583502B (en) * 2018-11-30 2022-11-18 天津师范大学 Pedestrian re-identification method based on anti-erasure attention mechanism
CN112446228B (en) * 2019-08-27 2022-04-01 北京易真学思教育科技有限公司 Video detection method and device, electronic equipment and computer storage medium
CN110956195B (en) * 2019-10-11 2023-06-02 平安科技(深圳)有限公司 Image matching method, device, computer equipment and storage medium
CN111612102B (en) * 2020-06-05 2023-02-07 华侨大学 Satellite image data clustering method, device and equipment based on local feature selection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050014560A1 (en) * 2003-05-19 2005-01-20 Yacob Blumenthal Method and system for simulating interaction with a pictorial representation of a model
CN1323370C (en) * 2004-05-28 2007-06-27 中国科学院计算技术研究所 Method for detecting pornographic images
CN101996314B (en) * 2009-08-26 2012-11-28 厦门市美亚柏科信息股份有限公司 Content-based human body upper part sensitive image identification method and device


Also Published As

Publication number Publication date
CN102360435A (en) 2012-02-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130612

Termination date: 20181026

CF01 Termination of patent right due to non-payment of annual fee