WO2016033965A1 - Method for generating an image classifier, image classification method and apparatus - Google Patents


Info

Publication number
WO2016033965A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model parameters
value
sample
hidden variable
Prior art date
Application number
PCT/CN2015/075781
Other languages
English (en)
French (fr)
Inventor
谢清鹏
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2016033965A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/87: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Definitions

  • the present invention relates to the field of image classification, and more particularly to a method of generating an image classifier, an image classification method and apparatus.
  • Hidden variables are comprehensive variables that cannot be directly observed but play an important role in practical applications, such as spatial relationships, data structures, and internal states. Hidden variables are widely used in machine vision, natural language processing, speech recognition, and public health. Experiments show that when dealing with objects such as images and speech, introducing hidden variables can capture more useful information, and the processing effect improves significantly compared with methods that use only explicit variables.
  • HMM Hidden Markov Model
  • GMM Gaussian Mixture Model
  • CRF Conditional Random Field
  • LSVM Latent Support Vector Machine
  • DPM Deformable Part-based Model
  • the main part is used to describe the general outline of the object
  • the partial part is used to describe the detailed features of the detected object
  • the deformation penalty is used to ensure that the position of each part relative to the body is not excessively offset.
  • the position of a part relative to the body can vary within a certain range, so it can be regarded as a hidden variable, and training is performed using LSVM.
  • θ is the model parameter of the classifier, and the LSVM objective (formula (1)) can be written as: min_θ (1/2)‖θ‖² + C·Σ_{i=1}^N max(0, 1 − y_i·s(x_i, θ))
  • y_i represents the label of the training sample x_i
  • s(x_i, θ) represents the score of the sample x_i, which is the best score over all possible local relative positions (i.e., the range of the hidden variable), and satisfies formula (2): s(x_i, θ) = max_{z∈Z(x_i)} θ·f(x_i, z)
  • z is a hidden variable
  • f is a feature extraction method
  • f(x_i, z) is the feature vector of the sample x_i, such as the histogram of oriented gradients (HOG) feature used in DPM.
  • the objective function of LSVM (formula (1)) is semi-concave: when the values of the hidden variables of the positive samples are fixed, the objective function is concave. Therefore, LSVM can be solved by coordinate gradient descent, which first fixes the model parameters of the classifier to obtain the values of the positive-sample hidden variables, then fixes those values to find the optimal model parameters and the values of the negative-sample hidden variables, and iterates until convergence.
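For illustration only (not part of the claimed method), the coordinate-descent scheme described above can be sketched numerically. This is a toy latent-SVM trainer in which each sample has a small set of candidate hidden-variable values, each represented by one row of a feature matrix; all function names, data, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

def score(theta, feats):
    """Best score over candidate hidden-variable values; feats has shape (|Z(x)|, d)."""
    return float(np.max(feats @ theta))

def best_z(theta, feats):
    """Hidden-variable value z achieving the best score for a sample (formula (2))."""
    return int(np.argmax(feats @ theta))

def lsvm_coordinate_descent(samples, labels, d, C=1.0, lr=0.01, outer=20, inner=50):
    """Alternate: (1) fix theta and pick the positive-sample hidden variables;
    (2) fix those and run subgradient descent on the hinge-loss objective,
    re-choosing negative-sample hidden variables at every step."""
    rng = np.random.default_rng(0)
    theta = rng.normal(scale=0.01, size=d)
    for _ in range(outer):
        # Step 1: fix theta, choose the hidden variables of positive samples.
        z_pos = {i: best_z(theta, f)
                 for i, (f, y) in enumerate(zip(samples, labels)) if y == 1}
        # Step 2: fix positive-sample hidden variables, optimize theta.
        for _ in range(inner):
            grad = theta.copy()  # gradient of the (1/2)||theta||^2 term
            for i, (feats, y) in enumerate(zip(samples, labels)):
                z = z_pos[i] if y == 1 else best_z(theta, feats)
                if y * (feats[z] @ theta) < 1:      # hinge-loss margin violated
                    grad -= C * y * feats[z]
            theta -= lr * grad
    return theta

# Toy example: 2-D features, 3 candidate hidden-variable values per sample.
rng = np.random.default_rng(1)
pos = [rng.normal(loc=[2, 2], size=(3, 2)) for _ in range(10)]
neg = [rng.normal(loc=[-2, -2], size=(3, 2)) for _ in range(10)]
samples, labels = pos + neg, [1] * 10 + [-1] * 10
theta = lsvm_coordinate_descent(samples, labels, d=2)
acc = float(np.mean([(1 if score(theta, f) > 0 else -1) == y
                     for f, y in zip(samples, labels)]))
```

On this well-separated toy data the alternation converges quickly; the point of the sketch is only the two-step structure (fix latent values, then optimize parameters) described in the text above.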
  • LSVM is mainly suited to object detection.
  • For object classification, LSVM handles a multi-class problem by transforming it into multiple two-class detection problems.
  • As a result, the training processes of the multiple classifiers used for object classification are isolated from one another.
  • Embodiments of the present invention provide a method and apparatus for generating an image classifier to improve the accuracy of the classification result.
  • a method for generating an image classifier, including: acquiring a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; acquiring a feature vector of each of the image samples, where the feature vector includes a hidden variable of the image sample; and training the classifiers of the K categories by a multiple logistic regression model based on the hidden variables of the N image samples.
  • the classifiers of the K categories respectively include K model parameters, and training the classifiers of the K categories by the multiple logistic regression model based on the hidden variables of the N image samples includes: obtaining initial values of the K model parameters; acquiring initial values of the hidden variables of the N image samples; and, based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, training the classifiers of the K categories by the multiple logistic regression model to determine target values of the K model parameters.
  • the initial values of the hidden variables of the N image samples include an initial value of the positive-image-sample hidden variable and an initial value of the negative-image-sample hidden variable. Training the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the target values of the K model parameters includes: training the classifiers of the K categories by the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determining the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determining a current value of the positive-image-sample hidden variable based on the feature vectors of the N image samples and the current values of the K model parameters, updating the initial value of the positive-image-sample hidden variable with that current value, and repeating these steps until the current values of the K model parameters satisfy the convergence condition.
  • training the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the current values of the K model parameters includes: training the classifiers of the K categories by the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables to determine iteration values of the K model parameters; determining an iteration value of the negative-image-sample hidden variable based on the feature vectors of the N image samples and the iteration values of the K model parameters, and updating the initial value of the negative-image-sample hidden variable with that iteration value; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determining the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeating these steps until the iteration values of the K model parameters satisfy the iteration stop condition.
  • training the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the iteration values of the K model parameters includes: determining the iteration values of the K model parameters according to the formula θ* = argmax_θ l(θ), with l(θ) = Σ_{i=1}^N [ max_{z∈Z(x_i)} θ_{y_i}·f(x_i, z) − log Σ_{l=1}^K exp( max_{z∈Z(x_i)} θ_l·f(x_i, z) ) ], where x_i represents the i-th sample of the N image samples, θ_l represents the l-th model parameter of the K model parameters, θ represents the K-dimensional variable composed of the K model parameters, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, and f(x_i, z) represents the feature vector of x_i.
  • determining the iteration values of the K model parameters includes: determining the gradient corresponding to θ_k according to the formula ∂l(θ)/∂θ_k = Σ_{i=1}^N [ 1(y_i = k) − p(k | x_i; θ) ]·f(x_i, z_i(θ_k)), with p(k | x_i; θ) = exp( θ_k·f(x_i, z_i(θ_k)) ) / Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ), where ∂l(θ)/∂θ_k represents the partial derivative of l(θ) with respect to θ_k, θ_k represents the k-th model parameter of the K model parameters, z_i(θ_k) represents the initial value of the hidden variable of x_i when the model parameter is θ_k, and f(x_i, z_i(θ_k)) represents the feature vector of x_i when the hidden variable z takes the value z_i(θ_k); and, based on the gradient corresponding to θ_k and with l(θ) as the objective function, determining an iteration value of θ_k using a gradient ascent algorithm.
  • the iteration stop condition is that the change in the objective function value l(θ) is less than a preset threshold; or the iteration stop condition is that the number of iterations reaches a preset number.
  • determining the iteration values of the K model parameters includes: calculating the iteration values of the K model parameters in parallel according to the formula l_LC(θ) = Σ_{i=1}^N [ θ_{y_i}·f(x_i, z_i(θ_{y_i})) − a_i·Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ) + log a_i + 1 ], where l_LC(θ) is obtained from l(θ) by bounding its logarithm term from above with the log-concavity bound log t ≤ a·t − log a − 1 (a > 0).
  • determining the iteration value of the negative-image-sample hidden variable based on the feature vectors of the N image samples and the iteration values of the K model parameters includes: determining the iteration value of the negative-image-sample hidden variable according to the formula z_i(θ_t) = argmax_{z∈Z(x_i)} θ_t·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_t represents the t-th model parameter of the K model parameters and differs from the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_t) represents the iteration value of the hidden variable of x_i when the model parameter is θ_t, i is any integer from 1 to N, and t is any integer from 1 to K.
  • determining the current value of the positive-image-sample hidden variable based on the feature vectors of the N image samples and the current values of the K model parameters includes: determining the current value of the positive-image-sample hidden variable according to the formula z_i(θ_{y_i}) = argmax_{z∈Z(x_i)} θ_{y_i}·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_{y_i}) represents the current value of the hidden variable of x_i when the model parameter is θ_{y_i}, and i is any integer from 1 to N.
  • determining, based on the initial value of each model parameter, the initial value of the hidden variable of each image sample includes: determining the initial value of the hidden variable of each image sample according to the formula z_i(θ_k) = argmax_{z∈Z(x_i)} θ_k·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_k represents the k-th model parameter of the K model parameters, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_k) represents the initial value of the hidden variable z of x_i when the model parameter is θ_k, i is any integer from 1 to N, and k is any integer from 1 to K.
  • an image classification method, including: acquiring a feature vector of an image to be classified; determining, based on the feature vector of the image to be classified, the category of the image to be classified using K classifiers, where the K classifiers are K classifiers trained by the method of the first aspect or any implementation of the first aspect; and determining the probability that the image to be classified belongs to each of the K categories according to the formula p(y = k | x) = exp( max_{z∈Z(x)} θ_k·f(x, z) ) / Σ_{l=1}^K exp( max_{z∈Z(x)} θ_l·f(x, z) ), where x represents the image to be classified, θ_k represents the model parameter of the k-th classifier of the K classifiers, f(x, z) represents the feature vector of x, Z(x) represents the value range of the hidden variable z of x, and k is any integer from 1 to K.
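For illustration only (not part of the claims), the classification rule above can be sketched numerically, assuming each candidate hidden-variable value z of the image is represented by one row of a feature matrix; all names and numbers are illustrative.

```python
import numpy as np

def class_probabilities(thetas, feats):
    """p(y=k|x): softmax over the best latent score of each class.

    thetas: (K, d) model parameters, one row per classifier.
    feats:  (|Z(x)|, d) feature vectors f(x, z), one row per candidate z.
    """
    # Best score of each class over the hidden-variable range Z(x).
    best = np.max(feats @ thetas.T, axis=0)       # shape (K,)
    e = np.exp(best - best.max())                 # shift for numerical stability
    return e / e.sum()

# Toy example: K=3 classifiers in d=2, 4 candidate hidden-variable values.
thetas = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
feats = np.array([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]])
p = class_probabilities(thetas, feats)
pred = int(np.argmax(p))
```

Shifting the max-scores by their maximum before exponentiation does not change the resulting probabilities; it only avoids overflow for large scores.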
  • an apparatus for generating an image classifier, including: a first acquiring unit, configured to acquire a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; a second acquiring unit, configured to acquire a feature vector of each of the image samples acquired by the first acquiring unit, where the feature vector includes a hidden variable of the image sample; and a training unit, configured to train the classifiers of the K categories by a multiple logistic regression model based on the hidden variables of the N image samples acquired by the second acquiring unit.
  • the classifiers of the K categories respectively include K model parameters, and the training unit is specifically configured to: obtain initial values of the K model parameters; acquire initial values of the hidden variables of the N image samples; and train the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine target values of the K model parameters.
  • the initial values of the N image sample hidden variables include: an initial value of a positive image sample hidden variable and an initial value of a negative image sample hidden variable
  • the training unit is specifically configured to: train the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determine a current value of the positive-image-sample hidden variable based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial value of the positive-image-sample hidden variable with that current value, and repeat these steps until the current values of the K model parameters satisfy the convergence condition.
  • the training unit is specifically configured to: train the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine iteration values of the K model parameters; determine an iteration value of the negative-image-sample hidden variable based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial value of the negative-image-sample hidden variable with that iteration value; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeat these steps until the iteration values of the K model parameters satisfy the iteration stop condition.
  • the training unit is specifically configured to determine the iteration values of the K model parameters according to the formula θ* = argmax_θ l(θ), with l(θ) = Σ_{i=1}^N [ max_{z∈Z(x_i)} θ_{y_i}·f(x_i, z) − log Σ_{l=1}^K exp( max_{z∈Z(x_i)} θ_l·f(x_i, z) ) ], where x_i represents the i-th sample of the N image samples, θ_l represents the l-th model parameter of the K model parameters, θ represents the K-dimensional variable composed of the K model parameters, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, and f(x_i, z) represents the feature vector of x_i.
  • the training unit is specifically configured to: determine the gradient corresponding to θ_k according to the formula ∂l(θ)/∂θ_k = Σ_{i=1}^N [ 1(y_i = k) − p(k | x_i; θ) ]·f(x_i, z_i(θ_k)), with p(k | x_i; θ) = exp( θ_k·f(x_i, z_i(θ_k)) ) / Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ), where ∂l(θ)/∂θ_k represents the partial derivative of l(θ) with respect to θ_k, θ_k represents the k-th model parameter of the K model parameters, z_i(θ_k) represents the initial value of the hidden variable of x_i when the model parameter is θ_k, and f(x_i, z_i(θ_k)) represents the feature vector of x_i when the hidden variable z takes the value z_i(θ_k); and, based on the gradient corresponding to θ_k and with l(θ) as the objective function, determine an iteration value of θ_k using a gradient ascent algorithm.
  • the iteration stop condition is that the change in the objective function value l(θ) is less than a preset threshold; or the iteration stop condition is that the number of iterations reaches a preset number.
  • the training unit is specifically configured to calculate the iteration values of the K model parameters in parallel according to the formula l_LC(θ) = Σ_{i=1}^N [ θ_{y_i}·f(x_i, z_i(θ_{y_i})) − a_i·Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ) + log a_i + 1 ], where l_LC(θ) is obtained from l(θ) by bounding its logarithm term from above with the log-concavity bound log t ≤ a·t − log a − 1 (a > 0).
  • the training unit is specifically configured to determine the iteration value of the negative-image-sample hidden variable according to the formula z_i(θ_t) = argmax_{z∈Z(x_i)} θ_t·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_t represents the t-th model parameter of the K model parameters and differs from the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_t) represents the iteration value of the hidden variable of x_i when the model parameter is θ_t, i is any integer from 1 to N, and t is any integer from 1 to K.
  • the training unit is specifically configured to determine the current value of the positive-image-sample hidden variable according to the formula z_i(θ_{y_i}) = argmax_{z∈Z(x_i)} θ_{y_i}·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_{y_i}) represents the current value of the hidden variable of x_i when the model parameter is θ_{y_i}, and i is any integer from 1 to N.
  • the training unit is specifically configured to determine the initial value of the hidden variable of each image sample according to the formula z_i(θ_k) = argmax_{z∈Z(x_i)} θ_k·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_k represents the k-th model parameter of the K model parameters, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_k) represents the initial value of the hidden variable z of x_i when the model parameter is θ_k, i is any integer from 1 to N, and k is any integer from 1 to K.
  • a fourth aspect provides an image classification apparatus, including: a first acquiring unit, configured to acquire a feature vector of an image to be classified; a first determining unit, configured to determine, based on the feature vector of the image to be classified, the category of the image to be classified using K classifiers, where the K classifiers are K classifiers trained by the apparatus of the third aspect or any implementation of the third aspect; and a second determining unit, configured to determine the probability that the image to be classified belongs to each of the K categories according to the formula p(y = k | x) = exp( max_{z∈Z(x)} θ_k·f(x, z) ) / Σ_{l=1}^K exp( max_{z∈Z(x)} θ_l·f(x, z) ), where x represents the image to be classified, θ_k represents the model parameter of the k-th classifier of the K classifiers, f(x, z) represents the feature vector of x, Z(x) represents the value range of the hidden variable z of x, and k is any integer from 1 to K.
  • K classifiers are trained simultaneously in maximum-likelihood form through the multivariate logistic regression model; that is, using the multiple logistic regression model preserves the correlation between the classifiers of the K categories. Compared with the way LSVM converts the K-class classification problem in the object classification field into multiple two-class problems, the training result is more accurate.
  • FIG. 1 is a schematic flowchart of a method for generating an image classifier according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing an example of classifying an image using a classifier parameter trained in an embodiment of the present invention.
  • FIG. 3 is a diagram showing an example of classifying an image using a classifier parameter trained in an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of an apparatus for generating an image classifier according to an embodiment of the present invention.
  • FIG. 5 is a schematic configuration diagram of an apparatus for generating an image classifier according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of an image classification method according to an embodiment of the present invention.
  • FIG. 7 is a schematic block diagram of an image classification apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic block diagram of an image classification apparatus according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a method for generating an image classifier according to an embodiment of the present invention.
  • the method of Figure 1 includes:
  • the training sample set includes N image samples, N image samples belong to K categories, N and K are positive integers, and N is greater than K.
  • image features and hidden variables can be selected according to the application scenario or actual needs.
  • image features can be selected (or defined) as Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), or Haar features
  • hidden variables can be selected (or defined) as the position of the object in the image, the relative position between a part and the main body in the image, or the subcategory of the object.
  • the acquired feature vector of each image is not a fixed value and changes with the hidden variable: assuming the image is x and its hidden variable is z, the extracted feature vector can be represented by f(x, z).
  • the classifiers of the K categories are trained by the multiple logistic regression model.
  • K classifiers are trained simultaneously in maximum-likelihood form through the multivariate logistic regression model; that is, using the multiple logistic regression model preserves the correlation between the classifiers of the K categories. Compared with the way LSVM converts the K-class classification problem in the object classification field into multiple two-class problems, the training result is more accurate.
  • step 130 may include: acquiring initial values of the K model parameters; acquiring initial values of the hidden variables of the N image samples; and training the classifiers of the K categories by the multivariate logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the target values of the K model parameters.
  • the hidden variable of an image sample may include K initial values; that is, the hidden variable of an image sample has a corresponding initial value under the initial value of each model parameter.
  • in total, N*K initial values of hidden variables can be obtained.
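For illustration only, this N*K initialization can be sketched as follows, assuming the initialization rule z_i(θ_k) = argmax_{z∈Z(x_i)} θ_k·f(x_i, z) stated in the formulas of this document; shapes and names are illustrative.

```python
import numpy as np

def init_hidden_variables(thetas, feats_per_sample):
    """For each sample i and each model parameter theta_k, pick the
    hidden-variable value z maximizing theta_k · f(x_i, z).

    thetas:            (K, d) initial model parameters.
    feats_per_sample:  list of N arrays, each of shape (|Z(x_i)|, d).
    Returns an (N, K) integer array of initial hidden-variable indices.
    """
    N, K = len(feats_per_sample), thetas.shape[0]
    z0 = np.zeros((N, K), dtype=int)
    for i, feats in enumerate(feats_per_sample):
        # (feats @ thetas.T)[z, k] = theta_k · f(x_i, z); argmax over z per k.
        z0[i] = np.argmax(feats @ thetas.T, axis=0)
    return z0

# Toy example: N=2 samples, K=2 classes, d=2, 3 candidate z per sample.
thetas = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = [np.array([[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]]),
         np.array([[0.0, 5.0], [4.0, 0.0], [2.0, 2.0]])]
z0 = init_hidden_variables(thetas, feats)
```

Each row of `z0` holds the K per-class initial hidden-variable values of one sample, matching the N*K count described above.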
  • the initial values of the hidden variables of the N image samples include an initial value of the positive-image-sample hidden variable and an initial value of the negative-image-sample hidden variable. Training the classifiers of the K categories by the multivariate logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the target values of the K model parameters may include: training the classifiers of the K categories by the multivariate logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determining the current values of the K model parameters as the target values of the K model parameters; otherwise, determining the current value of the positive-image-sample hidden variable based on the feature vectors of the N image samples and the current values of the K model parameters, updating the initial value of the positive-image-sample hidden variable with that current value, and repeating these steps until the current values of the K model parameters satisfy the convergence condition.
  • a hidden variable of an image sample may have different initial values under different model parameters; that is, the hidden variable of an image sample may include K initial values, so the initial values of the hidden variables of the N image samples may include K*N initial values.
  • An image sample is a positive sample under the model parameter corresponding to its own category, so the initial values of the positive-image-sample hidden variable comprise N initial values in total, namely the initial values of the N image samples under the model parameters corresponding to their respective categories.
  • the remaining N*(K-1) initial values, i.e., those other than the initial values of the positive-image-sample hidden variable, are the initial values of the negative-image-sample hidden variable.
  • the objective function of the multivariate logistic regression model is concave and can be solved by the gradient ascent method.
  • determining the current values of the K model parameters by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, may include: training the classifiers of the K categories by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables, to determine the iteration values of the K model parameters; determining the iteration value of the negative-image-sample hidden variable, and updating the initial value of the negative-image-sample hidden variable with that iteration value; when the iteration values of the K model parameters satisfy the preset iteration stop condition, determining the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeating these steps until the iteration values of the K model parameters satisfy the iteration stop condition.
  • the K model parameters are optimized by continuously updating the values of the negative-sample hidden variables, thereby further improving the accuracy of the classification result.
  • determining the iteration values of the K model parameters by the multiple logistic regression model, based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, can include: determining the iteration values of the K model parameters according to the formula θ* = argmax_θ l(θ), with l(θ) = Σ_{i=1}^N [ max_{z∈Z(x_i)} θ_{y_i}·f(x_i, z) − log Σ_{l=1}^K exp( max_{z∈Z(x_i)} θ_l·f(x_i, z) ) ], where x_i represents the i-th sample of the N image samples, θ_l represents the l-th model parameter of the K model parameters, θ represents the K-dimensional variable composed of the K model parameters, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, and f(x_i, z) represents the feature vector of x_i.
  • determining the iteration values of the K model parameters according to the above formula may include: determining the gradient corresponding to θ_k according to the formula ∂l(θ)/∂θ_k = Σ_{i=1}^N [ 1(y_i = k) − p(k | x_i; θ) ]·f(x_i, z_i(θ_k)), with p(k | x_i; θ) = exp( θ_k·f(x_i, z_i(θ_k)) ) / Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ), where ∂l(θ)/∂θ_k represents the partial derivative of l(θ) with respect to θ_k, θ_k represents the k-th model parameter of the K model parameters, z_i(θ_k) represents the initial value of the hidden variable of x_i when the model parameter is θ_k, and f(x_i, z_i(θ_k)) represents the feature vector of x_i when the hidden variable z takes the value z_i(θ_k); and, based on the gradient corresponding to θ_k and with l(θ) as the objective function, determining an iteration value of θ_k using a gradient ascent algorithm.
  • the iteration stop condition is that the change in the objective function value l(θ) is less than a preset threshold; or the iteration stop condition is that the number of iterations reaches a preset number.
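For illustration only, one gradient-ascent iteration of the kind described above can be sketched as follows, with the hidden variables fixed at their current values. The softmax-form gradient is the standard multinomial-logistic gradient and is an assumption here; data, names, and the step size are illustrative.

```python
import numpy as np

def mllr_gradient_step(thetas, z_fixed, feats_per_sample, labels, lr=0.05):
    """One gradient-ascent step on l(theta) with hidden variables fixed at z_fixed.

    thetas:           (K, d) current model parameters.
    z_fixed:          (N, K) fixed hidden-variable indices z_i(theta_k).
    feats_per_sample: list of N arrays, each of shape (|Z(x_i)|, d).
    labels:           length-N integer class labels in [0, K).
    """
    grad = np.zeros_like(thetas)
    for i, (feats, y) in enumerate(zip(feats_per_sample, labels)):
        fi = feats[z_fixed[i]]                      # f(x_i, z_i(theta_k)), shape (K, d)
        scores = np.einsum('kd,kd->k', thetas, fi)  # theta_k · f(x_i, z_i(theta_k))
        p = np.exp(scores - scores.max())
        p /= p.sum()                                # p(k | x_i; theta)
        for k in range(thetas.shape[0]):
            grad[k] += ((1.0 if k == y else 0.0) - p[k]) * fi[k]
    return thetas + lr * grad                       # ascent step: maximize l(theta)

def log_likelihood(thetas, z_fixed, feats_per_sample, labels):
    """l(theta) with hidden variables fixed at z_fixed."""
    ll = 0.0
    for i, (feats, y) in enumerate(zip(feats_per_sample, labels)):
        fi = feats[z_fixed[i]]
        s = np.einsum('kd,kd->k', thetas, fi)
        ll += s[y] - (np.log(np.exp(s - s.max()).sum()) + s.max())
    return ll

# Toy check: one ascent step should not decrease the (concave) objective.
rng = np.random.default_rng(0)
feats_per_sample = [rng.normal(size=(3, 2)) for _ in range(6)]
labels = [0, 1, 0, 1, 0, 1]
thetas = np.zeros((2, 2))
z_fixed = np.zeros((6, 2), dtype=int)  # simplification: all z fixed to candidate 0
before = log_likelihood(thetas, z_fixed, feats_per_sample, labels)
thetas2 = mllr_gradient_step(thetas, z_fixed, feats_per_sample, labels)
after = log_likelihood(thetas2, z_fixed, feats_per_sample, labels)
```

Because l(θ) is concave for fixed hidden variables, a sufficiently small ascent step cannot decrease it, which is what the toy check exercises.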
  • determining the iteration values of the K model parameters according to the above formula may include: calculating the iteration values of the K model parameters in parallel according to the formula l_LC(θ) = Σ_{i=1}^N [ θ_{y_i}·f(x_i, z_i(θ_{y_i})) − a_i·Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ) + log a_i + 1 ], where l_LC(θ) is obtained from l(θ) by bounding its logarithm term from above with the log-concavity bound log t ≤ a·t − log a − 1 (a > 0).
  • the above objective function l(θ) contains a logarithm of a sum; therefore it cannot be decomposed into a superposition of K sub-problems, and the optimization process cannot be accelerated by parallel or distributed computing.
  • using the log-concavity bound, the objective function l(θ) is converted into a sum of K sub-problems, so that parallel computing can be implemented to accelerate the convergence of the algorithm.
  • under l_LC(θ), the gradient of the classifier parameters is as follows: ∂l_LC(θ)/∂θ_k = Σ_{i=1}^N [ 1(y_i = k) − a_i·exp( θ_k·f(x_i, z_i(θ_k)) ) ]·f(x_i, z_i(θ_k))
  • the auxiliary parameter a_i is taken as: a_i = 1 / Σ_{l=1}^K exp( θ_l·f(x_i, z_i(θ_l)) ), the value at which the bound is tight.
  • determining the iteration value of the negative-image-sample hidden variable based on the feature vectors of the N image samples and the iteration values of the K model parameters may include: determining the iteration value of the negative-image-sample hidden variable according to the formula z_i(θ_t) = argmax_{z∈Z(x_i)} θ_t·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_t represents the t-th model parameter of the K model parameters and differs from the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_t) represents the iteration value of the hidden variable of x_i when the model parameter is θ_t, i is any integer from 1 to N, and t is any integer from 1 to K.
  • determining the current value of the positive-image-sample hidden variable based on the feature vectors of the N image samples and the current values of the K model parameters may include: determining the current value of the positive-image-sample hidden variable according to the formula z_i(θ_{y_i}) = argmax_{z∈Z(x_i)} θ_{y_i}·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_{y_i}) represents the current value of the hidden variable of x_i when the model parameter is θ_{y_i}, and i is any integer from 1 to N.
  • determining, based on the initial value of each model parameter, the initial value of the hidden variable of each image sample may include: determining the initial value of the hidden variable of each image sample according to the formula z_i(θ_k) = argmax_{z∈Z(x_i)} θ_k·f(x_i, z), where x_i represents the i-th sample of the N image samples, θ_k represents the k-th model parameter of the K model parameters, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, z_i(θ_k) represents the initial value of the hidden variable z of x_i when the model parameter is θ_k, i is any integer from 1 to N, and k is any integer from 1 to K.
  • Input: training sample set {(x_1, y_1), ..., (x_N, y_N)} and the initial values of all hidden variables {h}.
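For illustration only, the training procedure described above (initialize hidden variables, run gradient ascent on l(θ) while re-choosing the negative-sample hidden variables, then update the positive-sample hidden variables, and repeat) can be put together in one compact sketch. All names, data, hyper-parameters, and the fixed iteration counts used in place of a convergence test are illustrative assumptions.

```python
import numpy as np

def train_mllr(feats_per_sample, labels, K, d, lr=0.02, inner=100, outer=5):
    """Coordinate-ascent sketch of MLLR training."""
    thetas = np.zeros((K, d))
    # Initial hidden variables: z_i(theta_k) = argmax_z theta_k · f(x_i, z).
    z = np.array([np.argmax(f @ thetas.T, axis=0) for f in feats_per_sample])
    for _ in range(outer):
        for _ in range(inner):
            grad = np.zeros_like(thetas)
            for i, (feats, y) in enumerate(zip(feats_per_sample, labels)):
                for k in range(K):               # re-choose negative-sample z
                    if k != y:
                        z[i, k] = int(np.argmax(feats @ thetas[k]))
                fi = feats[z[i]]                 # f(x_i, z_i(theta_k)), shape (K, d)
                s = np.einsum('kd,kd->k', thetas, fi)
                p = np.exp(s - s.max())
                p /= p.sum()                     # p(k | x_i; theta)
                for k in range(K):
                    grad[k] += ((1.0 if k == y else 0.0) - p[k]) * fi[k]
            thetas = thetas + lr * grad          # gradient ascent on l(theta)
        for i, feats in enumerate(feats_per_sample):
            # Update positive-sample hidden variables under the new theta.
            z[i, labels[i]] = int(np.argmax(feats @ thetas[labels[i]]))
    return thetas

def predict(thetas, feats):
    """argmax_k of the best latent score max_z theta_k · f(x, z)."""
    return int(np.argmax(np.max(feats @ thetas.T, axis=0)))

# Toy data: candidate features of each class cluster around a distinct mean.
rng = np.random.default_rng(0)
means = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]
feats_per_sample, labels = [], []
for c in (0, 1):
    for _ in range(10):
        feats_per_sample.append(rng.normal(loc=means[c], scale=0.3, size=(3, 2)))
        labels.append(c)
thetas = train_mllr(feats_per_sample, labels, K=2, d=2)
acc = float(np.mean([predict(thetas, f) == y
                     for f, y in zip(feats_per_sample, labels)]))
```

Unlike the per-class LSVM training criticized earlier in this document, all K parameter vectors here are updated jointly through one shared softmax, which is the correlation-preserving property the text attributes to MLLR.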
  • MLLR Multinomial Latent Logistic Regression
  • FIG. 2 is a diagram showing an example of classifying an image using a classifier parameter trained in an embodiment of the present invention.
  • Mammals were taken as the research objects; a total of 6 classes of mammals were included, each with about 50 pictures.
  • 50% of the images were used for training, and 50% of the images were used for testing.
  • HOG features are used as the image features.
  • The hidden variable is the position of the object to be detected in the picture, and the bounding box of the object is required to occupy more than 30% of the total picture area.
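As a concrete illustration of this hidden-variable definition, enumerating the candidate positions z over which the classifier score is later maximized might look like the sketch below. This is a hedged sketch: the grid step and the way boxes are enumerated are illustrative assumptions; only the 30%-of-total-area constraint comes from the text above.

```python
def candidate_boxes(img_w, img_h, min_area_ratio=0.3, step=20):
    """Enumerate candidate object boxes (values of the hidden variable z):
    axis-aligned boxes on a coarse grid whose area is at least
    `min_area_ratio` of the total picture area."""
    boxes = []
    min_area = min_area_ratio * img_w * img_h
    for x0 in range(0, img_w, step):
        for y0 in range(0, img_h, step):
            for x1 in range(x0 + step, img_w + 1, step):
                for y1 in range(y0 + step, img_h + 1, step):
                    if (x1 - x0) * (y1 - y0) >= min_area:
                        boxes.append((x0, y0, x1, y1))
    return boxes
```

During training or classification, the feature vector f(x, z) would be extracted once per candidate box, and the box maximizing the score plays the role of the detected object position.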
  • Linear SVM, LSVM and MLLR were compared; the test results are as follows:
  • The test results show that the accuracy of MLLR exceeds that of LSVM, and the classifiers trained by LSVM and MLLR both outperform the traditional linear SVM.
  • the first column is a schematic diagram of the classifier trained by the linear SVM (using the HOG feature), and the second column is a schematic diagram of the classifier trained by the MLLR.
  • the rectangular frame in the small picture in Figure 2 is the position of the object detected by the MLLR.
  • FIG. 3 is a diagram showing an example of classifying an image using a classifier parameter trained in an embodiment of the present invention.
  • Figure 3 takes sports actions as the research objects. It includes 6 action classes (cricket batting, cricket bowling, volleyball smash, croquet shot, tennis forehand and tennis serve).
  • the image features still use HOG
  • the hidden variable model uses DPM, that is, the object position and the relative position of the local body are used as hidden variables.
  • The results showed that the classification accuracy of MLLR (78.3%) exceeded that of LSVM (74.4%).
  • the first column is a schematic diagram of the main body model in the picture
  • the second column is a schematic diagram of the part models in the picture. In the small pictures in FIG. 3,
  • the dark rectangular frames represent the positions of the main body,
  • and the light-colored rectangular frames represent the positions of the parts. It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present invention.
  • With reference to FIG. 4 and FIG. 5, a device for generating an image classifier according to an embodiment of the present invention is described below.
  • FIG. 4 is a schematic structural diagram of an apparatus for generating an image classifier according to an embodiment of the present invention.
  • the apparatus 400 of Figure 4 includes:
  • the first obtaining unit 410 is configured to acquire a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K;
  • a second acquiring unit 420 configured to acquire a feature vector of each of the image samples acquired by the first acquiring unit 410, where the feature vector includes a hidden variable of the image sample;
  • the training unit 430 is configured to train the classifiers of the K categories by using a multiple logistic regression model based on the hidden variables of the N image samples acquired by the second acquiring unit 420.
  • The K classifiers are trained simultaneously in maximum-likelihood form through the multiple logistic regression model; that is, the use of the multiple logistic regression model preserves the correlation between the classifiers of the K categories. Compared with the way LSVM converts the K-class classification problem in the object classification field into multiple mutually isolated two-class problems, the training result is more accurate.
  • the classifiers of the K categories respectively include K model parameters, where the training unit 430 is specifically configured to: obtain initial values of the K model parameters; determine an initial value of the hidden variable of each image sample based on the initial value of each model parameter; and train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine target values of the K model parameters.
  • the initial values of the N image sample hidden variables include: initial values of the positive image sample hidden variables and initial values of the negative image sample hidden variables, where the training unit 430 is specifically configured to: train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the N image sample hidden variables, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determine current values of the positive image sample hidden variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive image sample hidden variables with the current values of the positive image sample hidden variables, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
  • the training unit 430 is specifically configured to: train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the N image sample hidden variables, to determine iteration values of the K model parameters; determine iteration values of the negative image sample hidden variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial values of the negative image sample hidden variables with the iteration values of the negative image sample hidden variables; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeat this step until the current values of the K model parameters satisfy the iteration stop condition.
  • the training unit 430 is specifically configured to determine the iteration values of the K model parameters according to the formula θ* = argmax_θ l(θ), where l(θ) = Σ_{i=1}^N log [ max_{z∈Z(x_i)} exp(β_{y_i} · f(x_i, z)) / Σ_{l=1}^K max_{z∈Z(x_i)} exp(β_l · f(x_i, z)) ], x_i represents the i-th sample of the N image samples, β_l represents the l-th model parameter of the K model parameters, θ represents the K-dimensional variable composed of the K model parameters, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, and f(x_i, z) represents the feature vector of x_i.
  • the training unit 430 is specifically configured to determine the gradient corresponding to β_k according to the formula ∂l(θ)/∂β_k = Σ_{i=1}^N [1(y_i = k) − P(y = k | x_i; θ)] · f(x_i, z_i(β_k)), wherein ∂l(θ)/∂β_k represents the partial derivative of l(θ) with respect to β_k, β_k represents the k-th model parameter of the K model parameters, z_i(β_k) represents the initial value of the hidden variable of x_i when the model parameter is β_k, and f(x_i, z_i(β_k)) represents the feature vector of x_i when the hidden variable z takes the value z_i(β_k); and, based on the gradient corresponding to β_k, to determine an iteration value of β_k using a gradient ascent algorithm with l(θ) as the objective function.
  • the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
  • the training unit 430 is specifically configured to calculate the iteration values of the K model parameters in parallel according to a formula based on l_LC(θ), wherein l_LC(θ) is obtained by taking a concave upper bound of the logarithm in l(θ).
  • the training unit 430 is specifically configured to determine the iteration values of the negative image sample hidden variables according to the formula z_i(β_t) = argmax_{z∈Z(x_i)} β_t · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_t represents the t-th model parameter of the K model parameters, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_t) represents the iteration value of the hidden variable of x_i when the model parameter is β_t; i is any integer from 1 to N, and t is any integer from 1 to K.
  • the training unit 430 is specifically configured to determine the current values of the positive image sample hidden variables according to the formula z_i(β_{y_i}) = argmax_{z∈Z(x_i)} β_{y_i} · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_{y_i}) represents the current value of the hidden variable of x_i when the model parameter is β_{y_i}; i is any integer from 1 to N.
  • the training unit 430 is specifically configured to determine the initial value of the hidden variable of each image sample according to the formula z_i(β_k) = argmax_{z∈Z(x_i)} β_k · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_k represents the k-th model parameter of the K model parameters, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_k) represents the initial value of the hidden variable z of x_i when the model parameter is β_k; i is any integer from 1 to N, and k is any integer from 1 to K.
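The alternation that a training unit of this kind carries out (refresh the hidden variables by argmax under the current parameters, then take a gradient step on the latent multinomial-logistic likelihood) can be sketched in pure Python. This is a hedged sketch, not the embodiment's implementation: representing f(x_i, z) as a precomputed list of candidate feature vectors per sample, and the learning rate and iteration count, are assumptions of the sketch.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_z(beta, cands):
    """z(beta) = argmax_z beta . f(x, z), over candidate feature vectors."""
    return max(range(len(cands)), key=lambda z: dot(beta, cands[z]))

def train_mllr(feats, labels, K, iters=200, lr=0.1):
    """feats[i]: list of candidate feature vectors f(x_i, z) for sample i;
    labels[i] in {0..K-1}. Returns the K model parameters."""
    d = len(feats[0][0])
    betas = [[0.0] * d for _ in range(K)]
    for _ in range(iters):
        # refresh hidden variables z_i(beta_k) under the current parameters
        z = [[best_z(b, fi) for b in betas] for fi in feats]
        # one gradient-ascent step per class on the latent log-likelihood
        for k in range(K):
            grad = [0.0] * d
            for i, fi in enumerate(feats):
                logits = [dot(betas[l], fi[z[i][l]]) for l in range(K)]
                m = max(logits)
                exps = [math.exp(v - m) for v in logits]
                p_k = exps[k] / sum(exps)
                coeff = (1.0 if labels[i] == k else 0.0) - p_k
                fk = fi[z[i][k]]
                grad = [g + coeff * v for g, v in zip(grad, fk)]
            betas[k] = [b + lr * g for b, g in zip(betas[k], grad)]
    return betas

def predict(betas, cands):
    return max(range(len(betas)),
               key=lambda k: dot(betas[k], cands[best_z(betas[k], cands)]))
```

On a toy two-class problem with clearly separated candidate features, this loop recovers parameters that score each sample highest under its own class.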
  • FIG. 5 is a schematic configuration diagram of an apparatus for generating an image classifier according to an embodiment of the present invention.
  • the apparatus 500 of Figure 5 includes:
  • a memory 510 configured to store a program
  • the processor 520 is configured to execute the program, when the program is executed, the processor 520 is specifically configured to acquire a training sample set, where the training sample set includes N image samples, and the N image samples belong to K categories, N, K are positive integers, N is greater than K; acquiring feature vectors of each of the image samples, wherein the feature vectors include hidden variables of image samples; based on hidden variables of the N image samples, The classifiers of the K categories are trained by a multiple logistic regression model.
  • The K classifiers are trained simultaneously in maximum-likelihood form through the multiple logistic regression model; that is, the use of the multiple logistic regression model preserves the correlation between the classifiers of the K categories. Compared with the way LSVM converts the K-class classification problem in the object classification field into multiple mutually isolated two-class problems, the training result is more accurate.
  • the classifiers of the K categories respectively include K model parameters
  • the processor 520 is specifically configured to: acquire initial values of the K model parameters; determine an initial value of the hidden variable of each image sample based on the initial value of each model parameter; and train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine target values of the K model parameters.
  • the initial values of the N image sample hidden variables include: initial values of the positive image sample hidden variables and initial values of the negative image sample hidden variables, where the processor 520 is specifically configured to: train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the N image sample hidden variables, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determine current values of the positive image sample hidden variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive image sample hidden variables with the current values of the positive image sample hidden variables, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
  • the processor 520 is specifically configured to: train the classifiers of the K categories through the multiple logistic regression model based on the feature vectors of the N image samples and the initial values of the N image sample hidden variables, to determine iteration values of the K model parameters; determine iteration values of the negative image sample hidden variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial values of the negative image sample hidden variables with the iteration values of the negative image sample hidden variables; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeat this step until the current values of the K model parameters satisfy the iteration stop condition.
  • the processor 520 is specifically configured to determine the iteration values of the K model parameters according to the formula θ* = argmax_θ l(θ), where l(θ) = Σ_{i=1}^N log [ max_{z∈Z(x_i)} exp(β_{y_i} · f(x_i, z)) / Σ_{l=1}^K max_{z∈Z(x_i)} exp(β_l · f(x_i, z)) ], x_i represents the i-th sample of the N image samples, β_l represents the l-th model parameter of the K model parameters, θ represents the K-dimensional variable composed of the K model parameters, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, and f(x_i, z) represents the feature vector of x_i.
  • the processor 520 is specifically configured to determine the gradient corresponding to β_k according to the formula ∂l(θ)/∂β_k = Σ_{i=1}^N [1(y_i = k) − P(y = k | x_i; θ)] · f(x_i, z_i(β_k)), wherein ∂l(θ)/∂β_k represents the partial derivative of l(θ) with respect to β_k, β_k represents the k-th model parameter of the K model parameters, z_i(β_k) represents the initial value of the hidden variable of x_i when the model parameter is β_k, and f(x_i, z_i(β_k)) represents the feature vector of x_i when the hidden variable z takes the value z_i(β_k); and, based on the gradient corresponding to β_k, to determine an iteration value of β_k using a gradient ascent algorithm with l(θ) as the objective function.
  • the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
  • the processor 520 is specifically configured to calculate the iteration values of the K model parameters in parallel according to a formula based on l_LC(θ), wherein l_LC(θ) is obtained by taking a concave upper bound of the logarithm in l(θ).
  • the processor 520 is specifically configured to determine the iteration values of the negative image sample hidden variables according to the formula z_i(β_t) = argmax_{z∈Z(x_i)} β_t · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_t represents the t-th model parameter of the K model parameters, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_t) represents the iteration value of the hidden variable of x_i when the model parameter is β_t; i is any integer from 1 to N, and t is any integer from 1 to K.
  • the processor 520 is specifically configured to determine the current values of the positive image sample hidden variables according to the formula z_i(β_{y_i}) = argmax_{z∈Z(x_i)} β_{y_i} · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_{y_i} represents the model parameter corresponding to the category of x_i, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_{y_i}) represents the current value of the hidden variable of x_i when the model parameter is β_{y_i}; i is any integer from 1 to N.
  • the processor 520 is specifically configured to determine the initial value of the hidden variable of each image sample according to the formula z_i(β_k) = argmax_{z∈Z(x_i)} β_k · f(x_i, z), wherein x_i represents the i-th sample of the N image samples, β_k represents the k-th model parameter of the K model parameters, Z(x_i) represents the value range of the hidden variable z of x_i, f(x_i, z) represents the feature vector of x_i, and z_i(β_k) represents the initial value of the hidden variable z of x_i when the model parameter is β_k; i is any integer from 1 to N, and k is any integer from 1 to K.
  • FIG. 6 is a schematic flowchart of an image classification method according to an embodiment of the present invention.
  • the K classifiers trained by the method of FIG. 1 can be used to classify images, and the method of FIG. 6 includes:
  • ⁇ k represents the model parameter of the kth classifier in the K classifiers
  • f(x, z) represents the feature vector of x
  • Z(x) represents the value range of the hidden variable z of x
  • k is any integer from 1 to K.
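Using the symbols just listed, the decision and the per-class probabilities of the classification method can be sketched as follows. This is a hedged pure-Python illustration; representing f(x, z) as a precomputed list of candidate feature vectors is an assumption of the sketch, not something fixed by the embodiment.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def classify_with_probs(betas, cands):
    """Return (predicted class, per-class probabilities), where
    s_k = max_{z in Z(x)} beta_k . f(x, z) and
    P(y = k | x) = exp(s_k) / sum_l exp(s_l)."""
    s = [max(dot(b, f) for f in cands) for b in betas]
    m = max(s)                      # stabilize the exponentials
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    probs = [v / total for v in e]
    label = max(range(len(betas)), key=lambda k: probs[k])
    return label, probs
```

The returned probabilities sum to one, which is what allows the method to report how strongly the image belongs to each category rather than only a hard label.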
  • The classification result of the existing LSVM only gives the category of the image to be classified. In actual situations, however, there may be certain relationships between different categories, and an image does not necessarily belong to a single category. For example, buildings can be classified by architectural style (modern style, medieval style, etc.), and a building in an image may adopt some modern elements and some medieval elements.
  • A classification result of the existing LSVM that merely indicates which single architectural style the building in the image to be classified belongs to is obviously not accurate enough.
  • In the embodiment of the present invention, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation of the image classification result makes the description of the classification result more accurate.
  • FIG. 7 is a schematic block diagram of an apparatus for image classification according to an embodiment of the present invention.
  • the apparatus 700 of FIG. 7 can classify images using K classifiers trained by the apparatus 400 of FIG. 4, and the apparatus 700 includes:
  • a first acquiring unit 710 configured to acquire a feature vector of the image to be classified
  • a first determining unit 720 configured to determine, according to a feature vector of the image to be classified, a class of the image to be classified by using K classifiers;
  • a second determining unit 730 configured to determine, according to the formula P(y = k | x; θ) = max_{z∈Z(x)} exp(β_k · f(x, z)) / Σ_{l=1}^K max_{z∈Z(x)} exp(β_l · f(x, z)), the probability of the image to be classified under the K categories, wherein x represents the image to be classified, β_k represents the model parameter of the k-th classifier in the K classifiers, f(x, z) represents the feature vector of x, Z(x) represents the value range of the hidden variable z of x, and k is any integer from 1 to K.
  • The classification result of the existing LSVM only gives the category of the image to be classified. In actual situations, however, there may be certain relationships between different categories, and an image does not necessarily belong to a single category. For example, buildings can be classified by architectural style (modern style, medieval style, etc.), and a building in an image may adopt some modern elements and some medieval elements.
  • A classification result of the existing LSVM that merely indicates which single architectural style the building in the image to be classified belongs to is obviously not accurate enough.
  • In the embodiment of the present invention, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation of the image classification result makes the description of the classification result more accurate.
  • FIG. 8 is a schematic block diagram of an apparatus for image classification according to an embodiment of the present invention.
  • the image classification device 800 in FIG. 8 can classify images using K classifiers trained by the device 500 of FIG. 5.
  • the device 800 of FIG. 8 includes:
  • a memory 810 configured to store a program
  • a processor 820 configured to execute a program, when the program is executed, the program is used to acquire a feature vector of an image to be classified; and based on a feature vector of the image to be classified, use K classifiers to determine a category of the image to be classified.
  • The classification result of the existing LSVM only gives the category of the image to be classified. In actual situations, however, there may be certain relationships between different categories, and an image does not necessarily belong to a single category. For example, buildings can be classified by architectural style (modern style, medieval style, etc.), and a building in an image may adopt some modern elements and some medieval elements.
  • A classification result of the existing LSVM that merely indicates which single architectural style the building in the image to be classified belongs to is obviously not accurate enough.
  • In the embodiment of the present invention, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation of the image classification result makes the description of the classification result more accurate.
  • the term "and/or” is merely an association relationship describing an associated object, indicating that there may be three relationships.
  • For example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone.
  • the character "/" in this article generally indicates that the contextual object is an "or" relationship.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the part of the technical solution of the present invention that contributes in essence, or contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

Abstract

A method and device for generating an image classifier. The method includes: acquiring a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; acquiring a feature vector of each image sample, where the feature vector includes a hidden variable of the image sample; and training classifiers for the K categories through a multinomial logistic regression model based on the hidden variables of the N image samples. The multinomial logistic regression model trains the classifiers of the K categories simultaneously in maximum-likelihood form; that is, its use preserves the correlation between the classifiers of the K categories, so the training result is more accurate than the LSVM approach of converting the K-class classification problem in the object classification field into multiple mutually isolated two-class problems.

Description

Method for generating an image classifier, image classification method and device

This application claims priority to Chinese Patent Application No. 201410453884.6, filed with the Chinese Patent Office on September 5, 2014 and entitled "Method for generating an image classifier, image classification method and device", which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to the field of image classification, and more particularly, to a method for generating an image classifier, and an image classification method and device.

Background

Hidden variables are comprehensive variables that cannot be observed directly but play an important role in practical applications, such as spatial relationships, data structures and internal states. Hidden variables are widely used in fields such as machine vision, natural language processing, speech recognition and public health. Experiments show that, when processing objects such as images and speech, introducing hidden variables captures more useful information, and the processing effect improves significantly compared with approaches that use only observed variables.
Early hidden-variable models were mostly generative models, such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). More recently, researchers have explored the possibility of introducing hidden variables into discriminative models. Typical examples include the Conditional Random Field (CRF) and the Latent Support Vector Machine (LSVM); these models have achieved results in their respective fields. Notably, LSVM combined with the Deformable Part-based Model (DPM), i.e., DPM-LSVM, has become one of the more successful algorithms of recent years for object detection in machine vision. DPM describes the features of the object category to be detected and consists of three parts: a root filter, multiple part filters, and a deformation cost for each part. The root filter describes the rough outline of the object, the part filters describe the detailed features of the detected object, and the deformation costs ensure that the position of each part relative to the root does not deviate excessively. During object detection, the position of each part relative to the root can vary within a certain range; it can be regarded as a hidden variable and trained using LSVM.
The objective function of LSVM is similar in form to that of the original SVM, as shown in (1):

β* = argmin_β (1/2)·||β||² + C·Σ_{i=1}^N max(0, 1 − y_i · s(x_i, β))    (1)

where β is the model parameter of the classifier, y_i denotes the label of training sample x_i, and s(x_i, β) denotes the score of sample x_i. This score is the optimal score over all possible relative part positions (i.e., over the value range of the hidden variable), and satisfies (2):

s(x_i, β) = max_{z∈Z(x_i)} β · f(x_i, z)    (2)

In (2), z is the hidden variable, f is the feature extraction method, and f(x_i, z) is the feature vector of sample x_i, e.g., the histogram-of-oriented-gradients features used in DPM.
It can be shown that the objective function of LSVM (formula (1)) is semi-convex: when the hidden-variable values of the positive samples are fixed, the objective function is convex. Therefore, LSVM can be solved using Coordinate Gradient Descent: first fix the model parameters of the classifier and obtain the hidden-variable values of the positive samples; then fix the hidden-variable values of the positive samples and solve for the optimal model parameters and the hidden-variable values of the negative samples; iterate in this way until convergence.
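The coordinate-descent alternation described above can be sketched in pure Python. This is a hedged illustration, not the embodiment's implementation: the subgradient step size, the iteration counts, and the representation of f(x_i, z) as a list of candidate feature vectors per sample are assumptions of the sketch.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_lsvm(feats, labels, d, outer=20, inner=50, lr=0.05, C=1.0):
    """feats[i]: candidate feature vectors f(x_i, z); labels[i] in {+1, -1}."""
    beta = [0.0] * d
    for _ in range(outer):
        # Step 1: fix beta, fix the hidden variables of the POSITIVE samples
        #   z_i = argmax_z beta . f(x_i, z)
        z_pos = {i: max(range(len(feats[i])),
                        key=lambda z: dot(beta, feats[i][z]))
                 for i in range(len(feats)) if labels[i] > 0}
        # Step 2: fix z_pos, take subgradient steps on the hinge objective;
        # negative samples re-maximize their score at every step
        for _ in range(inner):
            grad = list(beta)  # from the (1/2)||beta||^2 term
            for i, fi in enumerate(feats):
                if labels[i] > 0:
                    f = fi[z_pos[i]]
                else:
                    f = max(fi, key=lambda g: dot(beta, g))
                if labels[i] * dot(beta, f) < 1.0:
                    grad = [g - C * labels[i] * v for g, v in zip(grad, f)]
            beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta
```

On a toy one-positive, one-negative problem the learned parameter separates the two samples' best-scoring candidates, mirroring the semi-convex alternation described in the text.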
Like SVM, LSVM is mainly suited to the field of object detection. When extended to object classification, LSVM handles it by converting the multi-class problem of the object classification field into two-class problems of the object detection field. With this approach, the training processes of the multiple classifiers used for object classification are isolated from one another. In practice, there may be certain correlations among object categories; for example, when buildings are divided into multiple architectural styles, a building in an image to be classified may exhibit the features of two or more styles at the same time. Therefore, converting the training of multiple classifiers into multiple mutually isolated, either-or two-class problems leads to inaccurate classification results.
Summary

Embodiments of the present invention provide a method and device for generating an image classifier, so as to improve the accuracy of classification results.

According to a first aspect, a method for generating an image classifier is provided, including: acquiring a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; acquiring a feature vector of each image sample, where the feature vector includes a hidden variable of the image sample; and training classifiers for the K categories through a multinomial logistic regression model based on the hidden variables of the N image samples.

With reference to the first aspect, in an implementation of the first aspect, the classifiers of the K categories respectively include K model parameters, and the training of the classifiers of the K categories through the multinomial logistic regression model based on the hidden variables of the N image samples includes: acquiring initial values of the K model parameters; acquiring initial values of the hidden variables of the N image samples; and training the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine target values of the K model parameters.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the initial values of the hidden variables of the N image samples include initial values of the positive image sample hidden variables and initial values of the negative image sample hidden variables, and the training of the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine the target values of the K model parameters, includes: training the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determining the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determining current values of the positive image sample hidden variables based on the feature vectors of the N image samples and the current values of the K model parameters, updating the initial values of the positive image sample hidden variables with the current values of the positive image sample hidden variables, and repeating this step until the current values of the K model parameters satisfy the convergence condition.

With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the training of the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine the current values of the K model parameters, includes: training the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine iteration values of the K model parameters; determining iteration values of the negative image sample hidden variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and updating the initial values of the negative image sample hidden variables with the iteration values of the negative image sample hidden variables; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determining the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeating this step until the current values of the K model parameters satisfy the iteration stop condition.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the training of the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine the iteration values of the K model parameters, includes: determining the iteration values of the K model parameters according to the formula

θ* = argmax_θ l(θ), where l(θ) = Σ_{i=1}^N log [ max_{z∈Z(x_i)} exp(β_{y_i} · f(x_i, z)) / Σ_{l=1}^K max_{z∈Z(x_i)} exp(β_l · f(x_i, z)) ],

where x_i denotes the i-th sample of the N image samples, β_l denotes the l-th model parameter of the K model parameters, θ denotes the K-dimensional variable composed of the K model parameters, β_{y_i} denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the hidden variable z of x_i, and f(x_i, z) denotes the feature vector of x_i.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the determining of the iteration values of the K model parameters according to the foregoing formula includes: determining the gradient corresponding to β_k according to the formula

∂l(θ)/∂β_k = Σ_{i=1}^N [1(y_i = k) − P(y = k | x_i; θ)] · f(x_i, z_i(β_k)),
with P(y = k | x_i; θ) = exp(β_k · f(x_i, z_i(β_k))) / Σ_{l=1}^K exp(β_l · f(x_i, z_i(β_l))),

where ∂l(θ)/∂β_k denotes the partial derivative of l(θ) with respect to β_k, β_k denotes the k-th model parameter of the K model parameters, z_i(β_k) denotes the initial value of the hidden variable of x_i when the model parameter is β_k, and f(x_i, z_i(β_k)) denotes the feature vector of x_i when the hidden variable z takes the value z_i(β_k); and, based on the gradient corresponding to β_k, determining an iteration value of β_k using a gradient ascent algorithm with l(θ) as the objective function.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
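The two stop conditions can be sketched generically as follows. This is an illustrative helper only: `grad_fn`, `obj_fn` and the default threshold values are assumptions of the sketch, not values fixed by the embodiment.

```python
def gradient_ascent(grad_fn, obj_fn, beta0, lr=0.1, tol=1e-6, max_iters=1000):
    """Gradient ascent with both stop conditions described above:
    stop when the change in the objective l(theta) falls below a preset
    threshold (tol), or when the iteration count reaches a preset number
    (max_iters)."""
    beta = list(beta0)
    prev = obj_fn(beta)
    for _ in range(max_iters):
        g = grad_fn(beta)
        beta = [b + lr * gi for b, gi in zip(beta, g)]
        cur = obj_fn(beta)
        if abs(cur - prev) < tol:   # stop condition 1: small objective change
            break
        prev = cur                  # stop condition 2: loop bound max_iters
    return beta
```

For a simple concave objective such as −(x − 3)², the routine converges to the maximizer x = 3 well before the iteration bound is reached.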
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the determining of the iteration values of the K model parameters according to the foregoing formula includes: computing the iteration values of the K model parameters in parallel according to the formula
Figure PCTCN2015075781-appb-000012
where l_LC(θ) is obtained by taking a concave upper bound of the logarithm in l(θ),
Figure PCTCN2015075781-appb-000013
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the determining of the iteration values of the negative image sample hidden variables based on the feature vectors of the N image samples and the iteration values of the K model parameters includes: determining the iteration values of the negative image sample hidden variables according to the formula

z_i(β_t) = argmax_{z∈Z(x_i)} β_t · f(x_i, z),

where x_i denotes the i-th sample of the N image samples, β_t denotes the t-th model parameter of the K model parameters, β_{y_i} denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the hidden variable z of x_i, f(x_i, z) denotes the feature vector of x_i, z_i(β_t) denotes the iteration value of the hidden variable of x_i when the model parameter is β_t, i is any integer from 1 to N, and t is any integer from 1 to K.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the determining of the current values of the positive image sample hidden variables based on the feature vectors of the N image samples and the current values of the K model parameters includes: determining the current values of the positive image sample hidden variables according to the formula

z_i(β_{y_i}) = argmax_{z∈Z(x_i)} β_{y_i} · f(x_i, z),

where x_i denotes the i-th sample of the N image samples, β_{y_i} denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the hidden variable z of x_i, f(x_i, z) denotes the feature vector of x_i, z_i(β_{y_i}) denotes the current value of the hidden variable of x_i when the model parameter is β_{y_i}, and i is any integer from 1 to N.
With reference to the first aspect or any one of its foregoing implementations, in another implementation of the first aspect, the determining of the initial value of the hidden variable of each image sample based on the initial value of each model parameter includes: determining the initial value of the hidden variable of each image sample according to the formula

z_i(β_k) = argmax_{z∈Z(x_i)} β_k · f(x_i, z),

where x_i denotes the i-th sample of the N image samples, β_k denotes the k-th model parameter of the K model parameters, Z(x_i) denotes the value range of the hidden variable z of x_i, f(x_i, z) denotes the feature vector of x_i, z_i(β_k) denotes the initial value of the hidden variable z of x_i when the model parameter is β_k, i is any integer from 1 to N, and k is any integer from 1 to K.
According to a second aspect, an image classification method is provided, including: acquiring a feature vector of an image to be classified; determining the category of the image to be classified using K classifiers based on the feature vector of the image to be classified, where the K classifiers are K classifiers trained using the first aspect or any one of its implementations; and determining the probabilities of the image to be classified under the K categories according to the formula

P(y = k | x; θ) = max_{z∈Z(x)} exp(β_k · f(x, z)) / Σ_{l=1}^K max_{z∈Z(x)} exp(β_l · f(x, z)),

where x denotes the image to be classified, β_k denotes the model parameter of the k-th classifier of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the hidden variable z of x, and k is any integer from 1 to K.
According to a third aspect, a device for generating an image classifier is provided, including: a first acquiring unit configured to acquire a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; a second acquiring unit configured to acquire a feature vector of each image sample acquired by the first acquiring unit, where the feature vector includes a hidden variable of the image sample; and a training unit configured to train classifiers for the K categories through a multinomial logistic regression model based on the hidden variables of the N image samples acquired by the second acquiring unit.
With reference to the third aspect, in an implementation of the third aspect, the classifiers of the K categories respectively include K model parameters, and the training unit is specifically configured to: acquire initial values of the K model parameters; acquire initial values of the hidden variables of the N image samples; and train the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine target values of the K model parameters.
With reference to the third aspect, in an implementation of the third aspect, the initial values of the hidden variables of the N image samples include initial values of the positive image sample hidden variables and initial values of the negative image sample hidden variables, and the training unit is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values of the K model parameters as the target values of the K model parameters; and when the current values of the K model parameters do not satisfy the convergence condition, determine current values of the positive image sample hidden variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive image sample hidden variables with the current values of the positive image sample hidden variables, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model based on the feature vectors of the N image samples and the initial values of the hidden variables of the N image samples, to determine iteration values of the K model parameters; determine iteration values of the negative image sample hidden variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial values of the negative image sample hidden variables with the iteration values of the negative image sample hidden variables; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values of the K model parameters as the current values of the K model parameters; otherwise, repeat this step until the current values of the K model parameters satisfy the iteration stop condition.

With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the iteration values of the K model parameters according to the formula

θ* = argmax_θ l(θ), where l(θ) = Σ_{i=1}^N log [ max_{z∈Z(x_i)} exp(β_{y_i} · f(x_i, z)) / Σ_{l=1}^K max_{z∈Z(x_i)} exp(β_l · f(x_i, z)) ],

where x_i denotes the i-th sample of the N image samples, β_l denotes the l-th model parameter of the K model parameters, θ denotes the K-dimensional variable composed of the K model parameters, β_{y_i} denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the hidden variable z of x_i, and f(x_i, z) denotes the feature vector of x_i.
With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the gradient corresponding to β_k according to the formula

∂l(θ)/∂β_k = Σ_{i=1}^N [1(y_i = k) − P(y = k | x_i; θ)] · f(x_i, z_i(β_k)),
with P(y = k | x_i; θ) = exp(β_k · f(x_i, z_i(β_k))) / Σ_{l=1}^K exp(β_l · f(x_i, z_i(β_l))),

where ∂l(θ)/∂β_k denotes the partial derivative of l(θ) with respect to β_k, β_k denotes the k-th model parameter of the K model parameters, z_i(β_k) denotes the initial value of the hidden variable of x_i when the model parameter is β_k, and f(x_i, z_i(β_k)) denotes the feature vector of x_i when the hidden variable z takes the value z_i(β_k); and, based on the gradient corresponding to β_k, to determine an iteration value of β_k using a gradient ascent algorithm with l(θ) as the objective function.
With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to compute the iteration values of the K model parameters in parallel according to the formula
Figure PCTCN2015075781-appb-000033
where l_LC(θ) is obtained by taking a concave upper bound of the logarithm in l(θ),
Figure PCTCN2015075781-appb-000034
With reference to the third aspect or any one of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the iteration values of the negative image sample hidden variables according to the formula

z_i(β_t) = argmax_{z∈Z(x_i)} β_t · f(x_i, z),

where x_i denotes the i-th sample of the N image samples, β_t denotes the t-th model parameter of the K model parameters, β_{y_i} denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the hidden variable z of x_i, f(x_i, z) denotes the feature vector of x_i, z_i(β_t) denotes the iteration value of the hidden variable of x_i when the model parameter is β_t, i is any integer from 1 to N, and t is any integer from 1 to K.
结合第三方面或其上述实现方式的任一种,在第三方面的另一种实现方式中,所述训练单元具体用于根据公式
Figure PCTCN2015075781-appb-000038
确定所述正图像样本隐变量的当前值,其中,xi表示所述N个图像样本中的第i样本,
Figure PCTCN2015075781-appb-000039
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000040
表示模型参数为
Figure PCTCN2015075781-appb-000041
时xi隐变量的当前值,i为1至N中的任意整数。
结合第三方面或其上述实现方式的任一种，在第三方面的另一种实现方式中，所述训练单元具体用于根据公式
Figure PCTCN2015075781-appb-000042
确定每一个所述图像样本的隐变量的初始值,其中,xi表示所述N个图像样本中的第i样本,βk表示所述K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000043
表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
第四方面,提供一种图像分类装置,包括:第一获取单元,用于获取待分类图像的特征向量;第一确定单元,用于基于所述待分类图像的特征向量,利用K个分类器,确定所述待分类图像的类别,其中,所述K个分类器是利用第三方面或第三方面的任意一种实现方式训练出的K个分类器;第二确定单元,用于根据公式
Figure PCTCN2015075781-appb-000044
确定所述待分类图像在所述K个类别下的概率,其中,
Figure PCTCN2015075781-appb-000045
x表示所述待分类图像,βk表示所述K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
本发明实施例中，通过多元逻辑回归模型，以最大似然的形式同时训练K个分类器，也就是说，多元逻辑回归模型的使用保留了K个类别的分类器之间的相互关联，与LSVM将物体分类领域的K类分类问题转换成相互孤立的多个二类问题的方式相比，训练结果更加准确。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对本发明实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例的图像分类器的生成方法的示意性流程图。
图2是利用本发明实施例训练出的分类器参数对图像分类的示例图。
图3是利用本发明实施例训练出的分类器参数对图像分类的示例图。
图4是本发明实施例的图像分类器的生成装置的示意性结构图。
图5是本发明实施例的图像分类器的生成装置的示意性结构图。
图6是本发明实施例的图像分类方法的示意性流程图。
图7是本发明实施例的图像分类装置的示意性框图。
图8是本发明实施例的图像分类装置的示意性框图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明的一部分实施例,而不是全部实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动的前提下所获得的所有其他实施例,都应属于本发明保护的范围。
图1是本发明实施例的图像分类器的生成方法的示意性流程图。图1的方法包括:
110、获取训练样本集,训练样本集包括N个图像样本,N个图像样本属于K个类别,N、K为正整数,N大于K。
例如,训练样本集合D={(x1,y1),...,(xN,yN)},共包含N个图像样本,其中,yi为图像样本xi的标签,用于指示xi的类别,该类别为上述K个类别之一。
120、获取每一个图像样本的特征向量,其中,特征向量包括图像样本的隐变量。
应理解，图像特征和隐变量可以根据应用场景或实际需要选取。例如，图像特征可以选取（或定义为）方向梯度直方图（Histogram of Oriented Gradient，HOG），局部二值模式（Local Binary Patterns，LBP），或Haar等；隐变量可以选取（或定义为）物体在图像中的位置，图像中局部和主体间的相对位置，或物体的子类别等。基于上述选取的图像特征和隐变量，获取每一个图像样本的特征向量，此时，获取的每个图像的特征向量并非一个固定值，会随着隐变量的变化而变化，假设图像xi的隐变量为z，提取出的特征向量可通过f(xi,z)表示。
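作为示意，下面用Python勾勒一个随隐变量变化的特征向量f(x, z)：假设隐变量z为检测窗口左上角位置，特征取窗口内的简化梯度方向直方图（窗口大小、直方图bin数以及这一特征形式均为示例假设，实际实现可替换为HOG、LBP等）。

```python
import numpy as np

def f(x, z, win=8, nbins=9):
    # 随隐变量 z 变化的特征向量示意：z = (r, c) 为检测窗口左上角位置（示例假设），
    # 特征取窗口内梯度方向的加权直方图（实际可替换为 HOG、LBP 等）
    r, c = z
    patch = x[r:r + win, c:c + win].astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # 无符号方向，范围 [0, pi)
    idx = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    hist = np.zeros(nbins)
    np.add.at(hist, idx.ravel(), mag.ravel())
    return hist / (hist.sum() + 1e-8)  # 归一化，便于比较不同窗口

x = np.random.default_rng(0).random((16, 16))  # 玩具“图像”
v1, v2 = f(x, (0, 0)), f(x, (4, 4))            # 同一图像、不同隐变量取值
print(v1.shape)                                # (9,)
```

可以看到，同一图像在不同隐变量取值下得到不同的特征向量，这正是f(x, z)随z变化的含义。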
130、基于N个图像样本的隐变量,通过多元逻辑回归模型,训练K个类别的分类器。
本发明实施例中，通过多元逻辑回归模型，以最大似然的形式同时训练K个分类器，也就是说，多元逻辑回归模型的使用保留了K个类别的分类器之间的相互关联，与LSVM将物体分类领域的K类分类问题转换成相互孤立的多个二类问题的方式相比，训练结果更加准确。
可选地,作为一个实施例,步骤130可包括:获取K个模型参数的初始值;获取N个图像样本的隐变量的初始值;基于N个图像样本的特征向量,以及N个图像样本隐变量的初始值,通过多元逻辑回归模型,训练K个类别的分类器,以确定K个模型参数的目标值。
需要说明的是,一个图像样本的隐变量可包括K个初始值,也就是说,一个图像样本的隐变量在一个模型参数的初始值下会有一个对应的初始值。通过步骤130,可获取N*K个隐变量的初始值。
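隐变量初始化的思路（与后文公式一致）是：对每个样本xi和每个模型参数βk，在隐变量取值范围Z(xi)内穷举，取使βk^T f(xi,z)最大的z。下面给出一个最小化的Python示意（其中的特征函数、隐变量取值范围均为玩具假设）。

```python
import numpy as np

def init_latent(X, betas, Z, f):
    # 对每个样本 i、每个模型参数 beta_k，在取值范围 Z(x_i) 内穷举，
    # 取使 beta_k^T f(x_i, z) 最大的 z 作为隐变量初始值（共 N*K 个）
    return [[max(Z(x), key=lambda z: float(beta @ f(x, z)))
             for beta in betas]
            for x in X]

# 玩具设定（纯属示例假设）：样本为向量，隐变量 z 为循环移位量
rng = np.random.default_rng(0)
X = [rng.random(4) for _ in range(3)]          # N = 3
betas = [rng.random(4) for _ in range(2)]      # K = 2
Z = lambda x: range(len(x))
f = lambda x, z: np.roll(x, z)

z0 = init_latent(X, betas, Z, f)
print(len(z0), len(z0[0]))                     # 3 2
```

返回的N×K张表即正文所述的N*K个隐变量初始值。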
可选地，作为一个实施例，上述N个图像样本隐变量的初始值包括：正图像样本隐变量的初始值和负图像样本隐变量的初始值，上述基于N个图像样本的特征向量，以及N个图像样本隐变量的初始值，通过多元逻辑回归模型，训练K个类别的分类器，以确定K个模型参数的目标值，可包括：基于N个图像样本的特征向量，以及N个图像样本的隐变量的初始值，通过多元逻辑回归模型，训练K个类别的分类器，以确定K个模型参数的当前值，当K个模型参数的当前值满足预设的收敛条件时，将K个模型参数的当前值确定为K个模型参数的目标值，当K个模型参数的当前值不满足该收敛条件时，基于N个图像样本的特征向量，以及K个模型参数的当前值，确定正图像样本隐变量的当前值，并利用正图像样本隐变量的当前值更新该正图像样本隐变量的初始值，重复执行本步骤直到K个模型参数的当前值满足收敛条件。
具体而言，一个图像样本的隐变量在不同模型参数下可具有不同的初始值，也就是说一个图像样本的隐变量可包括K个初始值，上述N个图像样本隐变量的初始值可包括：K*N个初始值。一个图像样本在该图像样本类别对应的模型参数下为正样本，上述正图像样本隐变量的初始值共包括N个初始值，分别是N个图像样本在各自类别对应的模型参数下的初始值。K*N个初始值中，除去上述N个正图像样本隐变量的初始值之外，剩余的(K-1)*N个初始值均为负图像样本隐变量的初始值。
可以证明,当正图像样本隐变量初始值固定时,多元逻辑回归模型具有凹性,可以通过梯度上升的方式求解。
可选地,作为一个实施例,上述基于N个图像样本的特征向量,以及N个图像样本的隐变量的初始值,通过多元逻辑回归模型,训练K个类别的分类器,以确定K个模型参数的当前值可包括:基于N个图像样本的特征向量,以及N个图像样本隐变量的初始值,通过多元逻辑回归模型,训练K个类别的分类器,以确定K个模型参数的迭代值,基于N个图像样本的特征向量,以及K个模型参数的迭代值,确定负图像样本隐变量的迭代值,并利用负图像样本隐变量的迭代值更新负图像样本隐变量的初始值,当K个模型参数的迭代值满足预设的迭代停止条件时,将K个模型参数的迭代值确定为K个模型参数的当前值,否则,重复执行本步骤直到K个模型参数的当前值满足迭代停止条件。
本发明实施例中,在固定正样本隐变量取值的情况下,通过不断更新负样本隐变量的取值达到优化K个模型参数的目的,进一步提高了分类结果的准确性。
可选地，作为一个实施例，上述基于N个图像样本的特征向量，以及N个图像样本隐变量的初始值，通过多元逻辑回归模型，训练K个类别的分类器，以确定K个模型参数的迭代值可包括：根据公式
Figure PCTCN2015075781-appb-000046
确定K个模型参数的迭代值,其中,
Figure PCTCN2015075781-appb-000047
xi表示N个图像样本中的第i样本,βl表示K个模型参数中的第l个模型参数,θ表示K个模型参数组成的K维变量,
Figure PCTCN2015075781-appb-000048
表示xi的类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量。
可选地,作为一个实施例,上述根据公式
Figure PCTCN2015075781-appb-000049
确定K个模型参数的迭代值,可包括:根据公式
Figure PCTCN2015075781-appb-000050
确定βk对应的梯度,其中,
Figure PCTCN2015075781-appb-000051
Figure PCTCN2015075781-appb-000052
Figure PCTCN2015075781-appb-000053
表示l(θ)关于βk的偏导函数,βk表示K个模型参数中的第k个模型参数,zik)表示模型参数为βk时xi的隐变量的初始值,f(xi,zik))表示隐变量z取值zik)时xi的特征向量;基于βk对应的梯度,以l(θ)为目标函数,采用梯度上升算法,确定βk的迭代值。
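上式梯度的具体形式以正文的图像公式为准。作为示意，下面按隐变量固定后常规多元逻辑回归的梯度形式sum_i([yi=k]-p_ik)·f(xi, zi(βk))实现一步梯度上升（这一具体形式是假设性示例），并验证固定隐变量下的目标函数（对数似然）确实上升。

```python
import numpy as np

def loglik(betas, feats, y):
    # 隐变量固定后的对数似然（假设形式）：sum_i [ s_{i,y_i} - log sum_k exp(s_ik) ]
    # 其中 s_ik = beta_k^T f(x_i, z_i(beta_k))，即 feats[i][k] 对应的得分
    s = np.einsum('ikd,kd->ik', feats, betas)
    m = s.max(axis=1, keepdims=True)
    lse = (m + np.log(np.exp(s - m).sum(axis=1, keepdims=True))).ravel()
    return float(sum(s[i, y[i]] - lse[i] for i in range(len(y))))

def grad_step(betas, feats, y, lr=0.02):
    # 对每个 beta_k 做一步梯度上升：grad_k = sum_i ([y_i == k] - p_ik) * feats[i][k]
    s = np.einsum('ikd,kd->ik', feats, betas)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    new = betas.copy()
    for k in range(betas.shape[0]):
        g = sum(((y[i] == k) - p[i, k]) * feats[i, k] for i in range(len(y)))
        new[k] = betas[k] + lr * g
    return new

# 玩具数据：feats[i][k] 表示 f(x_i, z_i(beta_k))（隐变量已固定）
rng = np.random.default_rng(1)
feats = rng.random((6, 3, 5))                  # N=6, K=3, d=5
y = [i % 3 for i in range(6)]
betas0 = np.zeros((3, 5))
betas1 = grad_step(betas0, feats, y)
print(loglik(betas1, feats, y) > loglik(betas0, feats, y))  # True
```

由于固定隐变量后目标函数具有凹性，步长足够小时每一步梯度上升都使目标函数单调不降。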
可选地,作为一个实施例,上述迭代停止条件为目标函数值l(θ)的变化小于预设阈值;或者,迭代停止条件为迭代次数达到预设次数。
可选地,作为一个实施例,上述根据公式
Figure PCTCN2015075781-appb-000054
确定K个模型参数的迭代值,可包括:根据公式
Figure PCTCN2015075781-appb-000055
并行计算K个模型参数的迭代值，其中，lLC(θ)是对l(θ)中的对数取凹上界转化而来的，
Figure PCTCN2015075781-appb-000056
上述目标函数l(θ)中含有对求和取对数（log-sum）的项，因此无法分解成K类子问题叠加的形式，也就无法直接采用并行或分布式计算对寻优过程进行加速。
本发明实施例中,利用对数具有凹性(Log-concavity),采用对数凹上界(Log-concavity Bound)将目标函数l(θ)转化为K类子问题加和的形式,从而可以实现并行计算,加速了算法的收敛。
具体而言,对数凹上界的形式为:
Figure PCTCN2015075781-appb-000057
利用该式就可以将l(θ)转化为:
Figure PCTCN2015075781-appb-000058
采用上式作为目标函数,利用梯度上升法求解时,分类器参数的梯度的形式如下:
Figure PCTCN2015075781-appb-000059
其中,辅助参数ai取值为:
Figure PCTCN2015075781-appb-000060
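对数凹上界的一种常见写法为 log u ≤ u/a + log a − 1（对任意 a, u > 0，且在 u = a 处取等，即 log 在 a 处的切线）。正文的变换正是利用此类上界把对数加和项拆成逐类求和。下面用数值验证这一不等式（仅演示上界性质，具体变换形式以正文的图像公式为准）。

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.1, 10.0, 1000)   # 任意正数 u
a = rng.uniform(0.1, 10.0, 1000)   # 任意锚点 a > 0
bound = u / a + np.log(a) - 1.0    # log 在 a 处的切线，构成凹上界
print(bool(np.all(np.log(u) <= bound + 1e-12)))               # True：上界处处成立
print(bool(np.allclose(np.log(u), u / u + np.log(u) - 1.0)))  # True：u = a 时取等
```

正因为该上界在当前迭代点处取等，用它替换目标函数中的对数项后再最大化，仍能保证原目标函数不降。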
可选地,作为一个实施例,上述基于N个图像样本的特征向量,以及K个模型参数的迭代值,确定负图像样本隐变量的迭代值可包括:根据公式
Figure PCTCN2015075781-appb-000061
确定负图像样本隐变量的迭代值,其中,xi表示N个图像样本中的第i样本,βt表示K个模型参数中的第t个模型参数,且
Figure PCTCN2015075781-appb-000062
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000063
表示模型参数为βt时xi隐变量的迭代值,i为1至N中的任意整数,t为1至K中的任意整数。
可选地,作为一个实施例,上述基于N个图像样本的特征向量,以及K个图像样本的当前值,确定正图像样本隐变量的当前值可包括:根据公式
Figure PCTCN2015075781-appb-000064
确定正图像样本隐变量的当前值,其中,xi表示N个图像样本中的第i样本,
Figure PCTCN2015075781-appb-000065
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000066
表示模型参数为
Figure PCTCN2015075781-appb-000067
时xi隐变量的当前值,i为1至N中的任意整数。
可选地,作为一个实施例,上述基于每一个模型参数的初始值,确定每一个图像样本的隐变量的初始值可包括:根据公式
Figure PCTCN2015075781-appb-000068
确定每一个图像样本的隐变量的初始值,其中,xi表示N个图像样本中的第i样本,βk表示K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000069
表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
下面将结合具体的例子,详细描述本发明实施例。应注意,这些例子只是为了帮助本领域技术人员更好地理解本发明实施例,而非限制本发明实施例的范围。
实施例1:
输入:训练样本集{(x1,y1),…,(xN,yN)},初始全部隐变量取值。
输出:分类器参数θ,θ={β1,...,βK}。
For outerLoop:=1 to numOuterLoop
//求解辅助目标函数l(θ,{Zp}),其中Zp代表正样本隐变量取值。
//内循环。
While(目标函数与上一轮相比的变化>阈值)//判断是否收敛
//更新分类器的参数
for k:=1 to K
Figure PCTCN2015075781-appb-000070
计算第k类分类器参数对应的梯度。
end for
//更新分类器的参数。
使用以上计算的梯度值利用梯度上升算法更新所有类别的分类器参数;
//更新负样本隐变量取值。
for i:=1 to N and k:=1 to K and yi≠k
利用
Figure PCTCN2015075781-appb-000071
计算各样本xi隐变量取值。
end for
end while
//更新正样本隐变量取值。
for i:=1 to N
利用
Figure PCTCN2015075781-appb-000072
计算各样本xi隐变量取值。
end for
end for
实施例2:
输入:训练样本集{(x1,y1),…,(xN,yN)},初始隐变量取值{h}。
输出:分类器参数θ。
//外循环
For outerLoop:=1 to numOuterLoop
//求解辅助目标函数l(θ,{Zp}),其中Zp代表正样本隐变量取值。
//内循环
for innerLoop:=1 to numInnerLoop
//更新分类器的参数
for k:=1 to K
Figure PCTCN2015075781-appb-000073
计算第k类分类器参数对应的梯度。
end for
//更新分类器参数
使用以上计算的梯度值利用梯度上升算法更新所有类别的分类器参数
//更新负样本隐变量取值
for i:=1 to N and k:=1 to K and yi≠k
利用
Figure PCTCN2015075781-appb-000074
计算各样本xi隐变量取值。
end for
end for
//更新正样本隐变量取值
for i:=1 to N
利用
Figure PCTCN2015075781-appb-000075
计算各样本xi隐变量取值。
end for
end for
具体实现中,常数numOuterLoop和numInnerLoop的取值与应用场景有较大关系,如在数字识别(digit recognition)中,由于样本数量多,特征维度小,可以设numOuterLoop=50,numInnerLoop=1。
在更复杂的实例中,如样本数量小,特征维度高,可设numOuterLoop=5,numInnerLoop=1000。
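上述实施例的完整交替优化流程可以用Python勾勒如下（玩具示意，并非专利公式的逐字实现；梯度形式、玩具数据与特征函数均为假设）：外循环固定正样本隐变量，内循环交替更新分类器参数与负样本隐变量，与实施例1、实施例2的结构一一对应。

```python
import numpy as np

def train_mllr_toy(X, y, K, f, Z, n_outer=5, n_inner=30, lr=0.02):
    # 实施例流程的玩具示意：梯度采用常规多元逻辑回归形式（假设），
    # 隐变量更新为 argmax_{z in Z(x)} beta^T f(x, z)
    d = len(f(X[0], next(iter(Z(X[0])))))
    betas = np.zeros((K, d))
    best = lambda x, b: max(Z(x), key=lambda t: float(b @ f(x, t)))
    z = [[best(x, betas[k]) for k in range(K)] for x in X]   # 隐变量初始化
    for _ in range(n_outer):
        for _ in range(n_inner):
            F = np.array([[f(x, z[i][k]) for k in range(K)] for i, x in enumerate(X)])
            s = np.einsum('ikd,kd->ik', F, betas)
            p = np.exp(s - s.max(1, keepdims=True))
            p /= p.sum(1, keepdims=True)
            for k in range(K):                               # 更新分类器参数
                g = sum(((y[i] == k) - p[i, k]) * F[i, k] for i in range(len(X)))
                betas[k] += lr * g
            for i, x in enumerate(X):                        # 更新负样本隐变量
                for k in range(K):
                    if y[i] != k:
                        z[i][k] = best(x, betas[k])
        for i, x in enumerate(X):                            # 更新正样本隐变量
            z[i][y[i]] = best(x, betas[y[i]])
    return betas

# 玩具数据（纯属示例假设）：隐变量为窗口起点，f 取长度为 2 的窗口，
# 类别 0 的模式为 [3, 0]，类别 1 的模式为 [0, 3]，出现位置随机
rng = np.random.default_rng(0)
def make(c):
    x = rng.random(4) * 0.1
    pos = int(rng.integers(0, 3))
    x[pos:pos + 2] += [3.0, 0.0] if c == 0 else [0.0, 3.0]
    return x
X = [make(i % 2) for i in range(10)]
y = [i % 2 for i in range(10)]
f = lambda x, t: x[t:t + 2]
Z = lambda x: range(3)
betas = train_mllr_toy(X, y, 2, f, Z)
print(betas.shape)                                           # (2, 2)
```

示意中的numOuterLoop、numInnerLoop分别对应参数n_outer、n_inner，实际取值如正文所述依应用场景而定。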
下面给出训练出的分类器参数对图像分类的结果。需要说明的是,在下面的描述中,本发明实施例的分类器训练方式称为:隐变量多元逻辑回归(Multinomial Latent Logistic Regression,MLLR)。
图2是利用本发明实施例训练出的分类器参数对图像分类的示例图。图2的例子中以哺乳动物分类为研究对象，共包含6类哺乳动物，每类约50张图片。实验中取50%图像用于训练，其余50%用于测试。图像特征方面使用HOG特征，隐变量为待检测物体在图片中的位置，并规定物体所在框的大小要在总图片大小的30%以上。对线性SVM、LSVM和MLLR三种方法进行对比，测试结果如下：
表1哺乳动物分类实验分类结果
分类方法 线性SVM LSVM MLLR
准确率(%) 64.23 69.59 73.31
测试结果表明,MLLR的准确率超过LSVM,并且LSVM和MLLR两种隐变量方式训练出的分类器的效果均优于传统线性SVM方法。
图2中,第一列为线性SVM训练出的分类器示意图(采用HOG特征),第二列为MLLR训练出的分类器示意图。图2内小图片中的矩形框为MLLR检测出的物体位置。
图3是利用本发明实施例训练出的分类器参数对图像分类的示例图。图3以体育人物动作为研究对象，共包括6类动作（板球击球、板球投球、排球扣球、门球击球、网球正手和网球发球）。图像特征仍使用HOG，隐变量模型使用DPM，即物体位置和局部主体相对位置均作为隐变量。结果显示分类准确率MLLR（78.3%）超过LSVM（74.4%）。图3中，第一列为图片中的主体模型示意图，第二列为图片中的局部模型示意图，图3内小图片中深色矩形框代表主体位置，浅色矩形框代表局部位置。

应理解，在本发明的各种实施例中，上述各过程的序号的大小并不意味着执行顺序的先后，各过程的执行顺序应以其功能和内在逻辑确定，而不应对本发明实施例的实施过程构成任何限定。
上文中结合图1至图3,详细描述了根据本发明实施例的图像分类器的生成方法,下面将结合图4至图5,描述根据本发明实施例的图像分类器的生成装置。
应理解,根据本发明实施例的图像分类器的生成装置能够实现图1中的各个步骤,为了简洁,在此不再赘述。
图4是本发明实施例的图像分类器的生成装置的示意性结构图。图4的装置400包括:
第一获取单元410,用于获取训练样本集,所述训练样本集包括N个图像样本,所述N个图像样本属于K个类别,N、K为正整数,N大于K;
第二获取单元420,用于获取所述第一获取单元410获取的每一个所述图像样本的特征向量,其中,所述特征向量包括图像样本的隐变量;
训练单元430,用于基于所述第二获取单元420获取的所述N个图像样本的隐变量,通过多元逻辑回归模型,训练所述K个类别的分类器。
本发明实施例中，通过多元逻辑回归模型，以最大似然的形式同时训练K个分类器，也就是说，多元逻辑回归模型的使用保留了K个类别的分类器之间的相互关联，与LSVM将物体分类领域的K类分类问题转换成相互孤立的多个二类问题的方式相比，训练结果更加准确。
可选地,作为一个实施例,所述K个类别的分类器分别包括K个模型参数,所述训练单元430具体用于获取所述K个模型参数的初始值;基于每一个所述模型参数的初始值,确定每一个所述图像样本的隐变量的初始值;基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的目标值。
可选地,作为一个实施例,所述N个图像样本隐变量的初始值包括:正图像样本隐变量的初始值和负图像样本隐变量的初始值,所述训练单元430具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的当前值,当所述K个模型参数的当前值满足预设的收敛条件时,将所述K个模型参数的当前值确定为所述K个模型参数的目标值,当所述K个模型参数的当前值不满足所述收敛条件时,基于所述N个图像样本的特征向量,以及所述K个模型参数的当前值,确定所述正图像样本隐变量的当前值,并利用所述正图像样本隐变量的当前值更新所述正图像样本隐变量的初始值,重复执行本步骤直到所述K个模型参数的当前值满 足所述收敛条件。
可选地,作为一个实施例,所述训练单元430具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的迭代值,基于所述N个图像样本的特征向量,以及所述K个模型参数的迭代值,确定所述负图像样本隐变量的迭代值,并利用所述负图像样本隐变量的迭代值更新所述负图像样本隐变量的初始值,当所述K个模型参数的迭代值满足预设的迭代停止条件时,将所述K个模型参数的迭代值确定为所述K个模型参数的当前值,否则,重复执行本步骤直到所述K个模型参数的当前值满足所述迭代停止条件。
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000076
确定所述K个模型参数的迭代值,其中,
Figure PCTCN2015075781-appb-000077
xi表示所述N个图像样本中的第i样本,βl表示所述K个模型参数中的第l个模型参数,θ表示所述K个模型参数组成的K维变量,
Figure PCTCN2015075781-appb-000078
表示xi的类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量。
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000079
确定βk对应的梯度,其中,
Figure PCTCN2015075781-appb-000080
Figure PCTCN2015075781-appb-000081
Figure PCTCN2015075781-appb-000082
表示l(θ)关于βk的偏导函数,βk表示所述K个模型参数中的第k个模型参数,zik)表示模型参数为βk时xi的隐变量的初始值,f(xi,zik))表示隐变量z取值zik)时xi的 特征向量;基于所述βk对应的梯度,以l(θ)为目标函数,采用梯度上升算法,确定所述βk的迭代值。
可选地,作为一个实施例,所述迭代停止条件为所述目标函数值l(θ)的变化小于预设阈值;或者,所述迭代停止条件为迭代次数达到预设次数。
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000083
并行计算所述K个模型参数的迭代值,其中,lLC(θ)是对l(θ)中的对数取凹上界转化而来的,
Figure PCTCN2015075781-appb-000084
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000085
确定所述负图像样本隐变量的迭代值,其中,xi表示所述N个图像样本中的第i样本,βt表示所述K个模型参数中的第t个模型参数,且
Figure PCTCN2015075781-appb-000086
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000087
表示模型参数为βt时xi隐变量的迭代值,i为1至N中的任意整数,t为1至K中的任意整数。
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000088
确定所述正图像样本隐变量的当前值,其中,xi表示所述N个图像样本中的第i样本,
Figure PCTCN2015075781-appb-000089
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000090
表示模型参数为
Figure PCTCN2015075781-appb-000091
时xi隐变量的当前值,i为1至N中的任意整数。
可选地,作为一个实施例,所述训练单元430具体用于根据公式
Figure PCTCN2015075781-appb-000092
确定每一个所述图像样本的隐变量的初始值,其中,xi表示所述N个图像样本中的第i样本,βk表示所述K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征 向量,
Figure PCTCN2015075781-appb-000093
表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
图5是本发明实施例的图像分类器的生成装置的示意性结构图。图5的装置500包括:
存储器510,用于存储程序;
处理器520,用于执行所述程序,当所述程序被执行时,所述处理器520具体用于获取训练样本集,所述训练样本集包括N个图像样本,所述N个图像样本属于K个类别,N、K为正整数,N大于K;获取每一个所述图像样本的特征向量,其中,所述特征向量包括图像样本的隐变量;基于所述N个图像样本的隐变量,通过多元逻辑回归模型,训练所述K个类别的分类器。
本发明实施例中，通过多元逻辑回归模型，以最大似然的形式同时训练K个分类器，也就是说，多元逻辑回归模型的使用保留了K个类别的分类器之间的相互关联，与LSVM将物体分类领域的K类分类问题转换成相互孤立的多个二类问题的方式相比，训练结果更加准确。
可选地,作为一个实施例,所述K个类别的分类器分别包括K个模型参数,所述处理器520具体用于获取所述K个模型参数的初始值;基于每一个所述模型参数的初始值,确定每一个所述图像样本的隐变量的初始值;基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的目标值。
可选地,作为一个实施例,所述N个图像样本隐变量的初始值包括:正图像样本隐变量的初始值和负图像样本隐变量的初始值,所述处理器520具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的当前值,当所述K个模型参数的当前值满足预设的收敛条件时,将所述K个模型参数的当前值确定为所述K个模型参数的目 标值,当所述K个模型参数的当前值不满足所述收敛条件时,基于所述N个图像样本的特征向量,以及所述K个模型参数的当前值,确定所述正图像样本隐变量的当前值,并利用所述正图像样本隐变量的当前值更新所述正图像样本隐变量的初始值,重复执行本步骤直到所述K个模型参数的当前值满足所述收敛条件。
可选地,作为一个实施例,所述处理器520具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的迭代值,基于所述N个图像样本的特征向量,以及所述K个模型参数的迭代值,确定所述负图像样本隐变量的迭代值,并利用所述负图像样本隐变量的迭代值更新所述负图像样本隐变量的初始值,当所述K个模型参数的迭代值满足预设的迭代停止条件时,将所述K个模型参数的迭代值确定为所述K个模型参数的当前值,否则,重复执行本步骤直到所述K个模型参数的当前值满足所述迭代停止条件。
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000094
确定所述K个模型参数的迭代值,其中,
Figure PCTCN2015075781-appb-000095
xi表示所述N个图像样本中的第i样本,βl表示所述K个模型参数中的第l个模型参数,θ表示所述K个模型参数组成的K维变量,
Figure PCTCN2015075781-appb-000096
表示xi的类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量。
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000097
确定βk对应的梯度,其中,
Figure PCTCN2015075781-appb-000098
Figure PCTCN2015075781-appb-000099
Figure PCTCN2015075781-appb-000100
表示l(θ)关于βk的偏导函数,βk表示所述K个模型参数中的第k个模型参数,zik)表示模型参数为βk时xi的隐变量的初始值,f(xi,zik))表示隐变量z取值zik)时xi的特征向量;基于所述βk对应的梯度,以l(θ)为目标函数,采用梯度上升算法,确定所述βk的迭代值。
可选地,作为一个实施例,所述迭代停止条件为所述目标函数值l(θ)的变化小于预设阈值;或者,所述迭代停止条件为迭代次数达到预设次数。
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000101
并行计算所述K个模型参数的迭代值,其中,lLC(θ)是对l(θ)中的对数取凹上界转化而来的,
Figure PCTCN2015075781-appb-000102
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000103
确定所述负图像样本隐变量的迭代值,其中,xi表示所述N个图像样本中的第i样本,βt表示所述K个模型参数中的第t个模型参数,且
Figure PCTCN2015075781-appb-000104
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000105
表示模型参数为βt时xi隐变量的迭代值,i为1至N中的任意整数,t为1至K中的任意整数。
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000106
确定所述正图像样本隐变量的当前值,其中,xi表示所述N个图像样本中的第i样本,
Figure PCTCN2015075781-appb-000107
表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000108
表示模型参数为
Figure PCTCN2015075781-appb-000109
时xi隐变量的当前值,i为1至N中的任意整数。
可选地,作为一个实施例,所述处理器520具体用于根据公式
Figure PCTCN2015075781-appb-000110
确定每一个所述图像样本的隐变量的初始值,其中,xi表示所述N个图像样本中的第i样本,βk表示所述K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
Figure PCTCN2015075781-appb-000111
表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
图6是本发明实施例的图像分类方法的示意性流程图。图6的方法中，可利用图1方法训练出的K个分类器对图像进行分类，图6的方法包括：
610、获取待分类图像的特征向量;
620、基于待分类图像的特征向量,利用K个分类器,确定待分类图像的类别;
630、根据公式
Figure PCTCN2015075781-appb-000112
确定待分类图像在K个类别下的概率,其中,
Figure PCTCN2015075781-appb-000113
x表示待分类图像,βk表示K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
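按上式的精神（具体以正文的图像公式为准），分类概率可理解为对各类别在最优隐变量下的得分做softmax。下面给出一个Python示意（分类器参数、特征函数、隐变量范围均为假设）。

```python
import numpy as np

def classify_with_prob(x, betas, f, Z):
    # 示意（假设形式）：先取各类别在最优隐变量下的得分
    # s_k = max_{z in Z(x)} beta_k^T f(x, z)，再做 softmax 得到各类别概率
    s = np.array([max(float(b @ f(x, z)) for z in Z(x)) for b in betas])
    p = np.exp(s - s.max())
    p /= p.sum()
    return int(np.argmax(p)), p

# 玩具示例（分类器参数、特征函数、隐变量范围均为假设）
betas = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
f = lambda x, z: x[z:z + 2]          # 隐变量 z 为窗口起点
Z = lambda x: range(len(x) - 1)
label, prob = classify_with_prob(np.array([0.1, 0.2, 3.0]), betas, f, Z)
print(label, round(float(prob.sum()), 6))   # 1 1.0
```

返回的概率向量即正文所述的“待分类图像在K个类别下的概率”，各分量之和为1。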
现有的LSVM的分类结果仅给出待分类图像属于哪一类,但是实际情况中,不同类型之间可能存在一定的联系,某一图像并非绝对属于哪一类。例如,可以将建筑物的风格进行分类,包括现代风格,中世纪风格等,图像中某一建筑物风格可能既采用了一些现代风格,也采用了一部分中世纪风格,此时,现有LSVM的分类结果仅会显示待分类图像中的建筑物归为哪种建筑风格,显然不够准确。本实施例中,除了给出待分类图像所属的类别,还给出了该图片在各类别中的概率,与现有技术相比,引入图像分类结果的概率解释使得图像分类结果的描述更加准确。
图7是本发明实施例的图像分类的装置的示意性框图。图7中的装置700可利用图4的装置400训练出的K个分类器对图像进行分类,装置700包括:
第一获取单元710,用于获取待分类图像的特征向量;
第一确定单元720,用于基于待分类图像的特征向量,利用K个分类器,确定待分类图像的类别;
第二确定单元730,用于根据公式
Figure PCTCN2015075781-appb-000114
确定待分类图像在K个类别下的概率,其中,
Figure PCTCN2015075781-appb-000115
x表示待分类图像,βk表示K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
现有的LSVM的分类结果仅给出待分类图像属于哪一类,但是实际情况中,不同类型之间可能存在一定的联系,某一图像并非绝对属于哪一类。例如,可以将建筑物的风格进行分类,包括现代风格,中世纪风格等,图像中某一建筑物风格可能既采用了一些现代风格,也采用了一部分中世纪风格,此时,现有LSVM的分类结果仅会显示待分类图像中的建筑物归为哪种建筑风格,显然不够准确。本实施例中,除了给出待分类图像所属的类别,还给出了该图片在各类别中的概率,与现有技术相比,引入图像分类结果的概率解释使得图像分类结果的描述更加准确。
图8是本发明实施例的图像分类的装置的示意性框图。图8中的图像分类装置800可利用图5的装置500训练出的K个分类器对图像进行分类，装置800包括：
存储器810,用于存储程序;
处理器820，用于执行程序，当所述程序被执行时，所述处理器820具体用于获取待分类图像的特征向量；基于待分类图像的特征向量，利用K个分类器，确定待分类图像的类别；根据公式
Figure PCTCN2015075781-appb-000116
确定待分类图像在K 个类别下的概率,其中,
Figure PCTCN2015075781-appb-000117
x表示待分类图像,βk表示K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
现有的LSVM的分类结果仅给出待分类图像属于哪一类,但是实际情况中,不同类型之间可能存在一定的联系,某一图像并非绝对属于哪一类。例如,可以将建筑物的风格进行分类,包括现代风格,中世纪风格等,图像中某一建筑物风格可能既采用了一些现代风格,也采用了一部分中世纪风格,此时,现有LSVM的分类结果仅会显示待分类图像中的建筑物归为哪种建筑风格,显然不够准确。本实施例中,除了给出待分类图像所属的类别,还给出了该图片在各类别中的概率,与现有技术相比,引入图像分类结果的概率解释使得图像分类结果的描述更加准确。
应理解,在本发明实施例中,术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系。例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和 方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本发明实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分,或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本发明的保护范围 之内。因此,本发明的保护范围应以权利要求的保护范围为准。

Claims (24)

  1. 一种图像分类器的生成方法,其特征在于,包括:
    获取训练样本集,所述训练样本集包括N个图像样本,所述N个图像样本属于K个类别,N、K为正整数,N大于K;
    获取每一个所述图像样本的特征向量,其中,所述特征向量包括图像样本的隐变量;
    基于所述N个图像样本的隐变量,通过多元逻辑回归模型,训练所述K个类别的分类器。
  2. 如权利要求1所述的方法,其特征在于,所述K个类别的分类器分别包括K个模型参数,
    所述基于所述N个图像样本的隐变量,通过多元逻辑回归模型,训练所述K个类别的分类器,包括:
    获取所述K个模型参数的初始值;
    获取所述N个图像样本的隐变量的初始值;
    基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的目标值。
  3. 如权利要求2所述的方法,其特征在于,所述N个图像样本隐变量的初始值包括:正图像样本隐变量的初始值和负图像样本隐变量的初始值,
    所述基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的目标值,包括:
    基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的当前值,
    当所述K个模型参数的当前值满足预设的收敛条件时,将所述K个模 型参数的当前值确定为所述K个模型参数的目标值,
    当所述K个模型参数的当前值不满足所述收敛条件时,基于所述N个图像样本的特征向量,以及所述K个模型参数的当前值,确定所述正图像样本隐变量的当前值,并利用所述正图像样本隐变量的当前值更新所述正图像样本隐变量的初始值,重复执行本步骤直到所述K个模型参数的当前值满足所述收敛条件。
  4. 如权利要求3所述的方法,其特征在于,所述基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的当前值,包括:
    基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的迭代值,
    基于所述N个图像样本的特征向量,以及所述K个模型参数的迭代值,确定所述负图像样本隐变量的迭代值,并利用所述负图像样本隐变量的迭代值更新所述负图像样本隐变量的初始值,
    当所述K个模型参数的迭代值满足预设的迭代停止条件时,将所述K个模型参数的迭代值确定为所述K个模型参数的当前值,
    否则,重复执行本步骤直到所述K个模型参数的当前值满足所述迭代停止条件。
  5. 如权利要求4所述的方法,其特征在于,所述基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的迭代值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100001
    确定所述K个模型参数的迭代值, 其中,
    Figure PCTCN2015075781-appb-100002
    xi表示所述N个图像样本中的第i样本，βl表示所述K个模型参数中的第l个模型参数，θ表示所述K个模型参数组成的K维变量，
    Figure PCTCN2015075781-appb-100003
    表示xi的类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量。
  6. 如权利要求5所述的方法,其特征在于,所述根据公式
    Figure PCTCN2015075781-appb-100004
    确定所述K个模型参数的迭代值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100005
    确定βk对应的梯度,其中,
    Figure PCTCN2015075781-appb-100006
    Figure PCTCN2015075781-appb-100007
    Figure PCTCN2015075781-appb-100008
    Figure PCTCN2015075781-appb-100009
    表示l(θ)关于βk的偏导函数,βk表示所述K个模型参数中的第k个模型参数,zik)表示模型参数为βk时xi的隐变量的初始值,f(xi,zik))表示隐变量z取值zik)时xi的特征向量;
    基于所述βk对应的梯度,以l(θ)为目标函数,采用梯度上升算法,确定所述βk的迭代值。
  7. 如权利要求6所述的方法,其特征在于,
    所述迭代停止条件为所述目标函数值l(θ)的变化小于预设阈值;或者,
    所述迭代停止条件为迭代次数达到预设次数。
  8. 如权利要求5所述的方法,其特征在于,所述根据公式
    Figure PCTCN2015075781-appb-100010
    确定所述K个模型参数的迭代值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100011
    并行计算所述K个模型参数的迭代值,其中,lLC(θ)是对l(θ)中的对数取凹上 界转化而来的,
    Figure PCTCN2015075781-appb-100012
  9. 如权利要求4-8中任一项所述的方法,其特征在于,所述基于所述N个图像样本的特征向量,以及所述K个模型参数的迭代值,确定所述负图像样本隐变量的迭代值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100013
    确定所述负图像样本隐变量的迭代值,其中,xi表示所述N个图像样本中的第i样本,βt表示所述K个模型参数中的第t个模型参数,且
    Figure PCTCN2015075781-appb-100014
    Figure PCTCN2015075781-appb-100015
    表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100016
    表示模型参数为βt时xi隐变量的迭代值,i为1至N中的任意整数,t为1至K中的任意整数。
  10. 如权利要求3-9中任一项所述的方法,其特征在于,所述基于所述N个图像样本的特征向量,以及所述K个模型参数的当前值,确定所述正图像样本隐变量的当前值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100017
    确定所述正图像样本隐变量的当前值,其中,xi表示所述N个图像样本中的第i样本,
    Figure PCTCN2015075781-appb-100018
    表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100019
    表示模型参数为
    Figure PCTCN2015075781-appb-100020
    时xi隐变量的当前值,i为1至N中的任意整数。
  11. 如权利要求2-10中任一项所述的方法,其特征在于,所述获取所述N个图像样本的隐变量的初始值,包括:
    根据公式
    Figure PCTCN2015075781-appb-100021
    确定每一个所述图像样本的隐变量的初始值,其中,xi表示所述N个图像样本中的第i样本,βk表示所述K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100022
    表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
  12. 一种图像分类方法,其特征在于,包括:
    获取待分类图像的特征向量;
    基于所述待分类图像的特征向量,利用K个分类器,确定所述待分类图像的类别,其中,所述K个分类器是利用权利要求1至权利要求11中任一项所述的方法训练出的K个分类器;
    根据公式
    Figure PCTCN2015075781-appb-100023
    确定所述待分类图像在所述K个类别下的概率,其中,
    Figure PCTCN2015075781-appb-100024
    x表示所述待分类图像,βk表示所述K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
  13. 一种图像分类器的生成装置,其特征在于,包括:
    第一获取单元,用于获取训练样本集,所述训练样本集包括N个图像样本,所述N个图像样本属于K个类别,N、K为正整数,N大于K;
    第二获取单元,用于获取所述第一获取单元获取的每一个所述图像样本的特征向量,其中,所述特征向量包括图像样本的隐变量;
    训练单元,用于基于所述第二获取单元获取的所述N个图像样本的隐变量,通过多元逻辑回归模型,训练所述K个类别的分类器。
  14. 如权利要求13所述的装置,其特征在于,所述K个类别的分类器分别包括K个模型参数,所述训练单元具体用于获取所述K个模型参数的初始值;获取所述N个图像样本的隐变量的初始值;基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的目标值。
  15. 如权利要求14所述的装置,其特征在于,所述N个图像样本隐变量的初始值包括:正图像样本隐变量的初始值和负图像样本隐变量的初始值,所述训练单元具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类 别的分类器,以确定所述K个模型参数的当前值,当所述K个模型参数的当前值满足预设的收敛条件时,将所述K个模型参数的当前值确定为所述K个模型参数的目标值,当所述K个模型参数的当前值不满足所述收敛条件时,基于所述N个图像样本的特征向量,以及所述K个模型参数的当前值,确定所述正图像样本隐变量的当前值,并利用所述正图像样本隐变量的当前值更新所述正图像样本隐变量的初始值,重复执行本步骤直到所述K个模型参数的当前值满足所述收敛条件。
  16. 如权利要求15所述的装置,其特征在于,所述训练单元具体用于基于所述N个图像样本的特征向量,以及所述N个图像样本隐变量的初始值,通过所述多元逻辑回归模型,训练所述K个类别的分类器,以确定所述K个模型参数的迭代值,基于所述N个图像样本的特征向量,以及所述K个模型参数的迭代值,确定所述负图像样本隐变量的迭代值,并利用所述负图像样本隐变量的迭代值更新所述负图像样本隐变量的初始值,当所述K个模型参数的迭代值满足预设的迭代停止条件时,将所述K个模型参数的迭代值确定为所述K个模型参数的当前值,否则,重复执行本步骤直到所述K个模型参数的当前值满足所述迭代停止条件。
  17. 如权利要求16所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100025
    确定所述K个模型参数的迭代值,其中,
    Figure PCTCN2015075781-appb-100026
    xi表示所述N个图像样本中的第i样本,βl表示所述K个模型参数中的第l个模型参数,θ表示所述K个模型参数组成的K维变量,
    Figure PCTCN2015075781-appb-100027
    表示xi的类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量。
  18. 如权利要求17所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100028
    确定βk对应的梯度,其中,
    Figure PCTCN2015075781-appb-100029
    Figure PCTCN2015075781-appb-100030
    Figure PCTCN2015075781-appb-100031
    Figure PCTCN2015075781-appb-100032
    表示l(θ)关于βk的偏导函数,βk表示所述K个模型参数中的第k个模型参数,zik)表示模型参数为βk时xi的隐变量的初始值,f(xi,zik))表示隐变量z取值zik)时xi的特征向量;基于所述βk对应的梯度,以l(θ)为目标函数,采用梯度上升算法,确定所述βk的迭代值。
  19. 如权利要求18所述的装置,其特征在于,所述迭代停止条件为所述目标函数值l(θ)的变化小于预设阈值;或者,所述迭代停止条件为迭代次数达到预设次数。
  20. 如权利要求17所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100033
    并行计算所述K个模型参数的迭代值,其中,lLC(θ)是对l(θ)中的对数取凹上界转化而来的,
    Figure PCTCN2015075781-appb-100034
  21. 如权利要求16-20中任一项所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100035
    确定所述负图像样本隐变量的迭代值,其中,xi表示所述N个图像样本中的第i样本,βt表示所述K个模型参数中的第t个模型参数,且
    Figure PCTCN2015075781-appb-100036
    Figure PCTCN2015075781-appb-100037
    表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100038
    表示模型参数为βt时xi隐变量的迭代值,i为1至N中的任意整数,t为1至K中的任意整数。
  22. 如权利要求15-21中任一项所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100039
    确定所述正图像样本隐变量 的当前值,其中,xi表示所述N个图像样本中的第i样本,
    Figure PCTCN2015075781-appb-100040
    表示xi类别对应的模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100041
    表示模型参数为
    Figure PCTCN2015075781-appb-100042
    时xi隐变量的当前值,i为1至N中的任意整数。
  23. 如权利要求14-22中任一项所述的装置,其特征在于,所述训练单元具体用于根据公式
    Figure PCTCN2015075781-appb-100043
    确定每一个所述图像样本的隐变量的初始值,其中,xi表示所述N个图像样本中的第i样本,βk表示所述K个模型参数中的第k个模型参数,Z(xi)表示xi的隐变量z的取值范围,f(xi,z)表示xi的特征向量,
    Figure PCTCN2015075781-appb-100044
    表示模型参数为βk时xi隐变量z的初始值,i为1至N中的任意整数,k为1至K中的任意整数。
  24. 一种图像分类装置,其特征在于,包括:
    第一获取单元,用于获取待分类图像的特征向量;
    第一确定单元,用于基于所述待分类图像的特征向量,利用K个分类器,确定所述待分类图像的类别,其中,所述K个分类器是利用权利要求13至权利要求23中任一项所述的装置训练出的K个分类器;
    第二确定单元,用于根据公式
    Figure PCTCN2015075781-appb-100045
    确定所述待分类图像在所述K个类别下的概率,其中,
    Figure PCTCN2015075781-appb-100046
    x表示所述待分类图像,βk表示所述K个分类器中第k个分类器的模型参数,f(x,z)表示x的特征向量,Z(x)表示x的隐变量z的取值范围,k为1至K中的任意整数。
PCT/CN2015/075781 2014-09-05 2015-04-02 图像分类器的生成方法、图像分类方法和装置 WO2016033965A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410453884.6A CN105389583A (zh) 2014-09-05 2014-09-05 图像分类器的生成方法、图像分类方法和装置
CN201410453884.6 2014-09-05

Publications (1)

Publication Number Publication Date
WO2016033965A1 true WO2016033965A1 (zh) 2016-03-10

Family

ID=55421853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075781 WO2016033965A1 (zh) 2014-09-05 2015-04-02 图像分类器的生成方法、图像分类方法和装置

Country Status (2)

Country Link
CN (1) CN105389583A (zh)
WO (1) WO2016033965A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685749A (zh) * 2018-09-25 2019-04-26 平安科技(深圳)有限公司 图像风格转换方法、装置、设备和计算机存储介质
CN109815971A (zh) * 2017-11-20 2019-05-28 富士通株式会社 信息处理方法和信息处理装置
CN110516737A (zh) * 2019-08-26 2019-11-29 南京人工智能高等研究院有限公司 用于生成图像识别模型的方法和装置
CN111199244A (zh) * 2019-12-19 2020-05-26 北京航天测控技术有限公司 一种数据的分类方法、装置、存储介质及电子装置
CN111225299A (zh) * 2018-11-27 2020-06-02 中国移动通信集团广东有限公司 一种onu故障识别、修复方法和装置
CN111368861A (zh) * 2018-12-25 2020-07-03 杭州海康威视数字技术股份有限公司 在图像物体检测过程中确定子部件顺序的方法和装置
CN112329837A (zh) * 2020-11-02 2021-02-05 北京邮电大学 一种对抗样本检测方法、装置、电子设备及介质
CN113239804A (zh) * 2021-05-13 2021-08-10 杭州睿胜软件有限公司 图像识别方法、可读存储介质及图像识别系统

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056146B (zh) * 2016-05-27 2019-03-26 西安电子科技大学 基于逻辑回归的视觉跟踪方法
CN108875455B (zh) * 2017-05-11 2022-01-18 Tcl科技集团股份有限公司 一种无监督的人脸智能精确识别方法及系统
CN107492067B (zh) * 2017-09-07 2019-06-07 维沃移动通信有限公司 一种图像美化方法及移动终端
CN109784351B (zh) * 2017-11-10 2023-03-24 财付通支付科技有限公司 行为数据分类方法、分类模型训练方法及装置
CN108536838B (zh) * 2018-04-13 2021-10-19 重庆邮电大学 基于Spark的极大无关多元逻辑回归模型对文本情感分类方法
CN108549692B (zh) * 2018-04-13 2021-05-11 重庆邮电大学 Spark框架下的稀疏多元逻辑回归模型对文本情感分类的方法
CN108595568B (zh) * 2018-04-13 2022-05-17 重庆邮电大学 一种基于极大无关多元逻辑回归的文本情感分类方法
CN110163794B (zh) * 2018-05-02 2023-08-29 腾讯科技(深圳)有限公司 图像的转换方法、装置、存储介质和电子装置
CN110633725B (zh) * 2018-06-25 2023-08-04 富士通株式会社 训练分类模型的方法和装置以及分类方法和装置
CN110084380A (zh) * 2019-05-10 2019-08-02 深圳市网心科技有限公司 一种迭代训练方法、设备、系统及介质
CN113674219A (zh) * 2021-07-28 2021-11-19 云南大益微生物技术有限公司 一种基于双重逻辑回归的茶叶杂质识别方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100310159A1 (en) * 2009-06-04 2010-12-09 Honda Motor Co., Ltd. Semantic scene segmentation using random multinomial logit (rml)
CN103310230A (zh) * 2013-06-17 2013-09-18 西北工业大学 联合解混及自适应端元提取的高光谱图像分类方法
CN103324938A (zh) * 2012-03-21 2013-09-25 日电(中国)有限公司 训练姿态分类器及物体分类器、物体检测的方法及装置
CN103530656A (zh) * 2013-09-10 2014-01-22 浙江大学 基于隐结构学习的图像摘要生成方法
US20140099029A1 (en) * 2012-10-05 2014-04-10 Carnegie Mellon University Face Age-Estimation and Methods, Systems, and Software Therefor
CN103761295A (zh) * 2014-01-16 2014-04-30 北京雅昌文化发展有限公司 基于图片自动分类的艺术类图片的定制化特征量提取算法
CN103942558A (zh) * 2013-01-22 2014-07-23 日电(中国)有限公司 获取物体检测器的方法及装置
US8842883B2 (en) * 2011-11-21 2014-09-23 Seiko Epson Corporation Global classifier with local adaption for objection detection


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU, ZHE ET AL.: "Architectural Style Classification Using Multinomial Latent Logistic Regression", COMPUTER VISION-ECCV 2014, 13TH EUROPEAN CONFERENCE, 6 September 2014 (2014-09-06) - 12 September 2014 (2014-09-12), Zurich, Switzerland *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815971A (zh) * 2017-11-20 2019-05-28 富士通株式会社 信息处理方法和信息处理装置
CN109815971B (zh) * 2017-11-20 2023-03-10 富士通株式会社 信息处理方法和信息处理装置
CN109685749B (zh) * 2018-09-25 2023-04-18 平安科技(深圳)有限公司 图像风格转换方法、装置、设备和计算机存储介质
CN109685749A (zh) * 2018-09-25 2019-04-26 平安科技(深圳)有限公司 图像风格转换方法、装置、设备和计算机存储介质
CN111225299A (zh) * 2018-11-27 2020-06-02 中国移动通信集团广东有限公司 一种onu故障识别、修复方法和装置
CN111368861A (zh) * 2018-12-25 2020-07-03 杭州海康威视数字技术股份有限公司 在图像物体检测过程中确定子部件顺序的方法和装置
CN111368861B (zh) * 2018-12-25 2023-05-09 杭州海康威视数字技术股份有限公司 在图像物体检测过程中确定子部件顺序的方法和装置
CN110516737A (zh) * 2019-08-26 2019-11-29 南京人工智能高等研究院有限公司 用于生成图像识别模型的方法和装置
CN110516737B (zh) * 2019-08-26 2023-05-26 南京人工智能高等研究院有限公司 用于生成图像识别模型的方法和装置
CN111199244A (zh) * 2019-12-19 2020-05-26 北京航天测控技术有限公司 一种数据的分类方法、装置、存储介质及电子装置
CN111199244B (zh) * 2019-12-19 2024-04-09 北京航天测控技术有限公司 一种数据的分类方法、装置、存储介质及电子装置
CN112329837A (zh) * 2020-11-02 2021-02-05 北京邮电大学 一种对抗样本检测方法、装置、电子设备及介质
CN113239804A (zh) * 2021-05-13 2021-08-10 杭州睿胜软件有限公司 图像识别方法、可读存储介质及图像识别系统
CN113239804B (zh) * 2021-05-13 2023-06-02 杭州睿胜软件有限公司 图像识别方法、可读存储介质及图像识别系统

Also Published As

Publication number Publication date
CN105389583A (zh) 2016-03-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15838197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15838197

Country of ref document: EP

Kind code of ref document: A1