CN112990242A - Training method and training device for image classification model - Google Patents

Training method and training device for image classification model

Info

Publication number
CN112990242A
Authority
CN
China
Prior art keywords
image
representation
feature
potential
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911291584.1A
Other languages
Chinese (zh)
Inventor
史英迪
程建波
彭南博
黄志翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN201911291584.1A
Publication of CN112990242A
Pending legal-status Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The disclosure provides a training method and a training device for an image classification model, and relates to the field of image processing. The method fuses the feature processing part and the image classification part into one objective function so that the two parts can be trained together. From the original feature input end to the classification label output end, a prediction result is obtained and compared with the real result to obtain an error. Because all of the intermediate operations are contained in the objective function, the error can be transmitted through every part of the objective function, and the representation of each part can be adjusted according to the error until the whole objective function converges or reaches the expected effect, thereby improving the overall training effect of the feature processing part and the image classification part.

Description

Training method and training device for image classification model
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a training method and a training apparatus for an image classification model.
Background
With the development of network technology, people increasingly express opinions and convey emotions through images (or pictures). In some services, an image needs to be analyzed to mine the emotional tendency corresponding to the image so that the corresponding service processing can be performed.
In some image emotion classification technologies based on machine learning, features such as color, texture and shape are extracted from an image, the features are processed, and a classifier is trained on the processed features to infer the emotion class of the image.
Disclosure of Invention
The inventor finds that the related art is a non-end-to-end solution: feature processing and emotion classification are treated as several independent steps, each step is an independent task, and the quality of each step's result influences the next step, thereby affecting the overall training effect.
The method fuses the feature processing part and the image classification part into one objective function so that the two parts can be trained together. From the original feature input end to the classification label output end, a prediction result is obtained and compared with the real result to obtain an error. Because all of the intermediate operations are contained in the objective function, the error can be transmitted through every part of the objective function, and the representation of each part can be adjusted according to the error until the whole objective function converges or reaches the expected effect, thereby improving the overall training effect of the feature processing part and the image classification part.
Some embodiments of the present disclosure provide a training method of an image classification model, including:
constructing an objective function for training an image classification model, wherein the objective function comprises a characteristic conversion part from original characteristic representation to potential characteristic representation of an image and an image classification model from the potential characteristic representation serving as input characteristic representation to output label representation;
and training the objective function by using the original features of each image in the training set and the labeled classification labels to simultaneously determine the values of the projection matrix representation of the original feature representation of the image to the potential feature representation of the image in the feature conversion part and the values of the regression coefficient representation of the potential feature representation to the output label representation in the image classification model.
In some embodiments, the raw feature representation of the image comprises raw feature representations of a plurality of perspectives of the image, and the projection matrix representation comprises a respective plurality of projection matrix representations of the raw feature representation of each perspective of the image to a potential feature representation of the image.
In some embodiments, the feature transformation portion includes a relational representation of the original feature representation to the potential feature representation of the image constructed based on the projection matrix representation.
In some embodiments, the relationship of the original feature representation to the potential feature representation of the image is represented as: a function of the norm of the product of the original feature representation of the image and the projection matrix representation minus the potential feature representation.
In some embodiments, the feature conversion section further includes: one or more of redundant constraints of the original feature representation of the different view angles, low rank constraints of the projection matrix representation, and regularization constraints of the potential feature representation.
In some embodiments, the redundancy constraint of the original feature representations of the different views is a product between a transpose of a first projection matrix representation corresponding to one view, a covariance matrix of the original feature representations of the respective views, and a second projection matrix representation corresponding to another view.
In some embodiments, the low rank constraint of the projection matrix representation is a nuclear norm of the projection matrix representation.
In some embodiments, the regularization constraint of the potential feature representation is a function of a norm of the potential feature representation.
In some embodiments, the image classification model is a lasso regression image classification model or a logistic regression image classification model.
In some embodiments, the lasso regression image classification model includes a first relational representation of the potential feature representation to the output label representation constructed based on a regression coefficient representation and a regularization constraint of the regression coefficient representation.
In some embodiments, the logistic regression image classification model includes a second relational representation of the potential feature representation to the output label representation constructed based on regression coefficient representations.
In some embodiments, the raw feature representations for the multiple perspectives of the image comprise a foreground feature representation and a background feature representation of the image.
In some embodiments, the raw features of each image in the training set include foreground features and background features of the respective image, wherein the foreground features of the image are extracted using a VGG neural network model, or the background features of the image are extracted using an AlexNet model.
In some embodiments, further comprising: and aiming at an image to be classified, processing the original characteristics of the image to be classified by using the value represented by the trained projection matrix to obtain the potential characteristics of the image to be classified, and inputting the potential characteristics of the image to be classified into an image classification model obtained by training to output a classification label corresponding to the image to be classified.
In some embodiments, the classification label labeled by each image in the training set and the classification label corresponding to the image to be classified are emotion labels.
Some embodiments of the present disclosure provide a training apparatus for an image classification model, including:
a memory; and a processor coupled to the memory, the processor configured to perform the training method of any of the embodiments based on instructions stored in the memory.
Some embodiments of the disclosure propose a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method described in any of the embodiments.
Drawings
The drawings used in the description of the embodiments or the related art are briefly described below. The present disclosure will be more clearly understood from the following detailed description, which proceeds with reference to the accompanying drawings.
It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from them by one of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart diagram illustrating some embodiments of a training method for an image classification model according to the present disclosure.
Fig. 2 is a schematic flow chart diagram of some embodiments of the disclosed image classification method.
Fig. 3 is a schematic structural diagram of some embodiments of a training apparatus for image classification models according to the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
Fig. 1 is a schematic flow chart diagram illustrating some embodiments of a training method for an image classification model according to the present disclosure. The image classification model may, for example, classify the image by emotion, such as whether the image is positive or negative, or whether the image expresses happiness, anger, surprise, satisfaction, disgust, fear, etc., but is not limited to these examples.
As shown in fig. 1, the training method of this embodiment includes: steps 11-12 (step 12 includes steps 121-123).
In step 11, an objective function for training an image classification model is constructed, the objective function including a feature conversion part of the original feature representation of the image into a potential feature representation and an image classification model of the potential feature representation serving as an input feature representation into an output label representation.
The original feature representation of the image may include an original feature representation of one perspective of the image, or may include original feature representations of multiple perspectives of the image for a more complete description of the image. The raw feature representation of each view of the image corresponds to one projection matrix representation to the potential feature representation of the image, and thus, in the case of raw feature representations for multiple views, to multiple projection matrix representations. For example, the raw feature representations of the plurality of perspectives of the image comprise a foreground feature representation and a background feature representation of the image, the foreground feature representation to potential feature representation corresponding to the first projection matrix representation and the background feature representation to potential feature representation corresponding to the second projection matrix representation, respectively.
The feature conversion section and the image classification model are explained separately below.
The feature transformation part comprises a relationship representation of an original feature representation to a potential feature representation of the image constructed based on the projection matrix representation, and can optionally further comprise: one or more of redundant constraints of the original feature representation of the different view angles, low rank constraints of the projection matrix representation, and regularization constraints of the potential feature representation.
The formula is expressed as follows:

min_{P_1, P_2, Z} ||X_1 P_1 + X_2 P_2 - Z||_F^2 + φ · tr(P_1^T C_12 P_2) + λ_1 ||Z||_F^2 + β_1 ||P_1||_* + β_2 ||P_2||_*    (4-1)

By the above formula, the overall correlation of the different viewing angles is maximized while the redundant parts of the features are discarded, thereby fully utilizing the complementary properties of the different viewing angles to eliminate the heterogeneity between them. The meaning of each symbol is described with the corresponding component below. The method is also called a low-rank multi-view learning method; it aims to learn a potential (shared) feature representation Z ∈ R^{N×D} together with a set of low-rank-constrained projection matrices {P_1, P_2}. Taking the extraction of the original features of two views as an example, X_1 and X_2 denote the foreground feature representation and the background feature representation of the image respectively, and P_1 and P_2 denote the (low-rank) projection matrices from the original feature representations to the potential feature representation.
The relationship representation of the original feature representation to the potential feature representation of the image is, for example, a function of the norm of the product of the original feature representations of the image (the foreground feature representation X_1 and the background feature representation X_2) with the projection matrix representations (the first projection matrix representation P_1 from X_1 to the potential feature representation Z, and the second projection matrix representation P_2 from X_2 to Z), minus the potential feature representation Z. The formula is expressed as follows:

min_{P_1, P_2, Z} ||X_1 P_1 + X_2 P_2 - Z||_F^2    (4-2)

The function of the norm shown in equation (4-2) is the square of the norm; it may also be an even power of the norm, such as the 4th power or the 6th power.
Equation (4-2) is intended to seek the group (P_1, P_2, Z) that makes ||X_1 P_1 + X_2 P_2 - Z||_F^2 smallest; when this term is at its minimum, Z can represent X_1 and X_2 to the maximum extent.
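For illustration only, the following is a minimal NumPy sketch of evaluating the term in equation (4-2); the sample count N, the feature dimensions D1, D2, D and the random values are assumptions made for the example, not values given in the disclosure.

```python
import numpy as np

N, D1, D2, D = 100, 4096, 2048, 128    # assumed sample count and feature dimensions
X1 = np.random.randn(N, D1)            # foreground feature representation
X2 = np.random.randn(N, D2)            # background feature representation
P1 = np.random.randn(D1, D)            # projection matrix for the foreground view
P2 = np.random.randn(D2, D)            # projection matrix for the background view
Z = np.random.randn(N, D)              # potential (shared) feature representation

# Squared Frobenius norm of (X1 P1 + X2 P2 - Z), i.e. the term minimized in (4-2).
reconstruction = np.linalg.norm(X1 @ P1 + X2 @ P2 - Z, ord="fro") ** 2
```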
If the original feature has only one perspective, then equation (4-2) can be replaced with equation (4-3).
min_{P, Z} ||X P - Z||_F^2    (4-3)

Equation (4-3) is the relational representation between the original feature representation X and the potential feature representation Z of the image, i.e. a function of the norm of the product of the original feature representation X of the image and the projection matrix representation P minus the potential feature representation Z (the function of the norm shown in equation (4-3) is the square of the norm and may also be an even power of the norm, such as the 4th power or the 6th power).
Equation (4-3) is intended to seek the group (P, Z) that makes ||X P - Z||_F^2 smallest; when this term is at its minimum, Z can represent X to the maximum extent.
The redundancy constraint of the original feature representations of different views is the product between the transpose of the first projection matrix representation corresponding to one view, the covariance matrix of the original feature representations of the respective views, and the second projection matrix representation corresponding to the other view. The formula is expressed as follows:

φ · tr(P_1^T C_12 P_2)    (4-4)

where C_12 is defined as the covariance matrix of X_1 and X_2, φ is a configurable parameter, and tr denotes the trace of a matrix. Equation (4-4) is the redundancy constraint of the original feature representations of different viewing angles and is intended to remove the redundant information of X_1 and X_2 as much as possible.
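Continuing the NumPy sketch above, the redundancy constraint of equation (4-4) can be evaluated as follows; treating C_12 as the cross-covariance X_1^T X_2 / N is an assumption made for illustration.

```python
phi = 0.1                                      # configurable parameter (assumed value)
C12 = X1.T @ X2 / N                            # assumed cross-covariance of X1 and X2
redundancy = phi * np.trace(P1.T @ C12 @ P2)   # term of equation (4-4)
```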
The regularization constraint of the potential feature representation is a function of the norm of the potential feature representation, intended to improve the over-fitting problem, and is formulated as:
λ_1 ||Z||_F^2    (4-5)

where λ_1 is a configurable parameter (the function of the norm shown in equation (4-5) is the square of the norm and may also be an even power of the norm, such as the 4th power or the 6th power).
The low-rank constraint of the projection matrix representation is the nuclear norm of the projection matrix representation; it aims to remove detail-level redundant information such as image noise as far as possible and helps to learn a more robust feature subspace. The formula is expressed as follows:
β_1 ||P_1||_* + β_2 ||P_2||_*    (4-6)

where β_1 and β_2 are configurable parameters and ||·||_* denotes the nuclear norm.
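Likewise continuing the sketch, the regularization term of equation (4-5) and the nuclear-norm terms of equation (4-6) can be evaluated directly; the parameter values are assumptions.

```python
lambda1, beta1, beta2 = 0.01, 0.1, 0.1   # configurable parameters (assumed values)

# Equation (4-5): squared Frobenius norm of the potential feature representation.
regularization = lambda1 * np.linalg.norm(Z, ord="fro") ** 2

# Equation (4-6): the nuclear norm is the sum of the singular values.
nuclear = (beta1 * np.linalg.svd(P1, compute_uv=False).sum()
           + beta2 * np.linalg.svd(P2, compute_uv=False).sum())
```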
The image classification model is a lasso regression image classification model or a logistic regression image classification model. The lasso regression image classification model includes a first relational representation ||Y - ZB||_2^2 of the potential feature representation Z to the output label representation Y, constructed based on the regression coefficient representation B, and a regularization constraint λ_2 ||B||_1 of the regression coefficient representation B. The formula of the lasso regression image classification model is as follows:

min_{Z, B} ||Y - ZB||_2^2 + λ_2 ||B||_1    (4-7)

The above formula is intended to seek the group (Z, B) that makes ||Y - ZB||_2^2 + λ_2 ||B||_1 smallest; when this term is at its minimum, ZB can represent Y to the maximum extent. The regularization constraint λ_2 ||B||_1 of the regression coefficient representation B is intended to improve the over-fitting problem; the regularization constraint of the regression coefficients is, for example, the 1-norm of the regression coefficients, and λ_2 is a configurable parameter.
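A short continuation of the sketch for the lasso regression term of equation (4-7); the label dimension and parameter values are assumptions.

```python
C = 2                          # assumed number of label columns
Y = np.random.randn(N, C)      # output label representation
B = np.random.randn(D, C)      # regression coefficient representation
lambda2 = 0.05                 # configurable parameter (assumed value)

# Equation (4-7): squared norm of the fit residual plus the 1-norm of B.
lasso_term = np.linalg.norm(Y - Z @ B) ** 2 + lambda2 * np.abs(B).sum()
```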
The logistic regression image classification model includes a second relational representation of the potential feature representation Z to the output label representation Y constructed based on the regression coefficient representation B, formulated as follows:
min_{Z, B} ||Y - ZB||_2^2    (4-8)

The above formula is intended to seek the group (Z, B) that makes ||Y - ZB||_2^2 smallest; when this term is at its minimum, ZB can represent Y to the greatest extent.
As previously described, the objective function used to train the image classification model fuses the feature transformation portion of the original feature representation of the image into the latent feature representation and the image classification model of the latent feature representation used as the input feature representation into the output label representation.
In one case, the overall objective function is a combination of equations (4-1) and (4-7), which is expressed as follows:
min_{P_1, P_2, Z, B} ||X_1 P_1 + X_2 P_2 - Z||_F^2 + φ · tr(P_1^T C_12 P_2) + λ_1 ||Z||_F^2 + β_1 ||P_1||_* + β_2 ||P_2||_* + ||Y - ZB||_2^2 + λ_2 ||B||_1    (4-9)

Equation (4-9) is intended to seek the set (P_1, P_2, B) that minimizes the whole expression after min. After the feature conversion part is fused with the image classification model, Z is an intermediate variable that does not need to be output from the objective function.
In another case, the overall objective function is obtained by fusing the formula (4-1) and the formula (4-8), and the formula is expressed as follows:
min_{P_1, P_2, Z, B} ||X_1 P_1 + X_2 P_2 - Z||_F^2 + φ · tr(P_1^T C_12 P_2) + λ_1 ||Z||_F^2 + β_1 ||P_1||_* + β_2 ||P_2||_* + ||Y - ZB||_2^2    (4-10)

Equation (4-10) is intended to seek the set (P_1, P_2, B) that minimizes the whole expression after min. After the feature conversion part is fused with the image classification model, Z is an intermediate variable that does not need to be output from the objective function.
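Putting the pieces together, the following self-contained function sketches the evaluation of the fused objective of equation (4-9); dropping the last term gives equation (4-10). The disclosure does not specify how the objective is minimized, so only the evaluation is shown, and the parameter names and default values are assumptions.

```python
import numpy as np

def objective_4_9(P1, P2, Z, B, X1, X2, Y,
                  phi=0.1, lambda1=0.01, lambda2=0.05, beta1=0.1, beta2=0.1):
    """Value of the fused objective (4-9); omit the last term for (4-10)."""
    C12 = X1.T @ X2 / X1.shape[0]          # assumed cross-covariance of the two views
    return (np.linalg.norm(X1 @ P1 + X2 @ P2 - Z, ord="fro") ** 2
            + phi * np.trace(P1.T @ C12 @ P2)
            + lambda1 * np.linalg.norm(Z, ord="fro") ** 2
            + beta1 * np.linalg.svd(P1, compute_uv=False).sum()
            + beta2 * np.linalg.svd(P2, compute_uv=False).sum()
            + np.linalg.norm(Y - Z @ B) ** 2
            + lambda2 * np.abs(B).sum())
```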
In step 12, the original features of each image in the training set and the labeled classification labels are used to train the objective function, so as to simultaneously determine the values (i.e. the projection matrices) of the projection matrix representations from the original feature representation of the image to the potential feature representation of the image in the feature conversion part, and the values (i.e. the regression coefficients) of the regression coefficient representation from the potential feature representation to the output label representation in the image classification model, thereby completing the training of the feature conversion part and the image classification model at the same time.
Step 12 is described below by steps 121-123.
In step 121, the classification labels (e.g., emotion classification labels) corresponding to the images in the training set are labeled.
In step 122, feature extraction is performed on each image in the training set, and the extracted features are original features of the image.
The human visual system has a strong information processing capability, and the relationship between vision and emotion is complicated. Reasonably constructing the relation between high-level visual emotion semantics and low-level visual features, and understanding the emotion information expressed by the user from a cognitive perspective, are important research topics for perception-oriented visual emotion analysis.
As previously mentioned, to describe an image more fully, the image may be described from multiple perspectives including, but not limited to, color features, texture features, object features (also referred to as foreground features), scene features (also referred to as background features), and the like. For example, an image may be described from two perspectives, foreground and background, with the extracted original features of the image including the foreground features and background features of the image. Foreground features of the image may be extracted using a VGG (e.g., VGG16) neural network model, and background features of the image may be extracted using an AlexNet model. Both the VGG16 neural network model and the AlexNet model are obtained by pre-training. The VGG16 network can be pre-trained on 1000 classes of image object types, so the 4096-dimensional features output by its fully connected layer are mainly object features of the image; these object features mainly capture the main body of the image and are used as the foreground features of the image. The AlexNet network can be pre-trained on 205 types of scenes, and the 2048-dimensional features output by its fully connected layer mainly capture the background content of the image and are used as the background (scene) features of the image. In addition, using the pre-trained parameters as the initial parameters of the feature extraction section can accelerate the convergence of the network.
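For illustration, a sketch of extracting the 4096-dimensional foreground (object) features with an ImageNet-pretrained VGG16 via PyTorch/torchvision is shown below. The disclosure does not name an implementation; a scene-pretrained AlexNet checkpoint for the background features is not bundled with torchvision, and the preprocessing uses standard ImageNet statistics, so these choices are assumptions.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# ImageNet-pretrained VGG16; keep the classifier up to (and including) the second
# fully connected layer so the output is the 4096-dimensional feature vector.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg16.classifier = torch.nn.Sequential(*list(vgg16.classifier.children())[:-1])
vgg16.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_foreground_features(image_path: str) -> torch.Tensor:
    """Return a 4096-dimensional foreground (object) feature vector for one image."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)       # shape (1, 3, 224, 224)
    with torch.no_grad():
        feat = vgg16(x)                    # shape (1, 4096)
    return feat.squeeze(0)
```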
Using image features from multiple viewing angles (such as foreground features and background features) to describe the image together yields rich information, so the emotional content the user expresses in the image can be better understood. However, due to the heterogeneity among different modalities, directly concatenating the features of multiple modalities into one large feature vector easily causes information redundancy. For this reason, it is necessary to exploit the complementarity among different modalities to find a potential feature space shared by all modalities, so this embodiment uses a low-rank multi-view learning method to obtain the most intrinsic feature representation of the image, as described in detail in the feature processing section.
In step 123, the original features of each image in the training set and the labeled classification labels are used to train the objective function until the whole objective function converges or reaches the expected effect, so as to complete the training of the feature transformation portion and the image classification model at the same time, and finally determine the projection matrix in the feature transformation portion and the regression coefficient in the image classification model at the same time.
The training end conditions are, for example: the difference between two successive objective function values in the iterative process is calculated and the training is stopped if the relative change in the objective function values is below a preset threshold (e.g. 0.001) and/or a preset maximum number of iterations is reached (e.g. 30).
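A minimal sketch of this stopping rule; `update_step` and `objective` are hypothetical callbacks standing in for one round of parameter updates and for evaluating the objective function value.

```python
def train(update_step, objective, max_iters=30, tol=1e-3):
    """Iterate until the relative change of the objective value falls below tol
    (e.g. 0.001) or the maximum number of iterations (e.g. 30) is reached."""
    prev = objective()
    for iteration in range(1, max_iters + 1):
        update_step()                      # one pass of updating P1, P2, Z and B
        curr = objective()
        if abs(prev - curr) / max(abs(prev), 1e-12) < tol:
            break
        prev = curr
    return iteration
```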
The method fuses the feature processing part and the image classification part into one objective function so that the two parts can be trained together. From the original feature input end to the classification label output end, a prediction result is obtained and compared with the real result to obtain an error. Because all of the intermediate operations are contained in the objective function, the error can be transmitted through every part of the objective function, and the representation of each part can be adjusted according to the error until the whole objective function converges or reaches the expected effect, thereby improving the overall training effect of the feature processing part and the image classification part.
In addition, image features of multiple viewing angles (such as foreground features and background features) are used to describe the image together to obtain rich information, and a multi-view subspace learning method is used to map each sample in the high-dimensional space to a low-dimensional subspace. This retains the features learned in each subspace, makes full use of the complementarity among the multiple views, improves the emotion classification effect for the image, avoids the impact of dimension explosion on learning, and has good practicability.
Fig. 2 is a schematic flow chart diagram of some embodiments of the disclosed image classification method. This embodiment may, for example, classify the image by emotion, such as analyzing whether the image is positive or negative, or whether the image expresses happiness, anger, surprise, satisfaction, disgust, fear, etc., but is not limited to the examples given.
As shown in fig. 2, the image classification method of this embodiment includes: steps 21-23.
In step 21, for an image to be classified, its original features are extracted, for example, its foreground features X_1 and background features X_2.
The feature extraction method may refer to an extraction method of image features in a training set.
In step 22, the original features of the image to be classified are processed by using the projection matrix obtained by training to obtain the potential features of the image to be classified. For example, the original features of different perspectives of the image to be classified are subjected to feature processing to mine potential shared features (which may be referred to as potential features for short).
The formula is expressed as follows:
Z = X_1 P_1 + X_2 P_2
in step 23, the latent features of the image to be classified are input into the trained image classification model, and the classification label corresponding to the image to be classified is output.
In the case of the lasso regression image classification model, the regression coefficients B have been determined by training, and the potential features Z are input to min (Y-ZB)22||B||1And outputting the corresponding classification label Y. Thus classification is achieved by means of lasso regression.
And if the image classification model is a logistic regression image classification model, the regression coefficient B is determined through training, the potential features Z are input into min (Y-ZB), and the corresponding classification labels Y are output. Thereby realizing classification by means of logistic regression.
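A minimal NumPy sketch of steps 21-23, assuming the trained values P1, P2 and B are available and that the column with the largest score is taken as the classification result; the decision rule is an assumption made for illustration.

```python
import numpy as np

def classify(x1, x2, P1, P2, B):
    """x1, x2: original foreground/background feature vectors of one image.
    Returns the index of the predicted classification (e.g. emotion) label."""
    z = x1 @ P1 + x2 @ P2           # step 22: potential features, Z = X1 P1 + X2 P2
    scores = z @ B                  # step 23: output label representation Y = Z B
    return int(np.argmax(scores))   # assumed decision rule: largest score wins
```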
The fused features have strong distinguishing capability, so that the performance of the image classification model can be improved.
Fig. 3 is a schematic structural diagram of some embodiments of a training apparatus for image classification models according to the present disclosure.
As shown in fig. 3, the training apparatus 300 of this embodiment includes: a memory 310 and a processor 320 coupled to the memory 310, the processor 320 configured to perform the training method of any of the foregoing embodiments based on instructions stored in the memory 310.
Memory 310 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs.
Training device 300 may also include input-output interface 330, network interface 340, storage interface 350, and the like. These interfaces 330, 340, 350 and the memory 310 and the processor 320 may be connected, for example, by a bus 360. The input/output interface 330 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 340 provides a connection interface for various networking devices. The storage interface 350 provides a connection interface for external storage devices such as an SD card and a usb disk.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (14)

1. A training method of an image classification model is characterized by comprising the following steps:
constructing an objective function for training an image classification model, wherein the objective function comprises a characteristic conversion part from original characteristic representation to potential characteristic representation of an image and an image classification model from the potential characteristic representation serving as input characteristic representation to output label representation;
and training the objective function by using the original features of each image in the training set and the labeled classification labels to simultaneously determine the values of the projection matrix representation of the original feature representation of the image to the potential feature representation of the image in the feature conversion part and the values of the regression coefficient representation of the potential feature representation to the output label representation in the image classification model.
2. The method of claim 1,
the raw feature representation of the image comprises raw feature representations of a plurality of perspectives of the image, and the projection matrix representation comprises a respective plurality of projection matrix representations of the raw feature representation of each perspective of the image to a potential feature representation of the image.
3. The method according to claim 1 or 2,
the feature transformation portion includes a relational representation of original feature representations to potential feature representations of the image constructed based on the projection matrix representation.
4. The method of claim 3,
the relationship of the original feature representation to the potential feature representation of the image is represented as: a function of the norm of the product of the original feature representation of the image and the projection matrix representation minus the potential feature representation.
5. The method of claim 3,
the feature conversion section further includes: one or more of redundant constraints of the original feature representation of the different view angles, low rank constraints of the projection matrix representation, and regularization constraints of the potential feature representation.
6. The method of claim 5,
the redundancy constraint of the original characteristic representations of different view angles is the product of the transpose of the first projection matrix representation corresponding to one view angle, the covariance matrix of the original characteristic representation of each view angle and the second projection matrix representation corresponding to another view angle;
or the low rank constraint of the projection matrix representation is a nuclear norm of the projection matrix representation;
alternatively, the regularization constraint of the potential feature representation is a function of a norm of the potential feature representation.
7. The method according to claim 1 or 2,
the image classification model is a lasso regression image classification model or a logistic regression image classification model.
8. The method of claim 7,
the lasso regression image classification model includes a regularization constraint of the regression coefficient representation and a first relational representation of the potential feature representation to the output label representation constructed based on the regression coefficient representation;
alternatively, the logistic regression image classification model comprises a second relational representation of the potential feature representation to the output label representation constructed based on regression coefficient representations.
9. The method of claim 2,
the raw feature representations for the multiple perspectives of the image comprise a foreground feature representation and a background feature representation of the image.
10. The method of claim 9,
the raw features of each image in the training set include foreground and background features of the respective image,
extracting foreground features of the image by using a VGG neural network model; or, extracting the background feature of the image by using an AlexNet model.
11. The method of claim 1 or 2, further comprising:
and aiming at an image to be classified, processing the original characteristics of the image to be classified by using the value represented by the trained projection matrix to obtain the potential characteristics of the image to be classified, and inputting the potential characteristics of the image to be classified into an image classification model obtained by training to output a classification label corresponding to the image to be classified.
12. The method of claim 11,
and the classification label marked by each image in the training set and the classification label corresponding to the image to be classified are emotion labels.
13. An apparatus for training an image classification model, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the training method of any of claims 1-12 based on instructions stored in the memory.
14. A non-transitory computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the training method according to any one of claims 1 to 12.
CN201911291584.1A 2019-12-16 2019-12-16 Training method and training device for image classification model Pending CN112990242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911291584.1A CN112990242A (en) 2019-12-16 2019-12-16 Training method and training device for image classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911291584.1A CN112990242A (en) 2019-12-16 2019-12-16 Training method and training device for image classification model

Publications (1)

Publication Number Publication Date
CN112990242A (en) 2021-06-18

Family

ID=76343019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911291584.1A Pending CN112990242A (en) 2019-12-16 2019-12-16 Training method and training device for image classification model

Country Status (1)

Country Link
CN (1) CN112990242A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106531192A (en) * 2016-12-09 2017-03-22 电子科技大学 Speech emotion recognition method and system based on redundancy features and multi-dictionary representation
CN107545276A (en) * 2017-08-01 2018-01-05 天津大学 The various visual angles learning method of joint low-rank representation and sparse regression
CN107590505A (en) * 2017-08-01 2018-01-16 天津大学 The learning method of joint low-rank representation and sparse regression
CN108564237A (en) * 2017-12-13 2018-09-21 中国银联股份有限公司 A kind of Capacity Evaluation Model method for building up, capacity evaluating method and device
CN110147782A (en) * 2019-05-29 2019-08-20 苏州大学 It is a kind of based on projection dictionary to the face identification method and device of study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANAN LIU ET AL.: "Low-rank regularized multi-view inverse-covariance estimation for visual sentiment distribution prediction", J. VIS. COMMUN. IMAGE R., page 246 *

Similar Documents

Publication Publication Date Title
CN109522818B (en) Expression recognition method and device, terminal equipment and storage medium
JP7193252B2 (en) Captioning image regions
CN109409222B (en) Multi-view facial expression recognition method based on mobile terminal
Arevalo et al. Gated multimodal networks
EP3179407B1 (en) Recognition of a 3d modeled object from a 2d image
WO2021093468A1 (en) Video classification method and apparatus, model training method and apparatus, device and storage medium
WO2021129181A1 (en) Portrait segmentation method, model training method and electronic device
CN110378438A (en) Training method, device and the relevant device of Image Segmentation Model under label is fault-tolerant
WO2016205286A1 (en) Automatic entity resolution with rules detection and generation system
CN114398961A (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
CN108154191B (en) Document image recognition method and system
CN112766079A (en) Unsupervised image-to-image translation method based on content style separation
Bykov et al. Explaining bayesian neural networks
Fathima Classification of blood types by microscope color images
CA3198335A1 (en) Systems and methods for artificial facial image generation conditioned on demographic information
CN115761905A (en) Diver action identification method based on skeleton joint points
CN112115131A (en) Data denoising method, device and equipment and computer readable storage medium
Banskota et al. A novel enhanced convolution neural network with extreme learning machine: facial emotional recognition in psychology practices
CN109558882B (en) Image classification method and device based on robust local low-rank sparse CNN features
US11182415B2 (en) Vectorization of documents
Altun et al. SKETRACK: stroke-based recognition of online hand-drawn sketches of arrow-connected diagrams and digital logic circuit diagrams
CN112560925A (en) Complex scene target detection data set construction method and system
CN111062291B (en) Robot vision tracking method and system
CN112990242A (en) Training method and training device for image classification model
CN113610080B (en) Cross-modal perception-based sensitive image identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

CB02 Change of applicant information
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination