CN117011567A - Method, device, equipment and storage medium for training image classification model

Info

Publication number
CN117011567A
Authority
CN
China
Prior art keywords
image
auxiliary
sample
classification model
training
Prior art date
Legal status
Pending
Application number
CN202211140067.6A
Other languages
Chinese (zh)
Inventor
许剑清
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211140067.6A
Publication of CN117011567A
Status: Pending

Classifications

    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/40: Extraction of image or video features
    • G06V10/761: Proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features (combining data at the feature extraction level)
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y02T10/40: Engine management systems (under Y02T: climate change mitigation technologies related to transportation)


Abstract

The application provides a method, a device, equipment and a storage medium for training an image classification model, applicable to fields such as the Internet of Vehicles and intelligent transportation, and intended to solve the problem of low classification accuracy and low classification reliability of a trained target image classification model. The method comprises at least the following steps: using a trained auxiliary classification model, performing the following for each sample category: fusing the auxiliary image features of the plurality of sample images associated with one sample category to obtain the corresponding auxiliary category feature; extracting features from the plurality of sample images associated with that sample category using the image classification model to obtain corresponding training image features; and adjusting the model parameters based on the training image features and auxiliary image features of the sample images and the auxiliary category features of the sample categories. By applying constraints from the two perspectives of features and relationships, the classification accuracy and classification reliability of the target image classification model are improved.

Description

Method, device, equipment and storage medium for training image classification model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training an image classification model.
Background
With the development of technology, more and more devices can provide an image classification service through a trained target image classification model; the service can be used to determine the category to which an object in an image belongs.
For example, a device may classify a face image with the trained target image classification model to determine the identity of the object to which the face in the image belongs.
In the related art, to balance the time consumed by image classification using the trained target image classification model against its classification accuracy, the usual approach is to first perform multiple rounds of training, based on a sample image set, on a large auxiliary classification model containing a large number of model parameters, obtaining a trained auxiliary classification model; then to train a small image classification model containing a small number of model parameters for multiple rounds based on the image categories that the trained auxiliary classification model outputs for the sample image set, so that the image categories output by the resulting trained target image classification model approximate those output by the trained auxiliary classification model. In this way, the trained target image classification model reduces the time consumed by image classification without sacrificing too much classification accuracy.
However, because the parameter counts of the auxiliary classification model and the image classification model differ greatly, training the small image classification model for multiple rounds based only on the image categories output by the trained auxiliary classification model imposes a single constraint condition. The trained target image classification model therefore cannot accurately learn the image classification capability of the auxiliary classification model, and its image classification accuracy is low.
Therefore, the training approach used in the related art cannot guarantee the classification accuracy and classification reliability of the trained target image classification model without increasing the time consumed by image classification.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, computer equipment and a storage medium for training an image classification model, which are used to solve the problem of low classification accuracy and low classification reliability of a trained target image classification model.
In a first aspect, a method of training an image classification model is provided, comprising:
acquiring a sample image set associated with a plurality of sample categories, wherein each sample category is associated with a plurality of sample images;
extracting features from each sample image using a trained auxiliary classification model to obtain corresponding auxiliary image features, and performing the following operation for each sample category: performing feature fusion on the auxiliary image features of the plurality of sample images associated with one sample category to obtain the corresponding auxiliary category feature;
performing multiple rounds of iterative training on the image classification model to be trained based on the sample image set, and outputting a trained target image classification model, wherein each round of iteration comprises:
extracting features from a plurality of sample images associated with one sample category using the image classification model to obtain corresponding training image features;
and adjusting model parameters of the image classification model based on the training image features and the auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories.
In a second aspect, there is provided an apparatus for training an image classification model, comprising:
an acquisition module, configured to acquire a sample image set associated with a plurality of sample categories, wherein each sample category is associated with a plurality of sample images;
a processing module, configured to extract features from each sample image using a trained auxiliary classification model to obtain corresponding auxiliary image features, and to perform the following operation for each sample category: performing feature fusion on the auxiliary image features of the plurality of sample images associated with one sample category to obtain the corresponding auxiliary category feature;
the processing module is further configured to perform multiple rounds of iterative training on the image classification model to be trained based on the sample image set and to output a trained target image classification model, wherein each round of iteration comprises:
extracting features from a plurality of sample images associated with one sample category using the image classification model to obtain corresponding training image features;
and adjusting model parameters of the image classification model based on the training image features and auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories.
Optionally, the processing module is specifically configured to:
determining a feature consistency loss of the image classification model based on the training image features and auxiliary image features corresponding to the plurality of sample images, wherein the feature consistency loss characterizes the consistency of feature extraction between the auxiliary classification model and the image classification model;
determining a relationship consistency loss of the image classification model based on the training image features and auxiliary image features corresponding to the plurality of sample images and the auxiliary category features of the plurality of sample categories, wherein the relationship consistency loss characterizes the consistency of the inter-class relationships among the plurality of sample categories as determined by the auxiliary classification model and by the image classification model;
and adjusting model parameters of the image classification model based on the obtained feature consistency loss and relationship consistency loss.
Optionally, the processing module is specifically configured to:
for the plurality of sample images, the following operations are performed, respectively:
taking the plurality of sample categories as target categories, and determining the similarity between the training image features corresponding to one sample image and the auxiliary category feature of each target category to obtain corresponding training similarities;
determining the similarity between the auxiliary image features corresponding to the sample image and the auxiliary category feature of each target category to obtain corresponding auxiliary similarities;
and determining the relationship consistency loss of the image classification model based on the obtained training similarities and auxiliary similarities.
Optionally, the processing module is specifically configured to:
for the plurality of sample images, the following operations are performed, respectively:
determining the similarity between the auxiliary image features corresponding to one sample image and the auxiliary category features of the plurality of sample categories, taking the similarities that exceed a preset similarity threshold as auxiliary similarities, and taking the sample categories corresponding to those auxiliary similarities as target categories;
determining the similarity between the training image features corresponding to the sample image and the auxiliary category features of the obtained target categories to obtain corresponding training similarities;
and determining the relationship consistency loss of the image classification model based on the obtained training similarities and auxiliary similarities.
Optionally, the processing module is specifically configured to:
for the plurality of target categories, performing the following operations respectively:
taking the average of the training similarities between the plurality of sample images and the auxiliary category feature of one target category as the training category relationship between the sample category and that target category;
taking the average of the auxiliary similarities between the plurality of sample images and the auxiliary category feature of that target category as the auxiliary category relationship between the sample category and that target category;
and determining the relationship consistency loss of the image classification model based on the errors between the obtained training category relationships and auxiliary category relationships.
Optionally, the processing module is specifically configured to:
determining the error between the training image features and auxiliary image features corresponding to each sample image to obtain corresponding feature consistency errors;
and determining the feature consistency loss of the image classification model based on a weighted average of the obtained feature consistency errors.
Optionally, the processing module is specifically configured to:
taking a weighted sum of the obtained feature consistency loss and relationship consistency loss as the training loss of the image classification model;
and adjusting the model parameters of the image classification model when it is determined that the obtained training loss has not reached the training target.
Optionally, the processing module is further configured to:
acquiring an image to be classified;
extracting features from the image to be classified using the target image classification model to obtain target image features;
and determining, using the target image classification model, the target category of the image to be classified from the plurality of sample categories based on the target image features.
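For the optional classification flow above, the following sketch (all names and the use of cosine similarity are assumptions, not the patent's prescribed implementation) shows one way the trained target image classification model could pick the target category of an image to be classified:

```python
import torch
import torch.nn.functional as F

def classify(target_model, image, aux_class_feats, category_names):
    """aux_class_feats: (m, d) auxiliary category features of the m sample categories.
    Illustrative only: chooses the sample category whose category feature is
    most similar to the extracted target image feature."""
    feat = target_model(image.unsqueeze(0)).squeeze(0)  # (d,) target image feature
    sims = F.cosine_similarity(feat.unsqueeze(0), aux_class_feats, dim=1)  # (m,)
    return category_names[sims.argmax().item()]
```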
In a third aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
In a fourth aspect, there is provided a computer device comprising:
a memory for storing program instructions;
and a processor, configured to call the program instructions stored in the memory and to perform the method according to the first aspect in accordance with the obtained program instructions.
In a fifth aspect, there is provided a computer readable storage medium storing computer executable instructions for causing a computer to perform the method of the first aspect.
In the embodiments of the present application, the image classification model is trained using a trained auxiliary classification model, so that the image classification model can learn the image classification capability of the auxiliary classification model. When the auxiliary classification model has far more model parameters than the image classification model, the trained target image classification model can provide an accurate image classification service with fewer model parameters.
Further, during training of the image classification model, in each round the model parameters are adjusted based on the auxiliary image features of the sample images extracted by the auxiliary classification model and the training image features of the same sample images extracted by the image classification model. The image classification model thus learns the feature extraction capability of the auxiliary classification model, so that the trained target image classification model can provide a more accurate and reliable image classification service.
Further, the auxiliary classification model is used to determine the auxiliary category feature of each of the plurality of sample categories. Based on the auxiliary image features extracted by the auxiliary classification model and the training image features extracted by the image classification model, the image classification model can learn, through the similarity relationships among these features, the inter-class relationships between the sample category currently being trained and the other sample categories. The image classification model thereby learns the auxiliary classification model's ability to extract features from an image with appropriate emphasis, so that the trained target image classification model can provide a more reliable image classification service.
By training the image classification model from the multiple perspectives of feature relationships and inter-class relationships, the classification accuracy and classification reliability of the trained target image classification model are improved.
Drawings
FIG. 1A is a schematic diagram of an application field of an image classification model according to an embodiment of the present application;
FIG. 1B is an application scenario of a method for training an image classification model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training an image classification model according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 4 is a second schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 5 is a third schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 6 is a fourth schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 7A is a fifth schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 7B is a sixth schematic diagram of a method for training an image classification model according to an embodiment of the present application;
FIG. 8 is a first structural schematic diagram of an apparatus for training an image classification model according to an embodiment of the present application;
FIG. 9 is a second structural schematic diagram of an apparatus for training an image classification model according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Intra-class relationships and inter-class relationships:
the intra-class relationships characterize relationships between images belonging to the same class, e.g., relationships between different face images of the same object.
The inter-class relationships characterize relationships between images belonging to different classes, e.g., relationships between face images of different objects.
The embodiments of the present application relate to the field of artificial intelligence (AI), are designed based on computer vision (CV) and machine learning (ML) technologies, and can be applied in fields such as cloud computing, intelligent transportation, smart agriculture, smart healthcare and mapping.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science, which studies the design principles and implementation methods of various machines in an attempt to understand the essence of intelligence, and to produce a new intelligent machine that can react in a similar way to human intelligence, so that the machine has the functions of sensing, reasoning and decision.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation and interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving and intelligent transportation. With the development and progress of artificial intelligence, it is being researched and applied in more and more fields, such as smart homes, smart customer service, virtual assistants, smart speakers, smart marketing, smart wearable devices, unmanned driving, drones, robots, smart healthcare, the Internet of Vehicles, autonomous driving and intelligent transportation. It is believed that with further technological development, artificial intelligence will be applied in still more fields and deliver increasingly important value. The solutions provided in the embodiments of the present application relate to artificial intelligence technologies such as deep learning and augmented reality, and are described further in the following embodiments.
Computer vision is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track and measure targets, and further performs graphics processing so that the processed images are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine learning is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance.
Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Deep learning is at the core of machine learning and is a technology for realizing it. Machine learning typically includes deep learning, reinforcement learning, transfer learning, inductive learning, artificial neural networks and learning from demonstration; deep learning includes convolutional neural networks (CNN), deep belief networks, recurrent neural networks, autoencoders, generative adversarial networks and the like.
It should be noted that the embodiments of the present application involve related data such as sample image sets and images to be classified. When the above embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
The application field of the method for training the image classification model provided by the embodiment of the application is briefly described below.
With the development of technology, more and more devices can provide an image classification service through a trained target image classification model; the service can be used to determine the category to which an object in an image belongs.
For example, in practical application fields such as security, payment and access control, a face recognition system in a mobile terminal can photograph a target object to obtain a face image, and the trained target image classification model can then classify the face image to determine the identity of the object to which the face belongs. Referring to FIG. 1A, the trained target image classification model classifies face image a and determines that the object corresponding to the face in face image a is object A.
Because the model runs on a mobile terminal, the requirements on both the time consumption and the accuracy of image classification by the target image classification model are high. To reduce the time consumed by image classification, an image classification model with fewer model parameters is usually trained to obtain the trained target image classification model. However, if a large number of sample images are used to train the image classification model directly, its limited fitting capability causes it to fall into a local minimum of the loss function during training and then oscillate, so the loss function cannot be optimized further and the image classification accuracy of the trained target image classification model is low.
In the related art, to balance the time consumed by image classification using the trained target image classification model against its classification accuracy, the usual approach is to first perform multiple rounds of training, based on a sample image set, on a large auxiliary classification model containing a large number of model parameters, obtaining a trained auxiliary classification model; then to train a small image classification model containing a small number of model parameters for multiple rounds based on the image categories that the trained auxiliary classification model outputs for the sample image set, so that the image categories output by the resulting trained target image classification model approximate those output by the trained auxiliary classification model. In this way, the trained target image classification model reduces the time consumed by image classification without sacrificing too much classification accuracy.
However, because the parameter counts of the auxiliary classification model and the image classification model differ greatly, training the small image classification model for multiple rounds based only on the image categories output by the trained auxiliary classification model imposes a single constraint condition. The trained target image classification model therefore cannot accurately learn the image classification capability of the auxiliary classification model, and its image classification accuracy is low.
Therefore, the training approach used in the related art cannot guarantee the classification accuracy and classification reliability of the trained target image classification model without increasing the time consumed by image classification.
To solve the problem of low classification accuracy and low classification reliability of the trained target image classification model, the present application provides a method for training an image classification model. In this method, a sample image set associated with a plurality of sample categories is first acquired, each sample category being associated with a plurality of sample images. A trained auxiliary classification model is used to extract features from each sample image to obtain corresponding auxiliary image features, and the following operation is performed for each sample category: the auxiliary image features of the plurality of sample images associated with the sample category are fused to obtain the corresponding auxiliary category feature. The image classification model to be trained is then iteratively trained for multiple rounds based on the sample image set, and the trained target image classification model is output.
Each round of iteration includes: extracting features from the plurality of sample images associated with one sample category using the image classification model to obtain corresponding training image features; and adjusting the model parameters of the image classification model based on the training image features and auxiliary image features of these sample images and the auxiliary category features of the sample categories. A sketch of one such round appears below.
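The following PyTorch-style sketch is illustrative only and not the patent's mandated implementation; the names (student, aux_feats, alpha, beta) are assumptions, and cosine similarity stands in for whichever similarity measure an embodiment uses.

```python
import torch
import torch.nn.functional as F

def train_one_round(student, images, aux_feats, aux_class_feats, optimizer,
                    alpha=1.0, beta=1.0):
    """One round of iteration (sketch).

    images:          (B, C, H, W) sample images of one sample category
    aux_feats:       (B, d) auxiliary image features from the trained auxiliary model
    aux_class_feats: (m, d) auxiliary category features, one row per sample category
    """
    train_feats = student(images)  # (B, d) training image features

    # Feature-level constraint: training features should match auxiliary features.
    feat_loss = (train_feats - aux_feats).norm(p=2, dim=1).mean()

    # Relation-level constraint: each image's similarity pattern over the
    # auxiliary category features should agree between the two models.
    sim_train = F.cosine_similarity(train_feats.unsqueeze(1),
                                    aux_class_feats.unsqueeze(0), dim=2)  # (B, m)
    sim_aux = F.cosine_similarity(aux_feats.unsqueeze(1),
                                  aux_class_feats.unsqueeze(0), dim=2)    # (B, m)
    rel_loss = F.mse_loss(sim_train, sim_aux)

    loss = alpha * feat_loss + beta * rel_loss  # weighted combination (weights assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```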
In the embodiments of the present application, the image classification model is trained using a trained auxiliary classification model, so that the image classification model can learn the image classification capability of the auxiliary classification model. When the auxiliary classification model has far more model parameters than the image classification model, the trained target image classification model can provide an accurate image classification service with fewer model parameters.
Further, during training of the image classification model, in each round the model parameters are adjusted based on the auxiliary image features of the sample images extracted by the auxiliary classification model and the training image features of the same sample images extracted by the image classification model. The image classification model thus learns the feature extraction capability of the auxiliary classification model, so that the trained target image classification model can provide a more accurate and reliable image classification service.
Further, the auxiliary classification model is used to determine the auxiliary category feature of each of the plurality of sample categories. Based on the auxiliary image features extracted by the auxiliary classification model and the training image features extracted by the image classification model, the image classification model can learn, through the similarity relationships among these features, the inter-class relationships between the sample category currently being trained and the other sample categories. The image classification model thereby learns the auxiliary classification model's ability to extract features from an image with appropriate emphasis, so that the trained target image classification model can provide a more reliable image classification service.
By training the image classification model from the multiple perspectives of feature relationships and inter-class relationships, the classification accuracy and classification reliability of the trained target image classification model are improved.
The application scenario of the method for training the image classification model provided by the application is described below.
Referring to FIG. 1B, a schematic diagram of an application scenario of the method for training an image classification model provided by the present application: the scenario includes a client 101 and a server 102, which can communicate with each other. Communication may use wired technology, for example over a network cable or serial cable, or wireless technology, for example Bluetooth or wireless fidelity (WiFi), and is not specifically limited.
Client 101 broadly refers to a device that can provide a sample image set to the server 102 or use the trained target image classification model, for example a terminal device, a third-party application accessible to a terminal device, or a web page accessible to a terminal device. Terminal devices include, but are not limited to, mobile phones, computers, smart medical devices, smart home appliances, vehicle-mounted terminals and aircraft. Server 102 broadly refers to a device that can train or use the target image classification model, such as a terminal device or a server. Servers include, but are not limited to, cloud servers, local servers and associated third-party servers. Both the client 101 and the server 102 can use cloud computing to reduce the occupation of local computing resources, and can also use cloud storage to reduce the occupation of local storage resources.
As an embodiment, the client 101 and the server 102 may be the same device; this is not specifically limited. In the embodiments of the present application, the client 101 and the server 102 are described as different devices.
The method for training an image classification model provided by the embodiments of the present application is described in detail below based on FIG. 1B, with the server 102 as the execution subject. Referring to FIG. 2, a flowchart of the method for training an image classification model according to an embodiment of the application:
S201, acquiring a sample image set associated with a plurality of sample categories.
The sample image set contains a plurality of sample images, each sample image is associated with a sample category, and each sample category is associated with a plurality of sample images. Taking face images as an example, each object may be associated with a plurality of face images, which may include several frontal, profile, overhead-view or upward-view face images of the corresponding object, without limitation. One possible organization is sketched below.
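Purely as an illustration of this structure (the embodiment does not prescribe any storage format, and these identifiers and file names are hypothetical), a sample image set might be organized as a mapping from each sample category to its associated sample images:

```python
# Hypothetical layout: each sample category (here, an object identity)
# is associated with multiple sample images of that object.
sample_image_set = {
    "object_a": ["object_a/frontal.jpg", "object_a/left_profile.jpg",
                 "object_a/overhead.jpg"],
    "object_b": ["object_b/frontal.jpg", "object_b/right_profile.jpg"],
}
```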
Acquiring a sample image set associated with a plurality of sample categories allows the image classification model to be trained on, and to learn from, multiple sample images of the same sample category, enriching the model's prior knowledge and thereby improving the classification accuracy and classification reliability of the trained target image classification model.
S202, extracting features from each sample image using a trained auxiliary classification model to obtain corresponding auxiliary image features, and performing the following operation for each sample category: performing feature fusion on the auxiliary image features of the plurality of sample images associated with one sample category to obtain the corresponding auxiliary category feature.
Before the trained auxiliary classification model is used to extract features from each sample image, the auxiliary classification model to be trained may itself be iteratively trained for multiple rounds based on the sample image set, and the trained auxiliary classification model is then output. The auxiliary classification model is a model containing a large number of model parameters, far more than the image classification model contains, and is used to perform image classification on images.
Taking one round of iterative training as an example, the process of training the auxiliary classification model is described below; each round of iterative training is similar and is not repeated here.
Multiple sample images belonging to each sample class can be read from the sample image set using the auxiliary classification model to be trained.
After the plurality of sample images belonging to one sample category are read, the auxiliary classification model to be trained extracts features from them in turn to obtain the image features of each sample image; for example, spatial features may be extracted from each of the sample images, without limitation. In the auxiliary classification model to be trained, a convolutional neural network may provide the feature extraction capability; the convolutional neural network may include convolutional layers, nonlinear activation function layers, pooling layers and the like, without limitation.
After the image features of the sample images are obtained, the auxiliary classification model to be trained may determine the class feature of one sample category based on the image features of the sample images belonging to that category. Thus, based on the sample images read each time, the class feature of each sample category can be obtained. The class features of all sample categories may be stored in matrix form, for example a matrix of shape (d × m), where d is the dimension of each class feature and m is the number of sample categories.
The class feature of one sample category may be determined from the image features of the sample images belonging to it by taking the average of those image features as the class feature; alternatively, a cluster center may be computed from those image features and used as the class feature, without limitation. Both options are sketched below.
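The two options just mentioned could be realized as follows; this is an illustrative assumption rather than the embodiment's required computation, and the medoid below is just one simple way to stand in for a computed cluster center.

```python
import torch

def class_feature_mean(image_feats: torch.Tensor) -> torch.Tensor:
    """Average the (n, d) image features of one sample category into a (d,) class feature."""
    return image_feats.mean(dim=0)

def class_feature_cluster_center(image_feats: torch.Tensor) -> torch.Tensor:
    """Take the medoid: the actual image feature closest to the mean,
    used here as a simple stand-in for a computed cluster center."""
    mean = image_feats.mean(dim=0)
    idx = (image_feats - mean).norm(dim=1).argmin()
    return image_feats[idx]
```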
The current round of iterative training of the auxiliary classification model to be trained is performed on the plurality of sample images, belonging to one sample category, that are read each time.
Using the auxiliary classification model to be trained, for the image features of each sample image belonging to the sample category read this time, the matrix product between the image features and the class feature of each sample category is computed, and an activation function such as softmax yields the probability that the sample image belongs to each sample category. The probability values over all sample categories form a probability vector for the sample image: the sample categories follow a fixed order, and the corresponding probability values are arranged in that order. The loss function of the auxiliary classification model to be trained is then determined based on the error between the probability vector of the sample image and its sample category, where the sample category can itself be regarded as a probability vector in which the probability of the true sample category is 1 and the rest are 0.
The loss function may be based on an activation function such as softmax, or on one of its margin-augmented variants, and is not specifically limited.
When the loss function does not meet the training target, a gradient descent algorithm, such as stochastic gradient descent (SGD), SGD with a momentum term, Adam or Adagrad, may be used to adjust the parameters of the auxiliary classification model to be trained, and the next round of iterative training is performed, until the loss function meets the training target, whereupon the trained auxiliary classification model is output. The training target may be that the number of iterations reaches a preset threshold, or that the loss function falls below a preset value, without limitation. One such update step is sketched below.
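A minimal sketch of one parameter update for the auxiliary classification model as described above, assuming PyTorch; aux_model, the (d × m) class-feature matrix and the learning rates are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def aux_training_step(aux_model, images, labels, class_feats, optimizer):
    """class_feats: (d, m) stored class features, one column per sample category.
    labels: (B,) index of each sample image's true sample category."""
    feats = aux_model(images)                 # (B, d) image features
    logits = feats @ class_feats              # (B, m) matrix product per image
    log_probs = F.log_softmax(logits, dim=1)  # softmax probability vectors (log form)
    # Error between each probability vector and the one-hot vector of the true
    # category (probability 1 for the true sample category, 0 for the rest):
    loss = F.nll_loss(log_probs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Any of the optimizers mentioned above could drive the update, for example:
# optimizer = torch.optim.SGD(aux_model.parameters(), lr=0.1, momentum=0.9)
# optimizer = torch.optim.Adam(aux_model.parameters(), lr=1e-3)
# optimizer = torch.optim.Adagrad(aux_model.parameters(), lr=1e-2)
```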
Because the auxiliary classification model contains a large number of model parameters and has been fully trained on a large number of sample images, the trained auxiliary classification model has high classification accuracy and classification reliability. The trained auxiliary classification model can therefore be used to extract features from each sample image to obtain the corresponding auxiliary image features.
For each sample category, the corresponding auxiliary category feature is computed. One sample category is taken as an example below; the other sample categories are handled similarly and are not repeated here.
The auxiliary classification model is used to perform feature fusion on the auxiliary image features of the plurality of sample images associated with one sample category, obtaining the corresponding auxiliary category feature. Feature fusion may compute the average of the auxiliary image features and take it as the auxiliary category feature, or compute the cluster center of the auxiliary image features and take that as the auxiliary category feature, without limitation.
By computing the auxiliary category feature of each sample category, for a given sample category, the auxiliary image features of its associated sample images have high similarity to its auxiliary category feature, while the auxiliary image features of a sample image associated with one sample category have low similarity to the auxiliary category features of other sample categories. When the image classification model to be trained is trained with both the auxiliary image features and the auxiliary category features, it can learn the intra-class and inter-class characteristics of the features extracted by the auxiliary classification model, learning from the auxiliary classification model in several respects, which improves the classification accuracy and classification reliability of the trained target image classification model.
Referring to FIG. 3, the auxiliary classification model may include a data preparation module, a large-scale recognition module, a class feature storage module, a training module and an optimization module.
The data preparation module may be configured to read the plurality of sample images belonging to each sample category. The large-scale recognition module contains the large number of model parameters in the auxiliary classification model and is configured to extract features from the read sample images in turn to obtain the image features of each sample image. The class feature storage module is configured to determine the class feature of one sample category based on the image features of the sample images belonging to that category, and thus to store the class feature of each sample category. The training module is configured to perform multiple rounds of iterative training on the auxiliary classification model to be trained, determining the loss function during each round and determining whether the loss function meets the training target. The optimization module is configured to adjust the parameters of the auxiliary classification model to be trained when the loss function does not meet the training target, and to output the trained auxiliary classification model when it does.
S203, performing multiple rounds of iterative training on the image classification model to be trained based on the sample image set, and outputting a trained target image classification model.
After the sample image set, the auxiliary image features of each sample image it contains, and the auxiliary category features of each sample category have been obtained, the image classification model to be trained can be iteratively trained for multiple rounds and the trained target image classification model output.
In the following, one round of iterative training is taken as an example; each round of iterative training proceeds similarly and is not repeated here. See S204 to S205.
S204, adopting an image classification model to respectively extract characteristics of a plurality of sample images associated with one sample class, and obtaining corresponding training image characteristics.
During each round of iterative training, a plurality of sample images associated with one sample category may be processed. The image classification model extracts features from these sample images to obtain the corresponding training image features. The training image features characterize the current feature extraction capability, and hence the image classification capability, of the image classification model.
To make the image classification capability of the image classification model closer to that of the auxiliary classification model, training can increase the similarity between the training image features extracted by the image classification model and the auxiliary image features extracted by the auxiliary classification model. There are various ways to do this. If only the error between the training image features and the auxiliary image features is reduced, the constraint condition is too simple: making the training image features of every sample image closely match the auxiliary image features requires a long training time. Therefore, in the embodiments of the present application, in addition to aligning the training image features with the auxiliary image features, further constraint conditions are added so that the training process has a clearer target; the image classification capability of the image classification model thus approaches that of the auxiliary classification model while the training time is shortened. See S205.
S205, adjusting model parameters of the image classification model based on the training image features and the auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories.
After the training image features and auxiliary image features of the sample images and the auxiliary category features of the sample categories are obtained, the model parameters of the image classification model may be adjusted based on all of them. Compared with adjusting the model parameters based only on the training image features and auxiliary image features of the sample images, this provides richer prior knowledge, makes the training process more efficient, and yields higher classification accuracy and classification reliability in the trained target image classification model.
As an embodiment, when adjusting the model parameters of the image classification model based on the training image features and auxiliary image features of the sample images and the auxiliary category features of the sample categories, a feature consistency loss of the image classification model may first be determined based on the training image features and auxiliary image features; the feature consistency loss characterizes the consistency of feature extraction between the auxiliary classification model and the image classification model. A relationship consistency loss is then determined based on the training image features, the auxiliary image features and the auxiliary category features; the relationship consistency loss characterizes the consistency of the inter-class relationships among the sample categories as determined by the auxiliary classification model and by the image classification model. The model parameters of the image classification model are adjusted based on the obtained feature consistency loss and relationship consistency loss.
The feature consistency loss ensures similarity between the training image features extracted by the image classification model and the auxiliary image features extracted by the auxiliary classification model. The relationship consistency loss reduces the intra-class differences between sample images belonging to the same sample category and enlarges the inter-class differences between sample images belonging to different sample categories. The training process of the image classification model is thus constrained along multiple dimensions, and the two losses can be combined as sketched below.
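As a hedged sketch of how the two losses and the training-target check might combine (the weights and thresholds are assumptions, not values from the patent):

```python
def training_loss(feat_loss, rel_loss, w_feat=1.0, w_rel=1.0):
    # Weighted sum of the feature consistency loss and the relationship
    # consistency loss as the overall training loss (weights assumed).
    return w_feat * feat_loss + w_rel * rel_loss

def training_target_unmet(loss, round_idx, max_rounds=100, loss_target=1e-3):
    # Model parameters keep being adjusted while the training target is unmet;
    # both criteria here (round budget, loss threshold) are illustrative.
    return loss > loss_target and round_idx < max_rounds
```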
As an embodiment, when determining the feature consistency loss of the image classification model based on the training image features and the auxiliary image features corresponding to the plurality of sample images, the error between the training image features and the auxiliary image features may be determined for each sample image, giving the corresponding feature consistency errors. The feature consistency loss L_f of the image classification model is then determined based on a weighted average of the obtained feature consistency errors; refer to formula (1).
L_f = ‖F_x − F_y‖_2 (1)

wherein F_x denotes the training image features corresponding to the sample images, F_y denotes the auxiliary image features corresponding to the sample images, and ‖·‖_2 denotes the L2 norm.
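By way of illustration only, the following sketch computes the feature consistency loss of formula (1) in PyTorch; the function name, the (batch, dim) tensor layout and the uniform weighting of the per-sample errors are assumptions rather than requirements of this embodiment.

    import torch

    def feature_consistency_loss(train_feats: torch.Tensor,
                                 aux_feats: torch.Tensor) -> torch.Tensor:
        # Per-sample feature consistency error: L2 distance between the training
        # image feature F_x and the detached auxiliary image feature F_y, formula (1).
        errors = torch.linalg.norm(train_feats - aux_feats.detach(), dim=1)
        # A uniform-weight average of the per-sample errors is assumed here.
        return errors.mean()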
As an embodiment, the determination of the relationship consistency loss of the image classification model based on the training image features and the auxiliary image features corresponding to the plurality of sample images and the auxiliary category features corresponding to the plurality of sample categories is described below taking one sample image as an example; the situation of every other sample image is similar and is not repeated here.
The plurality of sample categories are respectively taken as target categories. The similarity between the training image features corresponding to the one sample image and the auxiliary category features of each target category is determined, giving the corresponding training similarities. The similarity between the auxiliary image features corresponding to the one sample image and the auxiliary category features of each target category is likewise determined, giving the corresponding auxiliary similarities. Based on the obtained training similarities and auxiliary similarities, the relationship consistency loss of the image classification model is determined.
The auxiliary similarities can be pre-calculated by the auxiliary relationship calculation module after the auxiliary classification model has extracted the auxiliary image features of each sample image. When the auxiliary similarities are needed to calculate the relationship consistency loss of the image classification model, they can be read directly from the auxiliary relationship calculation module, which improves the efficiency of determining the relationship consistency loss and, in turn, the efficiency of training the image classification model.
Referring to fig. 4, the auxiliary relationship calculation module may include a data preparation module, an inter-class relationship calculation module, and a relationship storage module.
The data preparation module is used for reading the plurality of sample images belonging to each sample category, and may share one data preparation module with the auxiliary classification model, without particular limitation. The inter-class relationship calculation module is used for acquiring, from the auxiliary classification model, the auxiliary image features corresponding to each sample image and the auxiliary category features corresponding to the plurality of target categories, and for determining, for each sample image, the similarity between its auxiliary image features and the auxiliary category features of each target category, giving the corresponding auxiliary similarities. The relationship storage module is used for storing the auxiliary similarities corresponding to each sample image.
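By way of illustration, the offline pass of the inter-class relationship calculation module might look like the following sketch; cosine similarity is assumed as the similarity measure, and all names and tensor shapes are illustrative.

    import torch
    import torch.nn.functional as F

    def precompute_auxiliary_similarities(aux_feats: torch.Tensor,
                                          class_feats: torch.Tensor) -> torch.Tensor:
        # aux_feats: (num_samples, dim) auxiliary image features; class_feats:
        # (num_classes, dim) auxiliary category features, both taken from the
        # auxiliary classification model. Cosine similarity is assumed.
        sims = F.normalize(aux_feats, dim=1) @ F.normalize(class_feats, dim=1).T
        return sims  # (num_samples, num_classes); cached by the relationship storage module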
As one example, if the number of sample categories is large, a subset of the sample categories can be selected to determine the relationship consistency loss of the image classification model. One sample image is again taken as an example; the situation of every other sample image is similar and is not repeated here.
The similarity between the auxiliary image features corresponding to the one sample image and the auxiliary category features of each of the plurality of sample categories is determined; among the obtained similarities, the similarities larger than a preset similarity threshold are respectively taken as the auxiliary similarities.
For example, after the similarities are obtained, they may be sorted in descending order. The similarity ranked in position n+1 is taken as the preset similarity threshold, so that the first n similarities are the ones larger than the preset similarity threshold, giving n auxiliary similarities.
The sample categories corresponding to the obtained auxiliary similarities are respectively taken as the target categories. The similarity between the training image features corresponding to the one sample image and the auxiliary category features of each obtained target category is then determined, giving the corresponding training similarities. Based on the obtained training similarities and auxiliary similarities, the relationship consistency loss of the image classification model is determined.
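The top-n selection described above may be sketched as follows; torch.topk returns the n largest similarities, so the value ranked in position n+1 acts as the implicit threshold. Names and shapes are illustrative assumptions.

    import torch

    def select_target_categories(aux_sims: torch.Tensor, n: int):
        # aux_sims: (num_classes,) auxiliary similarities of one sample image,
        # as read from the relationship storage module. The (n+1)-th largest
        # value plays the role of the preset similarity threshold.
        top_vals, top_idx = torch.topk(aux_sims, k=n)
        return top_vals, top_idx  # n auxiliary similarities and the target-category indices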
As an embodiment, the determination of the relationship consistency loss of the image classification model based on the obtained training similarities and auxiliary similarities is described taking one sample image as an example; the situation of every other sample image is similar and is not repeated here.
The average value of the training similarities between the plurality of sample images and the auxiliary category features of one target category is taken as the training category relationship between the one sample category and that target category. Likewise, the average value of the auxiliary similarities between the plurality of sample images and the auxiliary category features of that target category is taken as the auxiliary category relationship between the one sample category and that target category. The relationship consistency loss L_r of the image classification model is then determined based on the errors between the obtained training category relationships and auxiliary category relationships; refer to formula (2).

L_r = (1/N) Σ_{i=1}^{N} ‖R_i^a − R_i^t‖_2 (2)

wherein R_i^a denotes the auxiliary category relationships between one sample category and the n target categories calculated based on the i-th sample image, R_i^t denotes the training category relationships between the sample category corresponding to the i-th sample image and the n target categories, and N denotes the number of sample images.
A margin term may further be added to the relationship consistency loss L_r of the image classification model to make the constraint imposed by the relationship consistency loss stricter; refer to formula (3).
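By way of illustration, the sketch below computes the relationship consistency loss of formula (2); since formula (3) is not reproduced in this text, the hinge with a `margin` parameter is offered only as one plausible reading of the margin term, and all names and shapes are assumptions.

    import torch

    def relationship_consistency_loss(train_rel: torch.Tensor,
                                      aux_rel: torch.Tensor,
                                      margin: float = 0.0) -> torch.Tensor:
        # train_rel, aux_rel: (N, n) training / auxiliary category relationships
        # per sample image, as in formula (2). The hinge with `margin` is only
        # one plausible reading of the margin term mentioned for formula (3).
        err = torch.linalg.norm(train_rel - aux_rel.detach(), dim=1)
        return torch.clamp(err - margin, min=0.0).mean()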
As an example, after the feature consistency loss and the relationship consistency loss are obtained, a weighted sum of the two may be used as the training loss L of the image classification model; refer to formula (4).
L = L_f + L_r (4)
Formula (4) corresponds to the case where the weights of the feature consistency loss and the relationship consistency loss are both 1.
When the obtained training loss does not reach the training target, the model parameters of the image classification model are adjusted; when the obtained training loss reaches the training target, the image classification model is output as the trained target image classification model.
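A minimal sketch of one parameter update on the training loss of formula (4) follows; the weights w_f and w_r, the detaching of the teacher-side tensors and all names are assumptions, with w_f = w_r = 1 recovering formula (4).

    import torch

    def training_step(optimizer: torch.optim.Optimizer,
                      train_feats: torch.Tensor, aux_feats: torch.Tensor,
                      train_rel: torch.Tensor, aux_rel: torch.Tensor,
                      w_f: float = 1.0, w_r: float = 1.0) -> float:
        # One parameter update on L = w_f * L_f + w_r * L_r. The auxiliary
        # tensors are detached so that only the image classification model
        # is optimised.
        l_f = torch.linalg.norm(train_feats - aux_feats.detach(), dim=1).mean()
        l_r = torch.linalg.norm(train_rel - aux_rel.detach(), dim=1).mean()
        loss = w_f * l_f + w_r * l_r
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()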
The method for training the image classification model provided by the embodiment of the application is described below taking the case where the sample images are face images and the classification target is the object to which each face image corresponds as an example.
Referring to fig. 5, in the training stage, an auxiliary classification model may first be trained based on the sample image set. Then, combining the auxiliary image features and the auxiliary category features output by the trained auxiliary classification model, the auxiliary relationship calculation module determines the auxiliary inter-class relationships among the sample categories. Finally, combining the auxiliary image features and the auxiliary category features output by the auxiliary classification model with the auxiliary inter-class relationships output by the auxiliary relationship calculation module, the image classification model is trained to obtain the target image classification model. In the use stage, the image classification service can then be provided using the target image classification model.
In the embodiment of the application, the sample image set can be obtained according to the classification scene. Multiple rounds of iterative training are performed on the auxiliary classification model to be trained based on the sample image set, giving the trained auxiliary classification model. For example, if the classification scene is a face recognition scene, the sample image set contains face images; if no trained auxiliary classification model suited to the face recognition scene exists, an auxiliary classification model may be trained based on the face images. If a trained auxiliary classification model suited to the face recognition scene already exists, it can be used directly, without training an auxiliary classification model again. Before the image classification model is trained, various prior knowledge may thus be obtained offline to improve the efficiency of training the image classification model.
The training process of the auxiliary classification model and the calculation process of the auxiliary relation calculation module may be described with reference to the foregoing description, and will not be described herein.
Referring to fig. 6, the image classification model may include a data preparation module, a feature extraction module, a training relationship calculation module, a feature consistency loss calculation module, a relationship consistency loss calculation module, and an optimization module.
The data preparation module may be configured to read the plurality of sample images corresponding to each sample category. Taking one round of iterative training as an example, the data preparation module may read the plurality of sample images corresponding to one sample category and input each sample image into the feature extraction module in turn.
The feature extraction module contains few model parameters; its parameter quantity is far smaller than that of the auxiliary classification model. The feature extraction module may be used to extract the spatial features of an image, obtaining spatial structure information and the like. The feature extraction module may be structured as a convolutional neural network, for example comprising convolutional layers, nonlinear activation function layers, pooling layers, and the like. The feature extraction module extracts the training image features of each sample image and inputs the obtained training image features into the feature consistency loss calculation module; meanwhile, the feature consistency loss calculation module obtains, from the auxiliary classification model, the auxiliary image features corresponding to the plurality of sample images of the sample category.
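By way of illustration, such a lightweight extractor might be sketched as follows; the layer sizes, the output feature dimension and the class name are arbitrary assumptions chosen only to show a parameter count far below that of a large auxiliary model.

    import torch
    import torch.nn as nn

    class LightFeatureExtractor(nn.Module):
        # Illustrative lightweight extractor built from convolutional layers,
        # nonlinear activation layers and pooling layers.
        def __init__(self, feat_dim: int = 128):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),          # aggregates spatial structure information
            )
            self.head = nn.Linear(64, feat_dim)   # projects to the training image feature

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.body(x).flatten(1))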
The feature consistency loss calculation module may determine a feature consistency loss of the image classification model based on the obtained errors between the training image features and the auxiliary image features.
The training relationship calculation module may obtain, from the auxiliary relationship calculation module, the similarity between the auxiliary image features of each sample image and the auxiliary category features of each sample category. The training relationship calculation module sorts the obtained similarities in descending order, selects the first n similarities as the auxiliary similarities, denoted R_i^a for the i-th sample image, and takes the sample category corresponding to each selected auxiliary similarity as a target category.
The feature extraction module can also input the obtained training image features into the training relationship calculation module, which calculates the similarity between the training image features of each sample image and the auxiliary category features of the n obtained target categories, giving the training similarities, denoted R_i^t.
The training relationship calculation module inputs the calculated training similarities and auxiliary similarities into the relationship consistency loss calculation module, which determines the training category relationships and the auxiliary category relationships between the sample category and each target category based on them. The relationship consistency loss of the image classification model is then determined based on the errors between the training category relationships and the auxiliary category relationships.
The feature consistency loss calculation module inputs the obtained feature consistency loss into the optimization module, and the relationship consistency loss calculation module inputs the obtained relationship consistency loss into the optimization module. The optimization module determines the training loss of the image classification model. When the training loss does not reach the training target, the optimization module adjusts the model parameters of the image classification model, for example based on a gradient descent algorithm, and enters the next round of iterative training. When the obtained training loss reaches the training target, for example when the number of iterations reaches a preset number, or the training loss converges, or the training loss is smaller than a preset value, the trained target image classification model is output.
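Reusing the training_step sketch above, the overall iteration with the stopping criteria just mentioned could look like the following; the loader layout, the threshold values and the convergence test are all illustrative assumptions.

    def train(model, optimizer, loader, max_iters=10000, loss_eps=1e-4, conv_eps=1e-6):
        # Iterate until the training target is reached: a preset iteration
        # count, a loss below a preset value, or loss convergence. The loader
        # is assumed, for brevity, to yield the already-computed feature and
        # relationship tensors for each batch.
        prev_loss = float("inf")
        for step, (train_feats, aux_feats, train_rel, aux_rel) in enumerate(loader, 1):
            loss = training_step(optimizer, train_feats, aux_feats, train_rel, aux_rel)
            if step >= max_iters or loss < loss_eps or abs(prev_loss - loss) < conv_eps:
                break
            prev_loss = loss
        return model  # output as the trained target image classification model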
As an example, after the trained target image classification model is obtained, image classification may be performed using the target image classification model. Because the target image classification model contains a small number of model parameters, it is less time-consuming to use. Meanwhile, since the target image classification model has fully learned the image classification capability of the auxiliary classification model, it achieves high classification accuracy and reliability on the premise of short classification time.
When the target image classification model is used for image classification, after the image to be classified is obtained, the target image classification model is used to extract features of the image to be classified, giving the target image features. The target image classification model then determines the target category of the image to be classified from the plurality of sample categories based on the target image features.
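A minimal inference sketch follows; nearest-category assignment by cosine similarity against the auxiliary category features is an assumption, as the embodiment only requires that the target category be determined from the target image features.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def classify(model: torch.nn.Module,
                 class_feats: torch.Tensor,
                 image: torch.Tensor) -> int:
        # image: (3, H, W) image to be classified; class_feats: (num_classes, dim)
        # auxiliary category features. The category with the highest similarity
        # to the extracted target image feature is returned.
        feat = F.normalize(model(image.unsqueeze(0)), dim=1)  # target image feature
        sims = feat @ F.normalize(class_feats, dim=1).T
        return int(sims.argmax(dim=1))  # index of the target category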
Referring to fig. 7A, when the image to be classified is a face image, the target image classification model containing a small number of model parameters is used to extract features of the face image, giving the target image features. Continuing with the target image classification model, the face image is classified based on the target image features, and the target category of the face image is determined to be "Xiaohong", indicating that the face in the face image is Xiaohong's face.
Likewise, the trained auxiliary classification model containing a large number of model parameters is used to extract features of the face image, giving the auxiliary image features. Continuing with the auxiliary classification model, the face image is classified based on the auxiliary image features, and the target category of the face image is likewise determined to be "Xiaohong", indicating that the face in the face image is Xiaohong's face.
The target image features extracted by the target image classification model and the auxiliary image features extracted by the auxiliary classification model are similar, so the target category of the face image determined by the target image classification model is the same as that determined by the auxiliary classification model. Because the parameter quantity of the model parameters of the target image classification model is far smaller than that of the auxiliary classification model, using the target image classification model improves the accuracy and reliability of image classification while ensuring high image classification efficiency, and improves the image classification efficiency while ensuring high classification accuracy and reliability, so that the target image classification model is suitable for portable devices such as mobile terminals.
Referring to fig. 7B, the target image classification model may include an image acquisition module, a feature extraction module, and an image classification module.
The image acquisition module is used for acquiring the image to be classified and sending the acquired image to be classified to the feature extraction module. The feature extraction module is used for extracting the target image features of the image to be classified and sending the obtained target image features to the image classification module. The image classification module is used for determining the target category of the image to be classified based on the target image features. Since the feature extraction module is obtained through knowledge migration from the large auxiliary classification model with stronger expression capability, the distribution of the features it extracts has high similarity to the distribution of the features extracted by the auxiliary classification model, so the target image classification model has high classification accuracy.
Based on the same inventive concept, the embodiment of the application provides a device for training an image classification model, which can realize the functions corresponding to the method for training the image classification model. Referring to fig. 8, the apparatus includes an acquisition module 801 and a processing module 802, where:
acquisition module 801: used for acquiring a sample image set associated with a plurality of sample categories, wherein each sample category is associated with a plurality of sample images;
processing module 802: used for respectively performing feature extraction on each sample image by adopting the trained auxiliary classification model to obtain corresponding auxiliary image features, and performing the following operations on each sample category: performing feature fusion on the auxiliary image features of the plurality of sample images associated with one sample category to obtain corresponding auxiliary category features;
the processing module 802 is further configured to: based on the sample image set, performing multiple rounds of iterative training on the image classification model to be trained, and outputting a trained target image classification model, wherein each round of iteration comprises:
the processing module 802 is further configured to: adopting an image classification model to respectively extract characteristics of a plurality of sample images associated with one sample class to obtain corresponding training image characteristics;
Based on the training image features and the auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories, model parameters of the image classification model are adjusted.
In one possible embodiment, the processing module 802 is specifically configured to:
determining a feature consistency loss of the image classification model based on the training image features and the auxiliary image features corresponding to each of the plurality of sample images, wherein the feature consistency loss characterizes the consistency of feature extraction between the auxiliary classification model and the image classification model;
determining a relationship consistency loss of the image classification model based on the training image features and the auxiliary image features corresponding to each of the plurality of sample images and the auxiliary category features of each of the plurality of sample categories, wherein the relationship consistency loss characterizes the consistency of the inter-class relationships among the plurality of sample categories as determined by the auxiliary classification model and by the image classification model;
and adjusting model parameters of the image classification model based on the obtained feature consistency loss and the relationship consistency loss.
In one possible embodiment, the processing module 802 is specifically configured to:
for a plurality of sample images, the following operations are performed, respectively:
Respectively taking a plurality of sample categories as target categories, respectively determining the similarity between training image features corresponding to one sample image and auxiliary category features of the plurality of target categories, and obtaining corresponding training similarity;
respectively determining the similarity between the auxiliary image features corresponding to one sample image and the auxiliary category features of each of a plurality of target categories to obtain corresponding auxiliary similarity;
based on the obtained plurality of training similarities and the plurality of auxiliary similarities, a loss of relational consistency of the image classification model is determined.
In one possible embodiment, the processing module 802 is specifically configured to:
for a plurality of sample images, the following operations are performed, respectively:
respectively determining the similarity between the auxiliary image features corresponding to one sample image and the auxiliary category features of each of a plurality of sample categories, respectively taking a plurality of similarities which are larger than a preset similarity threshold value in the obtained similarities as auxiliary similarities, and respectively taking the sample categories corresponding to the obtained auxiliary similarities as target categories;
respectively determining the similarity between training image features corresponding to one sample image and auxiliary category features of each of the obtained multiple target categories, and obtaining corresponding training similarity;
Based on the obtained plurality of training similarities and the plurality of auxiliary similarities, a loss of relational consistency of the image classification model is determined.
In one possible embodiment, the processing module 802 is specifically configured to:
for a plurality of target categories, the following operations are respectively executed:
taking an average value of training similarity between a plurality of sample images and auxiliary category characteristics of a target category as a training category relation between the sample category and the target category;
taking an average value of auxiliary similarity between a plurality of sample images and auxiliary category characteristics of one target category as an auxiliary category relationship between one sample category and one target category;
based on the errors between the obtained plurality of training category relationships and the plurality of auxiliary category relationships, a relationship consistency loss of the image classification model is determined.
In one possible embodiment, the processing module 802 is specifically configured to:
respectively determining errors between training image features and auxiliary image features corresponding to the sample images to obtain corresponding feature consistency errors;
and determining the feature consistency loss of the image classification model based on the obtained weighted average value of the feature consistency errors.
In one possible embodiment, the processing module 802 is specifically configured to:
taking the weighted sum of the obtained feature consistency loss and the relationship consistency loss as the training loss of the image classification model;
and when the obtained training loss is determined to not reach the training target, adjusting model parameters of the image classification model.
In one possible embodiment, the processing module 802 is further configured to:
obtaining an image to be classified;
extracting features of the images to be classified by adopting a target image classification model to obtain target image features;
and determining the target category of the image to be classified from a plurality of sample categories based on the target image characteristics by adopting a target image classification model.
Referring to fig. 9, the apparatus for training the image classification model may run on a computer device 900. A current version and a historical version of a data storage program, and the application software corresponding to the data storage program, may be installed on the computer device 900. The computer device 900 includes a processor 980 and a memory 920. In some embodiments, the computer device 900 may include a display unit 940, and the display unit 940 includes a display panel 941 for displaying an interface for interactive operation by a user, and the like.
In one possible embodiment, the display panel 941 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD) or an Organic Light-Emitting Diode (OLED) or the like.
The processor 980 is configured to read a computer program and then perform a method defined by the computer program; for example, the processor 980 reads a data storage program or a file, so that the data storage program is run on the computer device 900 and a corresponding interface is displayed on the display unit 940. The processor 980 may include one or more general-purpose processors, and may also include one or more digital signal processors (DSPs) for performing the associated operations to implement the technical solutions provided by the embodiments of the application.
The memory 920 generally includes an internal memory and an external memory; the internal memory may be a random access memory (RAM), a read-only memory (ROM), a cache (CACHE), and the like. The external memory may be a hard disk, an optical disk, a USB disk, a floppy disk, a tape drive, and the like. The memory 920 is used to store computer programs, including the application programs corresponding to the respective clients and the like, and other data, which may include data generated after the operating system or the application programs are run, including system data (for example, configuration parameters of the operating system) and user data. In the embodiment of the application, program instructions are stored in the memory 920, and the processor 980 executes the program instructions in the memory 920 to implement any of the methods discussed in the preceding figures.
The above-described display unit 940 is used to receive input digital information, character information, or touch operation/noncontact gestures, and to generate signal inputs related to user settings and function controls of the computer device 900, and the like. Specifically, in an embodiment of the present application, the display unit 940 may include a display panel 941. The display panel 941, such as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the display panel 941 or on the display panel 941 using any suitable object or accessory such as a finger, a stylus, etc.), and drive the corresponding connection device according to a predetermined program.
In one possible embodiment, the display panel 941 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 980; it can also receive commands from the processor 980 and execute them.
The display panel 941 may be implemented by various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 940, in some embodiments, the computer device 900 may also include an input unit 930, and the input unit 930 may include an image input device 931 and other input devices 932, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
In addition to the above, computer device 900 may also include a power supply 990 for powering other modules, audio circuitry 960, near field communication module 970, and RF circuitry 910. The computer device 900 may also include one or more sensors 950, such as acceleration sensors, light sensors, pressure sensors, and the like. Audio circuitry 960 may include, among other things, a speaker 961 and a microphone 962, for example, where the computer device 900 may collect a user's voice via the microphone 962, perform a corresponding operation, etc.
The number of processors 980 may be one or more, and the processors 980 and memory 920 may be coupled or may be relatively independent.
As an example, processor 980 in fig. 9 may be used to implement the functionality of acquisition module 801 and processing module 802 as in fig. 8.
As an example, the processor 980 in fig. 9 may be used to implement the functions associated with the servers or terminal devices discussed above.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or in a part contributing to the prior art in the form of a software product, for example, by a computer program product stored in a storage medium, comprising several instructions for causing a computer device to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (12)

1. A method of training an image classification model, comprising:
Acquiring a sample image set associated with a plurality of sample categories, wherein each sample category is associated with a plurality of sample images;
and respectively extracting the characteristics of each sample image by adopting a trained auxiliary classification model to obtain corresponding auxiliary image characteristics, and executing the following operations on each sample category: performing feature fusion on auxiliary image features of a plurality of sample images associated with one sample category to obtain corresponding auxiliary category features;
performing multiple rounds of iterative training on the image classification model to be trained based on the sample image set, and outputting a trained target image classification model, wherein each round of iteration comprises:
adopting the image classification model to respectively extract characteristics of a plurality of sample images associated with one sample class to obtain corresponding training image characteristics;
and adjusting model parameters of the image classification model based on the training image features and the auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories.
2. The method of claim 1, wherein the adjusting model parameters of the image classification model based on the training image features and auxiliary image features corresponding to each of the plurality of sample images, and the auxiliary class features of each of the plurality of sample classes comprises:
Determining a feature consistency loss of the image classification model based on training image features and auxiliary image features corresponding to each of the plurality of sample images, wherein the feature consistency loss characterizes: adopting the auxiliary classification model and the image classification model to perform consistency of feature extraction;
determining a relationship consistency loss of the image classification model based on the training image features and the auxiliary image features corresponding to each of the plurality of sample images and the auxiliary class features of each of the plurality of sample classes, wherein the relationship consistency loss characterizes: determining consistency of inter-class relationships between the plurality of sample classes using the auxiliary classification model and the image classification model;
and adjusting model parameters of the image classification model based on the obtained feature consistency loss and the relationship consistency loss.
3. The method of claim 2, wherein the determining a loss of relational consistency of the image classification model based on the training image features and auxiliary image features corresponding to each of the plurality of sample images and the auxiliary class features of each of the plurality of sample classes comprises:
For the plurality of sample images, the following operations are performed, respectively:
respectively taking the multiple sample categories as target categories, respectively determining the similarity between training image features corresponding to one sample image and auxiliary category features of the multiple target categories, and obtaining corresponding training similarity;
respectively determining the similarity between the auxiliary image features corresponding to the sample image and the auxiliary category features of each of the multiple target categories to obtain corresponding auxiliary similarity;
based on the obtained plurality of training similarities and the plurality of auxiliary similarities, a loss of relational consistency of the image classification model is determined.
4. The method of claim 2, wherein the determining a loss of relational consistency of the image classification model based on the training image features and auxiliary image features corresponding to each of the plurality of sample images and the auxiliary class features of each of the plurality of sample classes comprises:
for the plurality of sample images, the following operations are performed, respectively:
respectively determining the similarity between the auxiliary image features corresponding to the sample image and the auxiliary category features of the plurality of sample categories, respectively taking a plurality of similarities which are larger than a preset similarity threshold value in the obtained similarities as auxiliary similarities, and respectively taking the sample categories corresponding to the obtained auxiliary similarities as target categories;
Respectively determining the similarity between the training image features corresponding to the sample image and the auxiliary category features of the obtained multiple target categories to obtain corresponding training similarity;
based on the obtained plurality of training similarities and the plurality of auxiliary similarities, a loss of relational consistency of the image classification model is determined.
5. The method of claim 3 or 4, wherein determining a loss of relational consistency of the image classification model based on the obtained plurality of training similarities and plurality of auxiliary similarities comprises:
for the plurality of target categories, respectively performing the following operations:
taking an average value of training similarity between the plurality of sample images and auxiliary category characteristics of one target category as a training category relationship between the one sample category and the one target category;
taking an average value of auxiliary similarity between the plurality of sample images and auxiliary category characteristics of the one target category as an auxiliary category relationship between the one sample category and the one target category;
and determining a relationship consistency loss of the image classification model based on the errors between the obtained plurality of training category relationships and the plurality of auxiliary category relationships.
6. The method of claim 2, wherein the determining a feature consistency penalty for the image classification model based on the training image features and auxiliary image features corresponding to each of the plurality of sample images comprises:
respectively determining errors between training image features and auxiliary image features corresponding to the sample images to obtain corresponding feature consistency errors;
and determining the feature consistency loss of the image classification model based on the obtained weighted average value of the feature consistency errors.
7. The method of claim 2, wherein adjusting model parameters of the image classification model based on the obtained feature consistency loss and relationship consistency loss comprises:
taking the weighted sum of the obtained feature consistency loss and the relationship consistency loss as the training loss of the image classification model;
and when the obtained training loss is determined to not reach the training target, adjusting the model parameters of the image classification model.
8. The method according to any one of claims 1 to 7, further comprising, after said performing a plurality of iterative rounds of training on the image classification model to be trained based on said sample image set, outputting a trained target image classification model:
Obtaining an image to be classified;
extracting features of the images to be classified by adopting the target image classification model to obtain target image features;
and determining the target category of the image to be classified from the plurality of sample categories based on the target image characteristics by adopting the target image classification model.
9. An apparatus for training an image classification model, comprising:
the acquisition module is used for acquiring a sample image set associated with a plurality of sample categories, wherein each sample category is associated with a plurality of sample images;
the processing module is used for respectively performing feature extraction on each sample image by adopting a trained auxiliary classification model to obtain corresponding auxiliary image features, and performing the following operations on each sample category: performing feature fusion on auxiliary image features of a plurality of sample images associated with one sample category to obtain corresponding auxiliary category features;
the processing module is further configured to: performing multiple rounds of iterative training on the image classification model to be trained based on the sample image set, and outputting a trained target image classification model, wherein each round of iteration comprises:
the processing module is further configured to: adopting the image classification model to respectively extract characteristics of a plurality of sample images associated with one sample class to obtain corresponding training image characteristics;
And adjusting model parameters of the image classification model based on the training image features and the auxiliary image features corresponding to the sample images and the auxiliary category features of the sample categories.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
11. A computer device, comprising:
a memory for storing program instructions;
a processor for invoking program instructions stored in the memory and for performing the method according to any of claims 1-8 in accordance with the obtained program instructions.
12. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 8.


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination