CN112613575B

CN112613575B - Data set expansion method, training method and device of image classification model

Info

Publication number: CN112613575B
Application number: CN202011612335.0A
Authority: CN
Inventors: 黄高; 王朝飞; 宋士吉
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2024-02-09
Anticipated expiration: 2040-12-30
Also published as: CN112613575A

Abstract

An embodiment of the application discloses a data set expansion method, a training method, and a device for an image classification model, wherein the image classification model is implemented with a convolutional neural network. The data set expansion method comprises performing the following operations for each of at least some of the picture samples in the training data set of the image classification model: acquiring a class activation map (CAM) of the picture sample for a preset class; obtaining, by a preset algorithm, a region corresponding to a preset target from the CAM, and determining the position coordinates of the region in the picture sample; cropping the picture sample at the position coordinates to obtain a cropped picture; and labeling the cropped picture with the same category as the picture sample and storing it in the training data set as a picture sample. With this scheme, effective training samples can be added to the training data set during training of the image classification model.

Description

Data set expansion method, training method and device of image classification model

Technical Field

The embodiments of the application relate to, but are not limited to, the field of image classification, and in particular to a data set expansion method, a training method, and a device for an image classification model.

Background

In recent years, deep neural networks have become the most important tool in computer vision fields such as image classification, object detection, and face recognition, both in theoretical research and in practical applications. However, training a deep neural network generally requires a large amount of training data to achieve the desired result, and obtaining labeled data is often costly. Sample enhancement, or data augmentation, automatically generates training samples without manual labeling, which can effectively improve the diversity of the training samples, improve the robustness of the model, and help avoid overfitting. Typical data augmentation methods include flipping, rotation, scaling, random cropping or zero padding, color jitter, and adding noise.

Random cropping is the most commonly used image cropping strategy and works well for improving neural network performance. However, random cropping tends to leave the target information incomplete or even missing while retaining a large amount of non-target information, so the resulting samples can be counterproductive. This is not apparent on large data sets, but on small data sets or small-target data sets (where the target occupies a small part of the image) it can negatively affect normal target recognition.

Disclosure of Invention

The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

The embodiments of the disclosure provide a data set expansion method, a training method, and a device for an image classification model, which can add effective samples to the training data set by cropping the target under the guidance of a class activation map (CAM).

The present disclosure provides a data set expansion method of an image classification model, the image classification model is implemented based on a convolutional neural network, the data set expansion method includes:

for each of at least some of the picture samples in the training dataset of the image classification model, performing the following operations, respectively:

acquiring a class activation map CAM of the picture sample corresponding to a preset class;

acquiring an area corresponding to a preset target from the CAM by adopting a preset algorithm, and determining the position coordinates of the area in the picture sample;

obtaining a cut picture from the picture sample by utilizing the position coordinates;

and marking the cut picture as the same category as the picture sample, and then storing the picture as the picture sample into the training data set.

In an exemplary embodiment, the obtaining the class activation map CAM of the picture sample corresponding to the preset class includes:

acquiring the feature maps output for the picture sample by the last convolutional layer of the image classification model;

and, at each pixel position, computing a weighted sum over all the feature maps using the weighting coefficients corresponding to the preset category, to obtain the CAM corresponding to the preset category.

In an exemplary embodiment, the acquiring, by using a preset algorithm, an area corresponding to a preset target from the CAM map includes:

determining an area exceeding a preset threshold value in the CAM by using a preset method;

and taking the area as an area corresponding to a preset target.

In an exemplary embodiment, the preset method includes a threshold selection method and a maximum connected domain method.

In an exemplary embodiment, the labeling the cropped picture into the same category as the picture sample and then saving the same into the training data set includes:

labeling the cropped picture with the same category as the picture sample, and then storing it as a picture sample in the initial training data set.

In an exemplary embodiment, the labeling the cropped picture into the same category as the picture sample and saving the same into the training data set further includes:

labeling the cropped picture with the same category as the picture sample, and then storing it in a training data set other than the initial training data set.

The present disclosure also provides a data set expansion system for an image classification model, comprising a data input module, a heat map calculation module, an image cropping module, and a data set reconstruction module, wherein:

the data input module is used for acquiring, for each picture sample in at least part of the picture samples in the training data set of the image classification model, a class activation map (CAM) corresponding to a preset class;

the heat map calculation module is used for obtaining, by a preset algorithm, a region corresponding to a preset target from the CAM, and determining the position coordinates of the region in the picture sample;

the image cropping module is used for cropping the picture sample at the position coordinates to obtain a cropped picture;

the data set reconstruction module is used for labeling the cropped picture with the same category as the picture sample and storing it in the training data set as a picture sample.

The disclosure also provides an image classification model training method, comprising:

expanding the original data set of the image classification model according to the data set expansion method of the image classification model in the embodiment to obtain an expanded image data set;

training the image classification model by using the expanded image data set.

The present disclosure also provides an image classification model training apparatus, comprising: the data set expansion module and the training module;

the data set expansion module is used for expanding the original data set of the image classification model according to the data set expansion method of the image classification model in the embodiment, so as to obtain an expanded image data set;

the training module is used for training the image classification model with the expanded image data set, and comprises an independent training module and a joint training module.

The present disclosure also provides an apparatus comprising a memory and a processor; the memory is used for storing a data set expansion or image classification model training program for an image classification model, and the processor is used for reading and executing the data set expansion or image classification model training program for the image classification model, and executing the data set expansion method of the image classification model in any one of the above embodiments or the image classification model training method in the above embodiments.

The present disclosure also provides a storage medium storing a data set expansion program for an image classification model or an image classification model training program, the program being arranged to perform, at run time, the data set expansion method of any of the above embodiments or the image classification model training method of the above embodiments.

The embodiments of the disclosure provide a data set expansion method, a training method, and a device for an image classification model, wherein the image classification model is implemented with a convolutional neural network. The data set expansion method comprises performing the following operations for each of at least some of the picture samples in the training data set of the image classification model: acquiring a class activation map (CAM) of the picture sample for a preset class; obtaining, by a preset algorithm, a region corresponding to a preset target from the CAM, and determining the position coordinates of the region in the picture sample; cropping the picture sample at the position coordinates to obtain a cropped picture; and labeling the cropped picture with the same category as the picture sample and storing it in the training data set as a picture sample. With this scheme, effective training samples can be added to the training data set during training of the image classification model.

Other aspects will become apparent upon reading and understanding the accompanying drawings and detailed description.

Drawings

FIG. 1 is a flowchart of a method for extending a data set of an image classification model according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of dataset augmentation of an image classification model in some exemplary embodiments;

FIG. 3 is a schematic diagram of independent training functions and joint training in some example embodiments;

FIG. 4 is a schematic diagram of a dataset expansion system architecture of an image classification model in some example embodiments;

FIG. 5 is a schematic view of an apparatus according to an embodiment of the present application.

Detailed Description

Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments of the present application may be arbitrarily combined with each other.

The steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.

In some techniques, a heat map known as a class activation map (CAM) shows the most discriminative core region used to identify a particular class. The CAM of a picture for a given class can be calculated as a weighted sum of the feature maps output by the last convolutional layer. There are several ways to calculate such heat maps, including Grad-CAM and Grad-CAM++. Although the resulting heat maps differ, the goal is the same: to find the regions that contribute most to a particular class, or to locate the complete target. The inventors found that, as training of the neural network proceeds, the CAM of a training image for its class changes but always retains two characteristics: first, it gradually concentrates on the position of the target; second, it always highlights the features of the target category in the image and weakens those of non-target categories. Using the heat map to guide image cropping, in place of the current random cropping strategy, can therefore effectively overcome the problems caused by random cropping, namely the retention of a large amount of non-target information and incomplete or missing target information.

The embodiment of the disclosure provides a data set expansion method of an image classification model, wherein the image classification model is realized based on a convolutional neural network, and as shown in fig. 1, the data set expansion method comprises the following steps:

S100, acquiring a class activation map (CAM) of the picture sample for a preset class;

S200, obtaining, by a preset algorithm, a region corresponding to a preset target from the CAM, and determining the position coordinates of the region in the picture sample;

S300, cropping the picture sample at the position coordinates to obtain a cropped picture;

S400, labeling the cropped picture with the same category as the picture sample, and storing it in the training data set as a picture sample.

In this embodiment, the class activation map (CAM) is a heat map that contains rich spatial information. The CAM can show the most discriminative core region used to identify a particular class. The CAM may be acquired with methods such as Grad-CAM and Grad-CAM++.

In this embodiment, acquiring the class activation map (CAM) of the picture sample for the preset class includes: acquiring the feature maps output for the picture sample by the last convolutional layer of the image classification model; and, at each pixel position, computing a weighted sum over all the feature maps using the weighting coefficients corresponding to the preset category, to obtain the CAM corresponding to the preset category. For example: first, let the last convolutional layer of the deep neural network output n feature maps, where $f_k(x, y)$ denotes the value at coordinate $(x, y)$ in the k-th feature map and each feature map has a fixed height H and width W. After global average pooling, the value corresponding to the k-th feature map is

$$F_k = \frac{1}{H \times W} \sum_{x, y} f_k(x, y).$$

For the target class c, the weight corresponding to this value is $w_k^c$; the weights corresponding to the target class are obtained by training the deep neural network. Second, the CAM under class c is calculated. The CAM of the input picture under class c may be denoted $\mathrm{CAM}_c$, where the value at each pixel coordinate can be expressed as

$$\mathrm{CAM}_c(x, y) = \sum_{k=1}^{n} w_k^c \, f_k(x, y).$$
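The weighted sum described above can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names `class_activation_map` and `global_average_pool`, the array layout `(n, H, W)`, and the toy values are assumptions made only for illustration.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each of the n feature maps over its H x W positions."""
    return np.asarray(feature_maps, dtype=float).mean(axis=(1, 2))

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of feature maps at every pixel position.

    feature_maps:  array of shape (n, H, W), output of the last conv layer
    class_weights: array of shape (n,), weights for the preset class c
    returns:       array of shape (H, W), the CAM for class c
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    class_weights = np.asarray(class_weights, dtype=float)
    # Contract the feature-map axis: CAM[x, y] = sum_k w[k] * f[k, x, y]
    return np.tensordot(class_weights, feature_maps, axes=([0], [0]))

# Toy input (hypothetical values): two 3x3 feature maps and their class weights.
f = np.stack([np.eye(3), np.ones((3, 3))])
w_c = np.array([2.0, 0.5])
cam = class_activation_map(f, w_c)
```

Because global average pooling and the weighted sum are both linear, the mean of the CAM equals the weighted sum of the pooled values, which is a convenient sanity check on any implementation.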

In this embodiment, a preset algorithm is adopted to obtain an area corresponding to a preset target from the CAM map, and determine a position coordinate of the area in the picture sample, including: determining the area exceeding a preset threshold value in the CAM by using a preset method; and taking the area as an area corresponding to a preset target.

In this embodiment, the preset method includes a threshold selection method and a maximum connected domain method.
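One possible way to combine the two preset methods is sketched below: the CAM is thresholded, and the largest connected region of the resulting mask is enclosed in a bounding box. This combination is an assumption, since the text does not fix how the two methods interact; the function name `largest_region_box`, the relative threshold of 0.5, and the use of 4-connectivity are likewise illustrative choices.

```python
from collections import deque
import numpy as np

def largest_region_box(cam, rel_threshold=0.5):
    """Return (top, left, bottom, right) of the largest above-threshold region.

    The box uses CAM grid coordinates with exclusive bottom/right edges.
    Returns None when no position exceeds the threshold.
    """
    cam = np.asarray(cam, dtype=float)
    span = cam.max() - cam.min()
    if span == 0:
        return None  # a constant CAM has no salient region
    mask = (cam - cam.min()) / span > rel_threshold  # threshold selection

    height, width = mask.shape
    visited = np.zeros_like(mask, dtype=bool)
    best = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue, component = deque([(r0, c0)]), []
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < height and 0 <= cc < width
                            and mask[rr, cc] and not visited[rr, cc]):
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            if len(component) > len(best):
                best = component  # keep the maximum connected domain
    if not best:
        return None
    rows = [r for r, _ in best]
    cols = [c for _, c in best]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

# Toy CAM (hypothetical): a 3x3 hot block plus one isolated hot pixel.
toy_cam = np.zeros((6, 6))
toy_cam[1:4, 2:5] = 1.0
toy_cam[5, 0] = 0.9
box = largest_region_box(toy_cam)
```

In the toy CAM the isolated pixel passes the threshold but is discarded, because the 3x3 block is the larger connected region.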

In this embodiment, obtaining the cropped picture from the picture sample by using the position coordinates includes: determining a cropping box from the position coordinates, and cropping the original picture at the corresponding position with the cropping box to obtain the cropped picture.
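The CAM usually has a lower resolution than the input picture, so a box found on the CAM grid has to be mapped to pixel coordinates before cropping. The sketch below assumes a simple proportional mapping; the function name `crop_with_cam_box`, the `(top, left, bottom, right)` box convention, and the 224x224 image with a 7x7 CAM are hypothetical.

```python
import math
import numpy as np

def crop_with_cam_box(image, box, cam_shape):
    """Scale a CAM-grid box to image pixels and crop the image.

    image:     array of shape (H_img, W_img[, channels])
    box:       (top, left, bottom, right) on the CAM grid, exclusive end
    cam_shape: (H_cam, W_cam)
    returns:   (cropped image, box in pixel coordinates)
    """
    img_h, img_w = image.shape[:2]
    cam_h, cam_w = cam_shape
    top, left, bottom, right = box
    # Proportional mapping, rounded outwards so the target is not clipped.
    y0 = max(math.floor(top * img_h / cam_h), 0)
    x0 = max(math.floor(left * img_w / cam_w), 0)
    y1 = min(math.ceil(bottom * img_h / cam_h), img_h)
    x1 = min(math.ceil(right * img_w / cam_w), img_w)
    return image[y0:y1, x0:x1].copy(), (y0, x0, y1, x1)

# Hypothetical sizes: a 224x224 RGB picture and a 7x7 CAM.
image = np.zeros((224, 224, 3), dtype=np.uint8)
crop, pixel_box = crop_with_cam_box(image, (1, 2, 4, 5), (7, 7))
```

An alternative with the same effect is to upsample the CAM to the picture size first and compute the box directly in pixel coordinates.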

In this embodiment, labeling the cropped picture into the same category as the picture sample and saving the same in the training data set includes:

In the first case, the cropped picture is labeled with the same category as the picture sample and then stored as a picture sample in the initial training data set, where the initial training data set is the original training data set before the data set expansion method is performed. That is, the original training data set and the cropped data set are merged directly.

In the second case, the cropped picture is labeled with the same category as the picture sample and then stored in a training data set other than the initial training data set. That is, the original data set and the cropped data set are stored and used separately. The other training data set may be an independent training data set or any other training data set.

Storing the data separately or merged provides different training data sets for use with different classification models. For example, when the original data set and the cropped data set are merged directly, a single classification model can be trained on the merged set; when the original data set and the cropped data set are used separately, multiple independent classification models can be trained jointly (that is, the outputs of the multiple models are trained jointly).
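The two storage cases can be sketched as follows. Samples are represented as `(image_id, label)` pairs and the function name `store_cropped` and the dictionary keys are hypothetical; a real pipeline would store image files or tensors.

```python
def store_cropped(original, cropped, merge=True):
    """Store cropped samples either merged with, or apart from, the originals.

    original, cropped: lists of (image_id, label) pairs
    merge=True  -> first case: one merged training set
    merge=False -> second case: two separately stored training sets
    """
    if merge:
        return {"train": list(original) + list(cropped)}
    return {"original": list(original), "cropped": list(cropped)}

# Hypothetical samples: each crop inherits the label of its source picture.
original_set = [("img_0", "gull"), ("img_1", "wren")]
cropped_set = [("img_0_crop", "gull"), ("img_1_crop", "wren")]

merged = store_cropped(original_set, cropped_set, merge=True)
separate = store_cropped(original_set, cropped_set, merge=False)
```

The merged form suits training a single model, while the separate form keeps each original picture paired with its crop, which joint training of several models relies on.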

The implementation of the above embodiment is described below by way of an example.

The present example presents a data set extension method of an image classification model. For at least part of the picture samples in the training data set of the image classification model, as shown in fig. 2, the following operations are performed respectively:

Step 1, inputting an original training image;

Step 2, calculating the CAM of the original training image;

Step 3, obtaining a region corresponding to a preset target from the CAM to obtain a cropping box;

in this step, a threshold selection method or a maximum connected domain method is used to obtain from the heat map the region that is most important for the target task, and the coordinate position of the region in the image is obtained; a rectangular cropping box is formed from the coordinate positions.

Step 4, cropping the original training image with the cropping box.

Step 5, obtaining the cropped image.

Step 6, assigning the cropped image to the category of the original image.
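Steps 3 to 6 above can be put together in a compact sketch that uses the threshold selection method alone. It assumes the CAM from Step 2 is already available as an array; the function name `expand_sample`, the relative threshold, and the toy sizes are illustrative assumptions rather than details from the source.

```python
import math
import numpy as np

def expand_sample(image, label, cam, rel_threshold=0.5):
    """Produce one cropped, labeled sample from an image and its CAM.

    Returns (cropped_image, label), or None if nothing exceeds the threshold.
    """
    cam = np.asarray(cam, dtype=float)
    # Step 3: normalise the CAM and keep positions above the threshold.
    norm = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)
    rows, cols = np.where(norm > rel_threshold)
    if rows.size == 0:
        return None
    # Rectangular cropping box, scaled from the CAM grid to image pixels.
    scale_y = image.shape[0] / cam.shape[0]
    scale_x = image.shape[1] / cam.shape[1]
    y0, y1 = int(rows.min() * scale_y), math.ceil((rows.max() + 1) * scale_y)
    x0, x1 = int(cols.min() * scale_x), math.ceil((cols.max() + 1) * scale_x)
    # Steps 4 and 5: crop the original training image.
    cropped = image[y0:y1, x0:x1].copy()
    # Step 6: the crop keeps the category of the original image.
    return cropped, label

# Hypothetical 8x8 grayscale image with a 4x4 CAM focused on the centre.
toy_image = np.arange(64).reshape(8, 8)
toy_cam = np.zeros((4, 4))
toy_cam[1:3, 1:3] = 1.0
result = expand_sample(toy_image, "bird", toy_cam)
```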

In this example, as training proceeds, each original training image can yield a series of cropped images based on the changing heat map, and the cropped images can themselves be used as training images and cropped further. Cropping images under the guidance of the heat map enlarges the training image data set, and because the cropped images focus on the target region, the accuracy of image classification can be effectively improved; the effect is particularly notable when the training data set is small.

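The embodiments of the disclosure also provide a data set expansion system for an image classification model, comprising a data input module, a heat map calculation module, an image cropping module, and a data set reconstruction module. The data input module is used for acquiring, for each picture sample in at least part of the picture samples in the training data set of the image classification model, a class activation map (CAM) corresponding to a preset class; the heat map calculation module is used for obtaining, by a preset algorithm, a region corresponding to a preset target from the CAM, and determining the position coordinates of the region in the picture sample; the image cropping module is used for cropping the picture sample at the position coordinates to obtain a cropped picture; the data set reconstruction module is used for labeling the cropped picture with the same category as the picture sample and storing it in the training data set as a picture sample.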

The data set expansion of the image classification model described above is illustrated below with an example. As shown in FIG. 3, the data set expansion system of the image classification model comprises the following modules: a data input module, a heat map calculation module, an image cropping module, a data set reconstruction module, a training module, and a test module.

The data input module has a built-in data preprocessing function and can preprocess the images of the original data set; the preprocessing includes resizing images to a consistent size, shuffling the image order, and splitting the data into mini-batches of a given size.

The heat map calculation module has built-in algorithms such as CAM and Grad-CAM, and can calculate, at selected times, the heat map of an input image for a specific category.

The image cropping module has built-in cropping algorithms, including the threshold method and the maximum connected domain method; it can calculate position coordinates from the heat map and crop the input image using this coordinate information.

The data set reconstruction module has a built-in data set construction algorithm and can either build a separate data set from the cropped images or merge the cropped images with the original data set. The module also has a built-in data set preprocessing function and can process the corresponding data sets according to the requirements of the subsequent training.

The training module has several conventional image classification models built in, including ResNet18, ResNet50, ResNet101, and DenseNet110, and has built-in independent training and joint training functions, whose implementation is shown in FIG. 4. Independent training means that a single model is trained on a single data set (or on the merged data set). Joint training means that the outputs of multiple models are trained jointly; joint training requires the inputs of the multiple models to be consistent, that is, the inputs must be the same image or cropped images corresponding to the same image.

The test module tests the trained model: a test image is input and a trained model is selected to obtain the test result.
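The joint training function of the training module trains the outputs of several models together, but the text does not state how those outputs are combined. The sketch below assumes one simple option: the logits of two models, one fed the original picture and one fed its crop, are averaged and a single cross-entropy loss is computed on the result. The averaging rule and the function names are assumptions for illustration only.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def joint_loss(logits_original, logits_cropped, labels):
    """Loss on the fused outputs of two models (assumed fusion: mean of logits).

    Row i of both logit arrays must come from the same source picture:
    the original image for one model and its crop for the other.
    """
    fused = 0.5 * (logits_original + logits_cropped)
    return softmax_cross_entropy(fused, labels)

# Hypothetical batch: 4 samples, 5 classes, uninformative (all-zero) logits.
labels = np.array([0, 1, 2, 3])
zeros = np.zeros((4, 5))
loss_uniform = joint_loss(zeros, zeros, labels)

# Confident, correct logits from both models give a much smaller loss.
confident = 10.0 * np.eye(5)[labels]
loss_confident = joint_loss(confident, confident, labels)
```

In a framework with automatic differentiation, the gradient of this single loss would flow back into both models, which is one reading of training the output ends jointly.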

The effect of the data set expansion method of the above-described image classification model is described below with an example.

In this example, the fine-grained image classification benchmark data set CUB Birds is used, and two few-shot fine-grained classification settings are adopted: first, 5-way 1-shot, in which, for 5 new classes, the training set contains only 1 picture per class; second, 5-way 5-shot, in which, for 5 new classes, the training set contains only 5 pictures per class. At test time, under either setting, 15 images per class are selected as test images. The baseline method uses ResNet18 as the backbone neural network, a cosine classifier as the classifier, and random cropping as the sample enhancement strategy.
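The 5-way 1-shot and 5-way 5-shot settings can be made concrete with a small episode sampler. The function name `sample_episode`, the dictionary layout, and the synthetic image identifiers are hypothetical; only the counts (5 classes, 1 or 5 training pictures, 15 test pictures per class) come from the text.

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, seed=None):
    """Draw one N-way K-shot episode.

    images_by_class: dict mapping class name -> list of image identifiers
    returns: (support, query), each a list of (image_id, class_name) pairs
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for name in classes:
        # Draw training and test pictures together so they never overlap.
        picks = rng.sample(images_by_class[name], k_shot + n_query)
        support += [(img, name) for img in picks[:k_shot]]
        query += [(img, name) for img in picks[k_shot:]]
    return support, query

# Synthetic stand-in for a fine-grained data set: 10 classes, 30 images each.
toy_data = {f"class_{c}": [f"class_{c}_img_{i}" for i in range(30)]
            for c in range(10)}
support_1, query_1 = sample_episode(toy_data, k_shot=1, seed=0)
support_5, query_5 = sample_episode(toy_data, k_shot=5, seed=0)
```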

The experimental settings are the same for the baseline and for the data set expansion method in this example. A batch is the portion of data fed to the network at each training step, and the batch size is the number of training samples in each batch. In this example, the batch size is 32, the number of iterations is 100, the optimizer is Adam, the implementation uses PyTorch, and the models are trained on a Titan Xp GPU. The data set expansion method (DAcam) crops using only the heat maps at epochs 20, 40, 60, 80, and 100, where an epoch is one pass in which all the data is sent through the network for one forward computation and one backward propagation. According to how the data sets are used and trained, DAcam is divided into three methods: DAcam-1 trains on the heat-map-cropped samples alone as the training data set; DAcam-2 trains a single model on the merged data set; and DAcam-3 trains jointly on the original data set and the cropped data set (see FIG. 4; during joint training, the inputs of the two networks are kept consistent, that is, they are the same image or cropped images corresponding to the same image). The three methods were compared experimentally with the baseline, and the results are shown in Table 1.
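The cropping schedule can be sketched as a training-loop skeleton. It assumes that the 100 iterations are 100 epochs, which the text suggests but does not state outright, and the actual training and cropping calls are replaced by placeholders; the function name `crop_schedule` is hypothetical.

```python
import math

def crop_schedule(num_samples, batch_size=32, num_epochs=100, crop_every=20):
    """Walk through the epochs and record when heat-map cropping would run.

    returns: (list of epochs at which cropping happens, total mini-batch steps)
    """
    steps_per_epoch = math.ceil(num_samples / batch_size)
    crop_epochs, total_steps = [], 0
    for epoch in range(1, num_epochs + 1):
        for _ in range(steps_per_epoch):
            # Placeholder: one forward and one backward pass on a mini-batch.
            total_steps += 1
        if epoch % crop_every == 0:
            # Placeholder: compute heat maps with the current model and crop.
            crop_epochs.append(epoch)
    return crop_epochs, total_steps

# 5-way 5-shot: 25 training pictures fit in a single batch of 32.
epochs_with_crops, steps = crop_schedule(num_samples=25)
```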

Table 1: Experimental results of the data set expansion method on the CUB data set

As can be seen from Table 1, the data set expansion method (DAcam) in this embodiment improves greatly on the baseline in both the 5-way 1-shot and the 5-way 5-shot settings. Of the three methods, joint training on the original data set and the cropped data set (DAcam-3) gives the best results. As a sample enhancement method, DAcam may have broad application prospects.

In this example, the heat map is used to guide the image cropping process, which can effectively overcome the problems caused by random cropping, namely the retention of a large amount of non-target information and incomplete or missing target information.

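The embodiments of the disclosure also provide an image classification model training method, comprising:

expanding the original data set of the image classification model according to the data set expansion method of the image classification model in the above embodiment, to obtain an expanded image data set;

training the image classification model with the expanded image data set.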

In this method, the original data set of the image classification model is expanded by the data set expansion method described above to obtain an expanded image data set, and the image classification model is trained with the expanded image data set. This can effectively improve the accuracy of image classification, and the effect is particularly notable when the training data set is small.

The present disclosure also provides an apparatus, as shown in fig. 5, comprising a memory and a processor; the memory is used for storing a data set expansion or image classification model training program for an image classification model, and the processor is used for reading and executing the data set expansion or image classification model training program for the image classification model, and executing the data set expansion method of the image classification model in any one of the above embodiments or the image classification model training method in the above embodiments.

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims

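1. A data set expansion method of an image classification model, the image classification model being implemented based on a convolutional neural network, the data set expansion method comprising:

for each of at least some of the picture samples in the training data set of the image classification model, performing the following operations, respectively:

acquiring a class activation map CAM of the picture sample corresponding to a preset class;

acquiring an area corresponding to a preset target from the CAM by adopting a preset algorithm, and determining the position coordinates of the area in the picture sample;

obtaining a cropped picture from the picture sample by utilizing the position coordinates;

labeling the cropped picture with the same category as the picture sample, and then storing it as a picture sample in the training data set;

wherein the acquiring, by using a preset algorithm, an area corresponding to a preset target from the CAM includes: determining an area exceeding a preset threshold value in the CAM by using a preset method; and taking the area as the area corresponding to the preset target; the preset method comprises a threshold selection method and a maximum connected domain method.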

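2. The method for extending a data set of an image classification model according to claim 1, wherein the obtaining the class activation map CAM of the picture sample corresponding to the preset class includes:

acquiring the feature maps output for the picture sample by the last convolutional layer of the image classification model;

and, at each pixel position, computing a weighted sum over all the feature maps using the weighting coefficients corresponding to the preset category, to obtain the CAM corresponding to the preset category.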

3. The method for extending a data set of an image classification model according to claim 1, wherein the labeling the cropped picture into the same category as the picture sample and then storing the same in the training data set comprises:

and marking the cut picture as the same category as the picture sample, and then storing the picture as the picture sample into an initial training data set.

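4. A method for extending a data set of an image classification model according to claim 3, wherein the labeling the cropped picture into the same category as the picture sample and saving the same into the training data set further comprises:

labeling the cropped picture with the same category as the picture sample, and then storing it in a training data set other than the initial training data set.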

5. A method of training an image classification model, comprising:

expanding the original data set of the image classification model according to the data set expansion method of the image classification model according to any one of claims 1-4 to obtain an expanded image data set;

training the image classification model by using the expanded image data set.

6. A training apparatus for an image classification model, comprising: the data set expansion module and the training module;

the data set expansion module is used for expanding the original data set of the image classification model according to the data set expansion method of the image classification model according to any one of claims 1-4 to obtain an expanded image data set;

the training module is used for training the image classification model with the expanded image data set; the training module includes an independent training module and a joint training module.

7. An apparatus comprising a memory and a processor, characterized in that the memory is used for storing a data set expansion program for an image classification model or an image classification model training program, and the processor is used for reading and executing the program to perform the data set expansion method for the image classification model or the image classification model training method according to any one of claims 1 to 4.

8. A storage medium, wherein a dataset expansion or image classification model training program for an image classification model is stored in the storage medium, the program being arranged to perform the dataset expansion method of an image classification model according to any of claims 1-4 or the image classification model training method according to claim 5 at run-time.