CN115482433A - Small-scale data set amplification method based on learnable data enhancement - Google Patents

Small-scale data set amplification method based on learnable data enhancement

Info

Publication number
CN115482433A
Authority
CN
China
Prior art keywords
image
change
probability
small
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211121663.XA
Other languages
Chinese (zh)
Inventor
郑艳伟
黄博文
王鹏
于东晓
孙恩涛
杜超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Shanghai Step Electric Corp
Original Assignee
Shandong University
Shanghai Step Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University, Shanghai Step Electric Corp filed Critical Shandong University
Priority to CN202211121663.XA priority Critical patent/CN115482433A/en
Publication of CN115482433A publication Critical patent/CN115482433A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a small-scale data set amplification method based on learnable data enhancement, which comprises the following steps: performing data enhancement on an original image in the small-scale data set with probability p; inputting the data-enhanced image into the discriminator of a generative adversarial network, randomly sampling normally distributed noise as the input of the generator of the generative adversarial network, generating an image by the generator and inputting it into the discriminator, and alternately training the generator and the discriminator to optimize the objective function; continuously learning and updating the probability p and the parameters of the generative adversarial network during training; and randomly sampling a plurality of normally distributed noise vectors and inputting them into the generator of the trained generative adversarial network to generate corresponding images, thereby realizing the amplification of the small-scale data set. The disclosed method can generate a large number of high-quality images, realizes the amplification of small-scale data sets, and avoids the situation that the generated images are inconsistent with the original data.

Description

Small-scale data set amplification method based on learnable data enhancement
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a small-scale data set amplification method based on learnable data enhancement.
Background
The most typical method for image augmentation is the generative adversarial network (GAN), which contains two modules: a generator and a discriminator. The goal of the generator is to generate pictures realistic enough to fool the discriminator, i.e. to make it impossible for the discriminator to tell whether an image is real or synthetic. The goal of the discriminator is to separate the pictures generated by the generator from the real pictures as well as possible, i.e. to strengthen its discrimination capability. By continuously training the generator and the discriminator against each other, the model can finally produce highly realistic images.
However, a generative adversarial network requires a large amount of training data; on a small data set the network easily overfits, leading to training collapse.
To solve the overfitting problem, the invention designs a series of data enhancement methods to increase the diversity of the images. At the same time, to prevent the generated images from being influenced by the enhanced data, a learnable parameter p is used as the probability of applying each enhancement, so that the network can still learn the distribution of the correct source images.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a small-scale data set amplification method based on learnable data enhancement, which can generate a large number of high-quality images, realize the amplification of small-scale data sets, and avoid the situation that the generated images are inconsistent with the original data.
In order to achieve this purpose, the technical scheme of the invention is as follows:
A small-scale data set amplification method based on learnable data enhancement, comprising the following steps:
step one, data enhancement: performing data enhancement on an original image in the small-scale data set with probability p to obtain a data-enhanced image;
step two, generative adversarial network training: inputting the data-enhanced image into a discriminator of a generative adversarial network, randomly sampling normally distributed noise as the input of a generator of the generative adversarial network, generating an image by the generator and inputting it into the discriminator, judging by the discriminator whether the input image is a real image or a generated image, and alternately training the generator and the discriminator to optimize an objective function; during training, continuously learning and updating the probability p and the parameters of the generative adversarial network, and repeating step one and step two to complete the training of the generative adversarial network;
step three, amplification of the small-scale data set: randomly sampling a plurality of normally distributed noise vectors and inputting them into the generator of the trained generative adversarial network to generate corresponding images, thereby realizing the amplification of the small-scale data set.
In the above scheme, in step one, the data enhancement methods include geometric transformation, pixel transformation and image filtering.
In a further technical solution, the geometric transformation includes displacement transformation, equal-ratio scaling and non-equal-ratio scaling.
In a further technical solution, the pixel transformation includes brightness change, contrast change, saturation change, noise addition and random erasure.
Preferably, the specific method of step one is as follows: displacement transformation, equal-ratio scaling, non-equal-ratio scaling, brightness change, contrast change, saturation change, noise addition, random erasure and image filtering are performed on the original image in sequence with probability p to obtain the data-enhanced image.
In a further technical solution, the displacement transformation shifts the whole image with probability p, and the transformation process is as follows:
t_x, t_y ~ U(-0.1, 0.1)
X_11 = Translate(X_0, round(t_x × w), round(t_y × h)), if i_11 < p; otherwise X_11 = X_0
wherein the probability p is a learnable parameter with an initial value of 0.5; t_x and t_y are the fractions of the image width and height by which the image is shifted; i_11 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the shift operation is performed; U() denotes the uniform distribution; Translate denotes the shift operation; w and h denote the width and height of the image; round() denotes rounding; X_0 is the original image; X_11 is the image after displacement transformation;
the equal ratio scaling is to scale the width and height of the image according to the same ratio by probability p, and the transformation process is as follows:
s~U(0.5,2)
Figure BDA0003847360250000022
wherein i 12 U (0, 1), a random number from 0 to 1, for determining whether an equal ratio scaling operation is performed; scale represents the zoom operation, s is the zoom multiple of the width and height of the image; crop is a Crop operation to change the width and height of the image back to w and h to keep the image size unchanged, X 12 Is an image scaled by an equal ratio;
the non-equal scaling scales the width and height of the image according to different proportions by probability p, and the transformation process is as follows:
s1~U(0.5,2),s2~U(0.5,2)
Figure BDA0003847360250000023
wherein i 13 U (0, 1) of 0 to 1Is used for determining whether the non-equal ratio scaling operation is executed, s1 and s2 are the scaling times of the width and the height of the image respectively, X 13 Representing an anisometric scaled image.
In a further technical solution, the brightness change changes the brightness of the image with probability p, and the change process is as follows:
b ~ U(0.5, 1.5)
X_21 = Bright(X_13, b), if i_21 < p; otherwise X_21 = X_13
wherein i_21 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the brightness change operation is performed; U() denotes the uniform distribution; Bright denotes the brightness change operation; b is the multiple of the brightness change; X_13 is the image after non-equal-ratio scaling; X_21 is the image after the brightness change;
the contrast change is to change the contrast of the image by a probability p, and the change process is as follows:
c~U(0.7,1.2)
Figure BDA0003847360250000032
wherein i 22 U (0, 1) is a random number of 0 to 1 for determining whether a Contrast change operation is performed, contrast denotes a Contrast change operation, c is a multiple of the Contrast change, X 22 Representing the image after the contrast is changed;
the saturation change is to change the saturation of the image by a probability p, and the change process is as follows:
s~U(0.6,1.2)
Figure BDA0003847360250000033
wherein i 23 U (0, 1), a random number from 0 to 1, for determining whether a saturation change operation is performed,saturration denotes the Saturation change operation, s is the multiple of the Saturation change, X 23 Representing the image after saturation change;
the noise addition is to add a random noise to the image by using the probability p, and the change process is as follows:
r,g,b~N(0,1)
Figure BDA0003847360250000034
wherein i 24 U (0, 1) is a random number from 0 to 1 for determining whether to add noise, (m, n) represents a pixel coordinate on the image, and 0. Ltoreq. M < w, 0. Ltoreq. N < h, w and h represent the width and height of the image, R, G, B are three normally distributed random numbers, corresponding to the three components R, G, B of the pixel, respectively; n () represents a normal distribution; x 24 Representing the image after the noise addition operation;
the random erasing is to randomly select a region in the image with probability p to remove the region, and the change process is as follows:
c x ,c y ~U(0.3,0.6)
left=round((c x -0.25)×w)
low=round((c y -0.25)×h)
right=round((c x +0.25)×w)
high=round((c y +0.25)×h)
Figure BDA0003847360250000041
Figure BDA0003847360250000042
wherein i 25 U (0, 1) is a random number of 0 to 1 for determining whether or not to perform random erasure, (left, high) denotes the coordinates of the upper left vertex of the erasure area, (right, low) denotes the coordinates of the lower right vertex of the erasure area, and round () represents rounding; (c) x ,c y ) Coordinate indicating center point of erase region, mask is mask of erase region, an | _ indicates an OR operation, X 25 Representing the image after the random erasure is performed.
In a further technical scheme, the image filtering performs a filtering operation on the image with probability p, using 4 filters of different sizes, as follows:
size ∈ {(3, 3), (5, 5), (7, 7), (9, 9)}
X_31 = Filter(X_25, size), if i_31 < p; otherwise X_31 = X_25
wherein i_31 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the image filtering operation is performed; Filter denotes the image filtering operation; size denotes the size of the four filters, one of which is selected at random; X_25 is the image after random erasure; X_31 is the filtered, enhanced image.
Preferably, in step two, the generative adversarial network adopts StyleGAN.
In a further technical scheme, in step two, when the generator is trained, the parameters of the discriminator are fixed and the objective function is minimized:
min_G V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
when the discriminator is trained, the parameters of the generator are fixed and the objective function is maximized:
max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
wherein V(D, G) is the training objective containing the probability p and the parameters of the generative adversarial network; x is a real image after data enhancement; z is the randomly sampled noise; D and G denote the discriminator and the generator respectively; D() and G() denote the outputs of the discriminator and the generator; E denotes the expected value.
Through the above technical scheme, the small-scale data set amplification method based on learnable data enhancement provided by the invention has the following beneficial effects:
(1) The method performs a series of data enhancements on the images of the small-scale data set, which alleviates the overfitting problem that arises when training a network with a small-scale data set, successfully generates a large number of high-quality images, and achieves the purpose of data set amplification.
(2) The method uses a learnable parameter p as the probability of data enhancement, so that the network can learn the distribution of the original data (without data enhancement) during training, avoiding the situation that the generated images are inconsistent with the original data.
(3) The invention trains the generative adversarial network by maximizing or minimizing the objective function at different stages of training, and then uses the trained generative adversarial network to generate a large amount of highly realistic data from the small-scale data, thereby realizing the amplification of the small-scale data set.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flow chart of a small-scale data set amplification method based on learnable data enhancement according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a small-scale data set amplification method based on learnable data enhancement, which comprises the following steps as shown in figure 1:
step one, data enhancement: and performing data enhancement on the original image in the small-scale data set by using the probability p to obtain an image after data enhancement.
The data enhancement methods include geometric transformation, pixel transformation and image filtering. The geometric transformation includes displacement transformation, equal-ratio scaling and non-equal-ratio scaling; the pixel transformation includes brightness change, contrast change, saturation change, noise addition and random erasure. The order of these operations is not fixed and can be adjusted.
In this embodiment, a flower data set is used as the training images; it is a small-scale data set containing 17 categories of flowers with 80 images per category, and the flowers in the data set exhibit obvious pose and lighting changes. Data enhancement is performed in the following order: displacement transformation, equal-ratio scaling, non-equal-ratio scaling, brightness change, contrast change, saturation change, noise addition, random erasure and image filtering are applied to the original image in sequence with probability p to obtain the data-enhanced image.
The specific data enhancement process is as follows:
1. Displacement transformation
The displacement transformation shifts the whole image with probability p, and the transformation process is as follows:
t_x, t_y ~ U(-0.1, 0.1)
X_11 = Translate(X_0, round(t_x × w), round(t_y × h)), if i_11 < p; otherwise X_11 = X_0
wherein the probability p is a learnable parameter with an initial value of 0.5; t_x and t_y are the fractions of the image width and height by which the image is shifted; i_11 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the shift operation is performed; U() denotes the uniform distribution; Translate denotes the shift operation; w and h denote the width and height of the image; round() denotes rounding; X_0 is the original image; X_11 is the image after displacement transformation.
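For illustration, a minimal sketch of this probabilistic shift is given below, assuming PyTorch tensors in (C, H, W) layout and torchvision's functional API; the function name random_translate and all argument names are illustrative and not part of the patent.

```python
import torch
import torchvision.transforms.functional as TF

def random_translate(x0: torch.Tensor, p: float) -> torch.Tensor:
    """Shift the whole image with probability p (sketch of the displacement transform)."""
    _, h, w = x0.shape
    if torch.rand(()).item() >= p:                          # i_11 ~ U(0, 1) gates the operation
        return x0
    tx, ty = torch.empty(2).uniform_(-0.1, 0.1).tolist()    # t_x, t_y ~ U(-0.1, 0.1)
    return TF.affine(x0, angle=0.0,
                     translate=[round(tx * w), round(ty * h)],
                     scale=1.0, shear=[0.0, 0.0])
```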
2. Equal-ratio scaling
The equal-ratio scaling scales the width and height of the image by the same ratio with probability p, and the transformation process is as follows:
s ~ U(0.5, 2)
X_12 = Crop(Scale(X_11, s × w, s × h), w, h), if i_12 < p; otherwise X_12 = X_11
wherein i_12 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the equal-ratio scaling operation is performed; Scale denotes the scaling operation and s is the scaling multiple of the image width and height; Crop is a cropping operation that restores the width and height of the image to w and h so that the image size is unchanged; X_12 is the image after equal-ratio scaling.
3. Non-equal-ratio scaling
The non-equal-ratio scaling scales the width and height of the image by different ratios with probability p, and the transformation process is as follows:
s1 ~ U(0.5, 2), s2 ~ U(0.5, 2)
X_13 = Crop(Scale(X_12, s1 × w, s2 × h), w, h), if i_13 < p; otherwise X_13 = X_12
wherein i_13 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the non-equal-ratio scaling operation is performed; s1 and s2 are the scaling multiples of the image width and height respectively; X_13 is the image after non-equal-ratio scaling.
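The two scaling operations can be sketched together as below, again assuming the torchvision conventions used above; resizing followed by a center crop (which zero-pads when the scaled image is smaller than the original) is one way to realize Scale followed by Crop, and the name random_scale is illustrative.

```python
import torch
import torchvision.transforms.functional as TF

def random_scale(x: torch.Tensor, p: float, isotropic: bool) -> torch.Tensor:
    """Scale width/height with probability p, then crop or pad back to the original size."""
    _, h, w = x.shape
    if torch.rand(()).item() >= p:                           # i_12 / i_13 gate
        return x
    if isotropic:                                            # equal-ratio: one factor for both sides
        s = torch.empty(1).uniform_(0.5, 2.0).item()
        s1, s2 = s, s
    else:                                                    # non-equal-ratio: independent factors
        s1, s2 = torch.empty(2).uniform_(0.5, 2.0).tolist()
    scaled = TF.resize(x, [round(s2 * h), round(s1 * w)], antialias=True)
    # center_crop pads with zeros when the scaled image is smaller than (h, w)
    return TF.center_crop(scaled, [h, w])
```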
4. Brightness change
The brightness change changes the brightness of the image with probability p, and the change process is as follows:
b ~ U(0.5, 1.5)
X_21 = Bright(X_13, b), if i_21 < p; otherwise X_21 = X_13
wherein i_21 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the brightness change operation is performed; U() denotes the uniform distribution; Bright denotes the brightness change operation; b is the multiple of the brightness change; X_13 is the image after non-equal-ratio scaling; X_21 is the image after the brightness change.
5. Contrast change
The contrast change changes the contrast of the image with probability p, and the change process is as follows:
c ~ U(0.7, 1.2)
X_22 = Contrast(X_21, c), if i_22 < p; otherwise X_22 = X_21
wherein i_22 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the contrast change operation is performed; Contrast denotes the contrast change operation; c is the multiple of the contrast change; X_22 is the image after the contrast change.
6. Saturation change
The saturation change changes the saturation of the image with probability p, and the change process is as follows:
s ~ U(0.6, 1.2)
X_23 = Saturation(X_22, s), if i_23 < p; otherwise X_23 = X_22
wherein i_23 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the saturation change operation is performed; Saturation denotes the saturation change operation; s is the multiple of the saturation change; X_23 is the image after the saturation change.
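The three pixel-level changes above can be sketched with torchvision's adjustment functions as below; gating each change independently with the same probability p follows the description, while the function name random_color_jitter is illustrative.

```python
import torch
import torchvision.transforms.functional as TF

def random_color_jitter(x: torch.Tensor, p: float) -> torch.Tensor:
    """Apply brightness, contrast and saturation changes, each gated independently by p."""
    if torch.rand(()).item() < p:                                  # i_21 gate
        x = TF.adjust_brightness(x, torch.empty(1).uniform_(0.5, 1.5).item())
    if torch.rand(()).item() < p:                                  # i_22 gate
        x = TF.adjust_contrast(x, torch.empty(1).uniform_(0.7, 1.2).item())
    if torch.rand(()).item() < p:                                  # i_23 gate
        x = TF.adjust_saturation(x, torch.empty(1).uniform_(0.6, 1.2).item())
    return x
```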
7. Noise addition
The noise addition adds random noise to the image with probability p, and the change process is as follows:
r, g, b ~ N(0, 1)
X_24(m, n) = X_23(m, n) + (r, g, b), if i_24 < p; otherwise X_24 = X_23
wherein i_24 ~ U(0, 1) is a random number from 0 to 1 used to determine whether noise is added; (m, n) denotes a pixel coordinate on the image, with 0 ≤ m < w and 0 ≤ n < h; w and h denote the width and height of the image; r, g and b are three normally distributed random numbers corresponding to the three components R, G and B of a pixel; N() denotes the normal distribution; X_24 is the image after the noise addition operation.
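A sketch of the noise addition is given below, assuming pixel values in [0, 1] and interpreting (r, g, b) as one normally distributed offset per color channel; both the value range and the final clamping are assumptions not stated in the patent.

```python
import torch

def random_add_noise(x: torch.Tensor, p: float) -> torch.Tensor:
    """With probability p, add one normally distributed offset per color channel (r, g, b)."""
    if torch.rand(()).item() >= p:                   # i_24 gate
        return x
    noise = torch.randn(3, 1, 1, dtype=x.dtype)      # r, g, b ~ N(0, 1), broadcast over (m, n)
    # Clamping to a [0, 1] intensity range is an assumption, not stated in the patent.
    return (x + noise).clamp(0.0, 1.0)
```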
8. Random erasure
The random erasure removes a randomly selected region of the image with probability p, and the change process is as follows:
c_x, c_y ~ U(0.3, 0.6)
left = round((c_x - 0.25) × w)
low = round((c_y - 0.25) × h)
right = round((c_x + 0.25) × w)
high = round((c_y + 0.25) × h)
Mask(m, n) = 1, if left ≤ m ≤ right and low ≤ n ≤ high; otherwise Mask(m, n) = 0
X_25 = X_24 | Mask, if i_25 < p; otherwise X_25 = X_24
wherein i_25 ~ U(0, 1) is a random number from 0 to 1 used to determine whether random erasure is performed; (left, high) denotes the coordinates of the upper-left vertex of the erased region and (right, low) denotes the coordinates of the lower-right vertex; round() denotes rounding; (c_x, c_y) denotes the coordinates of the center point of the erased region; Mask is the mask of the erased region; | denotes an OR operation; X_25 is the image after random erasure.
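A sketch of the random erasure is given below; filling the erased rectangle with the maximum intensity is one reading of the OR-with-mask formulation, and zeros or random values would be equally plausible choices.

```python
import torch

def random_erase(x: torch.Tensor, p: float) -> torch.Tensor:
    """With probability p, blank out a region spanning half of the width and height."""
    if torch.rand(()).item() >= p:                            # i_25 gate
        return x
    _, h, w = x.shape
    cx, cy = torch.empty(2).uniform_(0.3, 0.6).tolist()       # center of the erased region
    left, right = round((cx - 0.25) * w), round((cx + 0.25) * w)
    low, high = round((cy - 0.25) * h), round((cy + 0.25) * h)
    x = x.clone()
    # Filling with the maximum intensity mirrors the OR-with-mask formulation (an assumption).
    x[:, low:high, left:right] = 1.0
    return x
```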
9. Image filtering
The image filtering performs a filtering operation on the image with probability p, using 4 filters of different sizes, as follows:
size ∈ {(3, 3), (5, 5), (7, 7), (9, 9)}
X_31 = Filter(X_25, size), if i_31 < p; otherwise X_31 = X_25
wherein i_31 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the image filtering operation is performed; Filter denotes the image filtering operation; size denotes the size of the four filters, one of which is selected at random; X_25 is the image after random erasure; X_31 is the filtered, enhanced image.
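A sketch of the filtering step is given below; the patent does not name the filter type, so Gaussian blur is used here purely as a stand-in, and the name random_filter is illustrative.

```python
import random
import torch
import torchvision.transforms.functional as TF

def random_filter(x: torch.Tensor, p: float) -> torch.Tensor:
    """With probability p, apply one of four filters of different sizes, chosen at random."""
    if torch.rand(()).item() >= p:                   # i_31 gate
        return x
    k = random.choice([3, 5, 7, 9])                  # size ∈ {(3,3), (5,5), (7,7), (9,9)}
    # Gaussian blur is a stand-in; the patent does not specify the filter type.
    return TF.gaussian_blur(x, kernel_size=[k, k])
```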
Step two, generative adversarial network training: the data-enhanced image is input into the discriminator of the generative adversarial network; a normally distributed noise is randomly sampled as the input of the generator; the generator generates an image, which is input into the discriminator; the discriminator judges whether the input image is a real image or a generated image; and the generator and the discriminator are trained alternately to optimize the objective function. During training, the probability p and the parameters of the generative adversarial network are continuously learned and updated, and steps one and two are repeated to complete the training of the generative adversarial network.
In this embodiment, the generative adversarial network adopts StyleGAN, which uses progressively growing networks as the generator and the discriminator: the shallow layers of the network generate images at a lower resolution, the resolution of the generated images gradually increases as the network deepens, and a high-resolution composite image is finally obtained. The specific process is as follows:
(1) The data-enhanced image X_31 is put into the discriminator D, and the discriminator judges whether the image is a real image or a composite image, namely class = D(X_31);
(2) A noise z is randomly sampled as the input of the generator G, which generates an image G(z);
(3) The generated image is input into the discriminator to judge whether it is a real image or a composite image, namely class = D(G(z)).
Steps (1) to (3) are repeated continuously, so that the generator generates images as close to real images as possible while the discriminator tries to distinguish whether the input image is synthetic or real, until finally the discriminator cannot tell the source of the image.
The overall training objective is as follows:
min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
during generator training, the parameters of the arbiter are fixed, and the objective function is minimized:
Figure BDA0003847360250000091
when the arbiter trains, the parameters of the generator are fixed, at which time the objective function is maximized:
Figure BDA0003847360250000092
wherein V (D, G) is a training function containing probability p and parameters for generating a confrontation network, x is a real image subjected to data enhancement, z is noise subjected to random sampling, D and G respectively represent a discriminator and a generator, D () and G () respectively represent outputs of the discriminator and the generator, and E represents an expected value.
The learnable probability p is updated during the discriminator training stage. Early in training, because the training data are too few, the discriminator overfits severely and its discrimination ability is strong: no matter how the generator is optimized, the discriminator can tell real images from generated ones, so the generator cannot be trained effectively. The probability p therefore keeps increasing to enhance the diversity of the training data. In the later stage of training, p begins to decrease slowly, because an overly large p makes the images produced by the generator closer to the data-enhanced images than to the original images. After training, the value of p lies between 0.8 and 0.9.
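For orientation, a much-simplified alternating training step is sketched below with generic generator and discriminator networks rather than StyleGAN; `augment` stands for the probability-p enhancement chain defined above, p is passed in as an ordinary number, and the learnable, gradient-based update of p described in the patent is deliberately omitted, so this is a sketch of the loop structure only, not the patent's method.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real_batch, z_dim, augment, p):
    """One alternating update: D is trained on augmented real vs. generated images,
    then G is trained to fool D. D is assumed to return one logit per image, shape (batch, 1)."""
    device = real_batch.device
    bs = real_batch.size(0)

    # ---- discriminator step: maximize E[log D(x)] + E[log(1 - D(G(z)))] ----
    opt_D.zero_grad()
    x_real = torch.stack([augment(img, p) for img in real_batch])   # data-enhanced real images
    z = torch.randn(bs, z_dim, device=device)
    x_fake = G(z).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), torch.ones(bs, 1, device=device))
              + F.binary_cross_entropy_with_logits(D(x_fake), torch.zeros(bs, 1, device=device)))
    d_loss.backward()
    opt_D.step()

    # ---- generator step: fool D; the common non-saturating -log D(G(z)) form is used here ----
    opt_G.zero_grad()
    z = torch.randn(bs, z_dim, device=device)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), torch.ones(bs, 1, device=device))
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```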
Step three, amplification of the small-scale data set: a plurality of normally distributed noise vectors are randomly sampled and input into the generator of the trained generative adversarial network to generate the corresponding images, realizing the amplification of the small-scale data set.
The method comprises the following specific steps:
(1) Loading the generator G trained in the previous step;
(2) A noise vector z is randomly sampled from a normal distribution and input into the generator G to generate the corresponding image:
Image = G(z)
By randomly sampling a plurality of noise vectors, a plurality of corresponding images can be generated, realizing the amplification of the data set.
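A sketch of this amplification step, assuming a trained PyTorch generator G that maps (batch, z_dim) noise to images; the function name amplify and the batching are illustrative.

```python
import torch

@torch.no_grad()
def amplify(G, z_dim: int, num_images: int, batch_size: int = 64) -> torch.Tensor:
    """Sample noise from N(0, 1) and run the trained generator to amplify the data set."""
    G.eval()
    images = []
    for start in range(0, num_images, batch_size):
        n = min(batch_size, num_images - start)
        z = torch.randn(n, z_dim)                   # z ~ N(0, 1)
        images.append(G(z).cpu())                   # Image = G(z)
    return torch.cat(images, dim=0)                 # e.g. 10000 generated images
```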
In this embodiment, through the above process, 10000 images are finally generated by using the method, and the amplification of the data set is realized.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A small-scale data set amplification method based on learnable data enhancement, comprising the following steps:
step one, data enhancement: performing data enhancement on an original image in the small-scale data set with probability p to obtain a data-enhanced image;
step two, generative adversarial network training: inputting the data-enhanced image into a discriminator of a generative adversarial network, randomly sampling normally distributed noise as the input of a generator of the generative adversarial network, generating an image by the generator and inputting it into the discriminator, judging by the discriminator whether the input image is a real image or a generated image, and alternately training the generator and the discriminator to optimize an objective function; during training, continuously learning and updating the probability p and the parameters of the generative adversarial network, and repeating step one and step two to complete the training of the generative adversarial network;
step three, amplification of the small-scale data set: randomly sampling a plurality of normally distributed noise vectors and inputting them into the generator of the trained generative adversarial network to generate corresponding images, thereby realizing the amplification of the small-scale data set.
2. The small-scale data set amplification method based on learnable data enhancement according to claim 1, wherein in step one the data enhancement methods include geometric transformation, pixel transformation and image filtering.
3. The small-scale data set amplification method based on learnable data enhancement according to claim 2, wherein the geometric transformation includes displacement transformation, equal-ratio scaling and non-equal-ratio scaling.
4. The small-scale data set amplification method based on learnable data enhancement according to claim 2, wherein the pixel transformation includes brightness change, contrast change, saturation change, noise addition and random erasure.
5. The small-scale data set amplification method based on learnable data enhancement according to claim 1, wherein the specific method of step one is as follows: displacement transformation, equal-ratio scaling, non-equal-ratio scaling, brightness change, contrast change, saturation change, noise addition, random erasure and image filtering are performed on the original image in sequence with probability p to obtain the data-enhanced image.
6. The small-scale data set amplification method based on learnable data enhancement according to claim 5, wherein the displacement transformation shifts the whole image with probability p, and the transformation process is as follows:
t_x, t_y ~ U(-0.1, 0.1)
X_11 = Translate(X_0, round(t_x × w), round(t_y × h)), if i_11 < p; otherwise X_11 = X_0
wherein the probability p is a learnable parameter with an initial value of 0.5; t_x and t_y are the fractions of the image width and height by which the image is shifted; i_11 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the shift operation is performed; U() denotes the uniform distribution; Translate denotes the shift operation; w and h denote the width and height of the image; round() denotes rounding; X_0 is the original image; X_11 is the image after displacement transformation;
the equal-ratio scaling scales the width and height of the image by the same ratio with probability p, and the transformation process is as follows:
s ~ U(0.5, 2)
X_12 = Crop(Scale(X_11, s × w, s × h), w, h), if i_12 < p; otherwise X_12 = X_11
wherein i_12 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the equal-ratio scaling operation is performed; Scale denotes the scaling operation and s is the scaling multiple of the image width and height; Crop is a cropping operation that restores the width and height of the image to w and h so that the image size is unchanged; X_12 is the image after equal-ratio scaling;
the non-equal-ratio scaling scales the width and height of the image by different ratios with probability p, and the transformation process is as follows:
s1 ~ U(0.5, 2), s2 ~ U(0.5, 2)
X_13 = Crop(Scale(X_12, s1 × w, s2 × h), w, h), if i_13 < p; otherwise X_13 = X_12
wherein i_13 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the non-equal-ratio scaling operation is performed; s1 and s2 are the scaling multiples of the image width and height respectively; X_13 is the image after non-equal-ratio scaling.
7. The small-scale data set amplification method based on learnable data enhancement according to claim 5, wherein the brightness change changes the brightness of the image with probability p, and the change process is as follows:
b ~ U(0.5, 1.5)
X_21 = Bright(X_13, b), if i_21 < p; otherwise X_21 = X_13
wherein i_21 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the brightness change operation is performed; U() denotes the uniform distribution; Bright denotes the brightness change operation; b is the multiple of the brightness change; X_13 is the image after non-equal-ratio scaling; X_21 is the image after the brightness change;
the contrast change changes the contrast of the image with probability p, and the change process is as follows:
c ~ U(0.7, 1.2)
X_22 = Contrast(X_21, c), if i_22 < p; otherwise X_22 = X_21
wherein i_22 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the contrast change operation is performed; Contrast denotes the contrast change operation; c is the multiple of the contrast change; X_22 is the image after the contrast change;
the saturation change changes the saturation of the image with probability p, and the change process is as follows:
s ~ U(0.6, 1.2)
X_23 = Saturation(X_22, s), if i_23 < p; otherwise X_23 = X_22
wherein i_23 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the saturation change operation is performed; Saturation denotes the saturation change operation; s is the multiple of the saturation change; X_23 is the image after the saturation change;
the noise addition adds random noise to the image with probability p, and the change process is as follows:
r, g, b ~ N(0, 1)
X_24(m, n) = X_23(m, n) + (r, g, b), if i_24 < p; otherwise X_24 = X_23
wherein i_24 ~ U(0, 1) is a random number from 0 to 1 used to determine whether noise is added; (m, n) denotes a pixel coordinate on the image, with 0 ≤ m < w and 0 ≤ n < h; w and h denote the width and height of the image; r, g and b are three normally distributed random numbers corresponding to the three components R, G and B of a pixel; N() denotes the normal distribution; X_24 is the image after the noise addition operation;
the random erasing is to randomly select a region in the image with probability p to remove the region, and the change process is as follows:
c x ,c y ~U(0.3,0.6)
left=round((c x -0.25)×w)
low=round((c y -0.25)×h)
right=round((c x +0.25)×w)
high=round((c y +0.25)×h)
Figure FDA0003847360240000032
Figure FDA0003847360240000033
wherein i 25 U (0, 1), a random number from 0 to 1, is used to determine whether to perform random erasure, (left, high) denotes the coordinates of the top left vertex of the erasure area, (right)Low) represents the coordinates of the lower right vertex of the erased area, and round () represents rounding; (c) x ,c y ) Coordinate indicating center point of erase region, mask is mask of erase region, an | _ indicates an OR operation, X 25 Representing the image after the random erasure is performed.
8. The small-scale data set amplification method based on learnable data enhancement, wherein the image filtering performs a filtering operation on the image with probability p, using 4 filters of different sizes, as follows:
size ∈ {(3, 3), (5, 5), (7, 7), (9, 9)}
X_31 = Filter(X_25, size), if i_31 < p; otherwise X_31 = X_25
wherein i_31 ~ U(0, 1) is a random number from 0 to 1 used to determine whether the image filtering operation is performed; Filter denotes the image filtering operation; size denotes the size of the four filters, one of which is selected at random; X_25 is the image after random erasure; X_31 is the filtered, enhanced image.
9. The small-scale data set amplification method based on learnable data enhancement according to claim 1, wherein in step two the generative adversarial network adopts StyleGAN.
10. The small-scale data set amplification method based on learnable data enhancement according to claim 1 or 9, wherein in step two, when the generator is trained, the parameters of the discriminator are fixed and the objective function is minimized:
min_G V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
when the discriminator is trained, the parameters of the generator are fixed and the objective function is maximized:
max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
wherein V(D, G) is the training objective containing the probability p and the parameters of the generative adversarial network; x is a real image after data enhancement; z is the randomly sampled noise; D and G denote the discriminator and the generator respectively; D() and G() denote the outputs of the discriminator and the generator; E denotes the expected value.
CN202211121663.XA 2022-09-15 2022-09-15 Small-scale data set amplification method based on learnable data enhancement Pending CN115482433A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211121663.XA CN115482433A (en) 2022-09-15 2022-09-15 Small-scale data set amplification method based on learnable data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211121663.XA CN115482433A (en) 2022-09-15 2022-09-15 Small-scale data set amplification method based on learnable data enhancement

Publications (1)

Publication Number Publication Date
CN115482433A true CN115482433A (en) 2022-12-16

Family

ID=84423536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211121663.XA Pending CN115482433A (en) 2022-09-15 2022-09-15 Small-scale data set amplification method based on learnable data enhancement

Country Status (1)

Country Link
CN (1) CN115482433A (en)

Similar Documents

Publication Publication Date Title
CN110473154B (en) Image denoising method based on generation countermeasure network
CN113313657B (en) Unsupervised learning method and system for low-illumination image enhancement
CN112614077B (en) Unsupervised low-illumination image enhancement method based on generation countermeasure network
CN109978807B (en) Shadow removing method based on generating type countermeasure network
CN112288658A (en) Underwater image enhancement method based on multi-residual joint learning
CN103186894B (en) A kind of multi-focus image fusing method of self-adaptation piecemeal
CN113011567B (en) Training method and device of convolutional neural network model
CN111768326A (en) High-capacity data protection method based on GAN amplification image foreground object
CN109978804B (en) Human eye sight line correction method and system based on deep learning
CN112560034B (en) Malicious code sample synthesis method and device based on feedback type deep countermeasure network
CN113256538B (en) Unsupervised rain removal method based on deep learning
CN110675337B (en) Diffusion type traversal method for image noise reduction
CN108596840B (en) Data set enhancement method for deep learning evaluation of vascular network development level
CN115482433A (en) Small-scale data set amplification method based on learnable data enhancement
CN115018729B (en) Content-oriented white box image enhancement method
CN107369138A (en) Image based on higher order statistical model optimizes display methods
CN108447066B (en) Biliary tract image segmentation method, terminal and storage medium
CN115330637A (en) Image sharpening method and device, computing device and storage medium
CN112541866B (en) Human face image restoration model based on evolutionary generation countermeasure network
CN114758021A (en) Earth surface image generation method and system based on generation countermeasure network
CN110009082B (en) Three-dimensional code optimization method, medium, computer device and apparatus
CN113469904A (en) General image quality enhancement method and device based on cycle consistency loss
US8649059B2 (en) Mutual optimization system for class matrix and diffusion weighting used in halftone image processing technique
CN112381849B (en) Image edge detection method based on adaptive differential evolution
Mujtaba et al. A Fast HDR Image TMO based on a Simplified Eye Sensitivity Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination