CN110852352A - Data enhancement method for training deep neural network model for target detection - Google Patents
- Publication number: CN110852352A (application CN201911007178.8A)
- Authority: CN (China)
- Prior art keywords: target, image, data, coordinates, training
- Legal status: Granted
Classifications
- G06F18/241 (Pattern recognition; classification techniques): classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 (Neural networks; architecture): combinations of networks
- G06N3/08 (Neural networks): learning methods
- G06V2201/07 (Image or video recognition): target detection
Abstract
The invention discloses a data enhancement method for training a deep neural network model for target detection. Three elements are added to the traditional data enhancement method, namely three-dimensional change, lens distortion and illumination change, so that the data set contains more pictures, and a deep neural network trained with the enhanced new data achieves higher precision in target detection.
Description
Technical Field
The invention belongs to the field of target detection, and particularly relates to a data enhancement method for training a target detection deep neural network model.
Background Art
Data enhancement is an important means of improving the training of deep neural networks; its essence is to generate data similar to the training data so as to overcome a shortage of training data. Target detection is the detection of objects of interest in an image or a video segment, such as detecting pedestrians, vehicles and traffic signs in the video captured by an unmanned vehicle. At present, there are two kinds of data enhancement methods for target detection deep neural network models: the traditional enhancement method and methods based on generative networks.
The traditional enhancement method directly applies transformations such as rotation, scaling and noise addition to existing data in the training set to generate new data; this class of transformations is commonly called affine transformation. Besides affine transformation, another common way of generating new data is the generative adversarial network (GAN). For the image classification problem, comparisons of the image enhancement effect of affine transformation and of GANs have found that affine transformation performs close to the GAN while requiring far less computation.
However, the widely adopted traditional data enhancement methods in the prior art obtain new data only through two-dimensional changes of the images/videos; the resulting new data deviates from real data, and a neural network trained with such data does not achieve high target detection precision.
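For illustration only, the following is a minimal sketch of the traditional 2D enhancement just described (rotation, scaling, noise addition), written with NumPy and SciPy; the angle, scale and noise ranges are assumed values, not parameters taken from the patent.

```python
import numpy as np
from scipy import ndimage

def affine_augment(image, rng):
    # The rotation, scale and noise ranges below are illustrative assumptions.
    angle = rng.uniform(-15, 15)                  # rotation angle in degrees
    scale = rng.uniform(0.8, 1.2)                 # isotropic zoom factor
    out = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    out = ndimage.zoom(out, (scale, scale, 1), order=3)   # cubic spline resize
    out = out + rng.normal(0.0, 0.02, out.shape)  # additive Gaussian noise
    return np.clip(out, 0.0, 1.0)                 # keep pixel values in [0, 1]

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                   # stand-in normalized image
augmented = affine_augment(image, rng)
```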
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a data enhancement method for training a target detection deep neural network model. By adding three elements to the traditional enhancement method, namely three-dimensional change, lens distortion and illumination change, the generated new data is made closer to real data, and a neural network trained with the new data achieves higher target detection precision.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a data enhancement method for training a target detection deep neural network model comprises the following specific calculation steps:
step one: reading images in the training set, and performing normalization processing on the read image pixels;
step two: extracting all marked targets from the existing training data in the image read in step one, namely, manually marking part of the training data and cropping out all marked regions;
step three: randomly zooming the extracted target, namely generating with a computer a random number in the interval [a, b] as the zoom multiple and scaling the extracted target by cubic spline interpolation to generate the zoomed target;
step four: performing three-dimensional random rotation on the zoomed target, namely generating with a computer three random numbers θx, θy and θz in a preset interval, respectively representing the rotation angles of the target around the X, Y and Z axes; if the original coordinates of a point on the target in three-dimensional space are (x, y, z) and the coordinates after rotation around the coordinate axes are (x′, y′, z′), then (x, y, z) and (x′, y′, z′) satisfy:
(x′, y′, z′)ᵀ = Rz(θz) · Ry(θy) · Rx(θx) · (x, y, z)ᵀ
where Rx(θx), Ry(θy) and Rz(θz) are the rotation matrices about the X, Y and Z axes;
step five: mapping the three-dimensionally rotated target onto the XY plane in which the image lies, namely assigning the pixel value at the original target coordinate (x, y) to the pixel of the new target at (round(x′), round(y′));
step six: simulating illumination change with a simplified model and processing the mapped target to obtain new data with added illumination change;
step seven: adding lens distortion, namely taking the target coordinates after step five as (x, y), computing the radially distorted coordinates (x′, y′), and then applying tangential distortion to (x′, y′) to obtain (x″, y″), thereby generating the final new target;
step eight: randomly putting the generated new target into the background of the images in the training set.
Preferably, the reading of the images in the training set and the normalization of the images in the step one specifically include:
S1, reading an image in the training set, wherein the pixel value range of the image is [0, 255]; the image has three channels R, G and B and is regarded as a W × H × 3 three-dimensional matrix;
S2, normalizing the image pixel values, namely dividing them by 255 so that the value range of the pixel values becomes [0, 1];
wherein: w is the image width and H is the image height.
Preferably, the specific process in step four of transforming the original three-dimensional coordinates (x, y, z) of a point on the target into (x′, y′, z′) by rotation around the coordinate axes is as follows: the point (x, y, z) on the target is first rotated around the X axis, then around the Y axis, and finally around the Z axis to obtain the new coordinate point (x′, y′, z′).
Preferably, the specific process in step six of obtaining the new data with added illumination change is as follows:
S1, randomly generating with a computer four numbers μ1, μ2 ∈ (aμ, bμ) and σ1, σ2 ∈ (aσ, bσ), used respectively as the means and standard deviations of a two-dimensional normal distribution probability density function:
f(x, y) = 1 / (2π σ1 σ2) · exp(−((x − μ1)² / (2σ1²) + (y − μ2)² / (2σ2²)))
S2, randomly generating with a computer four numbers xmin, xmax ∈ (0, w′) and ymin, ymax ∈ (0, h′);
Wherein: w 'and h' are the maximum width and the maximum height of the newly generated target matrix;
S3, discretizing the intervals [xmin, xmax] and [ymin, ymax] into the sets {xmin, xmin+1, xmin+2, ..., xmax} and {ymin, ymin+1, ymin+2, ..., ymax}, where [xmin, xmax] contains M elements and [ymin, ymax] contains N elements;
S4, sampling the two-dimensional normal distribution probability density function on the two sets to obtain an M × N two-dimensional matrix S, whose element in row m and column n is f(xmin + m, ymin + n),
Wherein: f is a two-dimensional normal distribution probability density function;
S5, the newly generated target is also an image with three channels R, G and B, the pixel values of each channel forming a matrix; computing the dot product of each channel's matrix with S, i.e. multiplying elements at corresponding positions, yields the new target with added illumination change.
Preferably, the relationship between the target coordinates (x, y) and the radially distorted coordinates (x′, y′) in step seven satisfies:
x′ = x(1 + k1r² + k2r⁴ + k3r⁶)
y′ = y(1 + k1r² + k2r⁴ + k3r⁶)
wherein: r is the distance from the point to the center of the target, and k1, k2 and k3 are preset parameters that can be adjusted according to the device used to acquire the image.
Preferably, the relationship between the radially distorted coordinates (x′, y′) and the tangentially distorted coordinates (x″, y″) satisfies:
x″ = x′ + (2p1x′y′ + p2(r² + 2x′²))
y″ = y′ + (p1(r² + 2y′²) + 2p2x′y′)
wherein: p1 and p2 are preset parameters that can be adjusted according to the device used to acquire the image.
The invention has the following beneficial effects: the invention provides a data enhancement method for training a target detection deep neural network model in which three factors, namely three-dimensional change, lens distortion and illumination change, are added to the traditional enhancement method; the data set therefore contains more pictures, and a deep neural network trained with the enhanced new data achieves higher precision in target detection.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following further describes the technical solution of the present invention with reference to the embodiments.
A data enhancement method for training a target detection deep neural network model comprises the following specific calculation steps:
Step one: reading images in the training set and normalizing the read image pixels, the specific steps being as follows:
S1, reading an image in the training set, wherein the pixel value range of the image is [0, 255]; the image has three channels R, G and B and is regarded as a W × H × 3 three-dimensional matrix;
wherein: w is the image width, H is the image height;
S2, normalizing the pixel values, namely dividing them by 255 so that the value range of the pixel values becomes [0, 1].
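A minimal sketch of step one follows, assuming 8-bit RGB input; the file path and the imageio dependency are illustrative choices, not prescribed by the invention (note that NumPy loads images as H × W × 3 rather than the W × H × 3 written above).

```python
import numpy as np
import imageio.v3 as iio

image = iio.imread("train/sample.jpg")       # hypothetical path; H x W x 3, values in [0, 255]
image = image.astype(np.float32) / 255.0     # normalize pixel values to [0, 1]
```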
Step two: extracting all marked targets from the existing training data of the image, namely, manually marking part of the training data and cropping out all marked regions. Manual marking followed by cropping is the preferred extraction method of the invention and is simple to implement; the marking principle is as follows: if the manual mark gives the upper-left corner of a region as (100, 200) and the lower-right corner as (300, 400), then the rectangle defined by these two points contains the marked target. Other marking methods can also be used, for example marking the target with a matrix containing only the values 0 and 1, where positions with value 1 represent the target and positions with value 0 represent the background.
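A sketch of step two under the corner-marking convention above; the `annotations` structure is hypothetical, and the image is synthesized so the snippet is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((480, 640, 3))            # stand-in normalized image (H x W x 3)
annotations = [((100, 200), (300, 400))]     # hypothetical (top-left, bottom-right) boxes

# Crop each marked rectangle out of the image (rows index y, columns index x).
targets = [image[y1:y2, x1:x2].copy() for (x1, y1), (x2, y2) in annotations]
```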
Step three: randomly zooming the extracted target, namely generating with a computer a random number in the interval [a, b] as the zoom multiple and scaling the extracted target by cubic spline interpolation to generate the zoomed target; here [a, b] is the value range of the random number, adjusted according to the characteristics of the data set itself rather than fixed.
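A sketch of step three, assuming an illustrative interval [a, b] = [0.5, 2.0]; SciPy's order-3 spline interpolation corresponds to the cubic spline interpolation named above.

```python
import numpy as np
from scipy import ndimage

a, b = 0.5, 2.0                                    # assumed zoom interval [a, b]
rng = np.random.default_rng(0)
target = rng.random((80, 60, 3))                   # stand-in extracted target

s = rng.uniform(a, b)                              # random zoom multiple
scaled = ndimage.zoom(target, (s, s, 1), order=3)  # cubic spline interpolation; channels untouched
```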
Step four: performing three-dimensional random rotation on the zoomed target, namely generating with a computer three random numbers θx, θy and θz in a preset interval, respectively representing the rotation angles of the target around the X, Y and Z axes; if the original coordinates of a point on the target in three-dimensional space are (x, y, z) and the coordinates after rotation around the coordinate axes are (x′, y′, z′), then (x, y, z) and (x′, y′, z′) satisfy:
(x′, y′, z′)ᵀ = Rz(θz) · Ry(θy) · Rx(θx) · (x, y, z)ᵀ
where
Rx(θx) = [[1, 0, 0], [0, cos θx, −sin θx], [0, sin θx, cos θx]]
Ry(θy) = [[cos θy, 0, sin θy], [0, 1, 0], [−sin θy, 0, cos θy]]
Rz(θz) = [[cos θz, −sin θz, 0], [sin θz, cos θz, 0], [0, 0, 1]]
(the rotation is applied about the X axis first, then the Y axis, then the Z axis, consistent with the preferred process described above).
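A sketch of step four under the stated X-then-Y-then-Z convention; the angle interval [−π/6, π/6] is an assumed stand-in for the patent's preset range.

```python
import numpy as np

rng = np.random.default_rng(0)
tx, ty, tz = rng.uniform(-np.pi / 6, np.pi / 6, size=3)   # assumed angle interval

def rot_x(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]])

def rot_y(t):
    return np.array([[ np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

R = rot_z(tz) @ rot_y(ty) @ rot_x(tx)      # rotate about X first, then Y, then Z
p = np.array([10.0, 5.0, 0.0])             # a point (x, y, z) on the target
p_rot = R @ p                              # rotated coordinates (x', y', z')
```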
Step five: mapping the target after three-dimensional rotation onto the XY plane in which the image lies, namely assigning the pixel value at the original target coordinate (x, y) to the pixel of the new target at (round(x′), round(y′)). For example, if the pixel value at (142, 173) of the original target is (0.5, 0.1, 0.4) and the three-dimensional rotation and mapping onto the XY plane transform the point (142, 173) to (100.37, 200.5), then the pixel value at (100, 200) of the new target becomes (0.5, 0.1, 0.4).
Wherein: round denotes rounding to the nearest integer;
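A sketch of step five; it loops over target pixels for clarity (a vectorized version would be used in practice), and the identity matrix stands in for a rotation matrix built as in the previous sketch.

```python
import numpy as np

def project_to_xy(target, R):
    # Rotate each pixel position (x, y, 0), discard z', and write the original
    # pixel value to the rounded (x', y') location on the XY plane.
    h, w = target.shape[:2]
    out = np.zeros_like(target)
    for y in range(h):
        for x in range(w):
            xp, yp, _ = R @ np.array([x, y, 0.0])
            xi, yi = int(round(xp)), int(round(yp))
            if 0 <= xi < w and 0 <= yi < h:        # keep only points landing inside
                out[yi, xi] = target[y, x]
    return out

rng = np.random.default_rng(0)
target = rng.random((60, 80, 3))                   # stand-in zoomed target
mapped = project_to_xy(target, np.eye(3))          # identity as placeholder rotation
```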
Step six: simulating illumination change with a simplified model and processing the mapped target to obtain new data with added illumination change, the specific steps being as follows:
S1, randomly generating with a computer four numbers μ1, μ2 ∈ (aμ, bμ) and σ1, σ2 ∈ (aσ, bσ), used respectively as the means and standard deviations of a two-dimensional normal distribution probability density function:
f(x, y) = 1 / (2π σ1 σ2) · exp(−((x − μ1)² / (2σ1²) + (y − μ2)² / (2σ2²)))
S2, randomly generating with a computer four numbers xmin, xmax ∈ (0, w′) and ymin, ymax ∈ (0, h′);
Wherein: w 'and h' are the maximum width and the maximum height of the newly generated target matrix;
S3, discretizing the intervals [xmin, xmax] and [ymin, ymax] into the sets {xmin, xmin+1, xmin+2, ..., xmax} and {ymin, ymin+1, ymin+2, ..., ymax}, where [xmin, xmax] contains M elements and [ymin, ymax] contains N elements;
S4, sampling the two-dimensional normal distribution probability density function on the two sets to obtain an M × N two-dimensional matrix S, whose element in row m and column n is f(xmin + m, ymin + n),
Wherein: f is a two-dimensional normal distribution probability density function;
S5, the newly generated target is also an image with three channels R, G and B, the pixel values of each channel forming a matrix; computing the dot product of each channel's matrix with S, i.e. multiplying elements at corresponding positions, yields the new target with added illumination change. Because a target is essentially a region of an image, it also has the three channels R, G and B, and both the new target and the untransformed target can be represented as a matrix of width × height × 3, where w′ is the width of the new target and h′ its height;
the M × N matrix S is a sample of the normal distribution function: since the normal distribution function is continuous, only a discrete sample of it on an interval is available. For example, sampling the function on the interval (−1, 1) at 20 points retains the function values at −1, −0.9, −0.8, ..., 0.8, 0.9, 1; the same continuous interval can also be sampled at a different number of points, such as −1, −0.8, −0.6, ..., 0.6, 0.8, 1, giving only 10 sampled points;
different samplings of the continuous two-dimensional normal distribution function thus produce matrices of different sizes. Since the generated matrix is multiplied elementwise with each channel of the new target, the matrix obtained by sampling the normal distribution must match the maximum width and maximum height of the new target.
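A sketch of step six; for brevity it samples the normal density over the whole target grid rather than a random sub-interval, and rescales S to [0, 1] so the product remains a valid image (both simplifications, like the parameter ranges, are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((60, 80, 3))                   # stand-in target, h' x w' x 3
hp, wp = target.shape[:2]

mu1, mu2 = rng.uniform(0, wp), rng.uniform(0, hp)  # means of the 2D normal density
s1, s2 = rng.uniform(10, 30), rng.uniform(10, 30)  # standard deviations (assumed range)

X, Y = np.meshgrid(np.arange(wp), np.arange(hp))   # discretized integer grid
S = (1.0 / (2 * np.pi * s1 * s2)) * np.exp(
    -((X - mu1) ** 2 / (2 * s1 ** 2) + (Y - mu2) ** 2 / (2 * s2 ** 2)))
S = S / S.max()                                    # rescale so pixel values stay in [0, 1]

lit = target * S[..., None]                        # elementwise product per channel
```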
Step seven: adding lens distortion:
S1, radial distortion, i.e. distortion along the radial direction of the lens: taking the target coordinates after steps one to five as (x, y), the coordinates after radial distortion (x′, y′) satisfy:
x′ = x(1 + k1r² + k2r⁴ + k3r⁶)
y′ = y(1 + k1r² + k2r⁴ + k3r⁶)
wherein: r is the distance from the point to the center of the target, and k1, k2 and k3 are preset parameters that can be adjusted according to the device used to acquire the image;
S2, tangential distortion: taking the coordinates after radial distortion as (x′, y′), the coordinates after tangential distortion (x″, y″) satisfy:
x″ = x′ + (2p1x′y′ + p2(r² + 2x′²))
y″ = y′ + (p1(r² + 2y′²) + 2p2x′y′)
wherein: p1 and p2 are preset parameters that can be adjusted according to the device used to acquire the image.
Finally, the pixel value at the target coordinate (x, y) mapped in step five is assigned to (x″, y″), generating the final new target;
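A sketch of the distortion of step seven using the relations above; the coefficient values are hypothetical, and coordinates are assumed to be measured from the target center (so r is the distance to the center).

```python
import numpy as np

k1, k2, k3 = 1e-4, 1e-7, 0.0      # assumed radial distortion coefficients
p1, p2 = 1e-4, 1e-4               # assumed tangential distortion coefficients

def distort(x, y):
    # (x, y) are coordinates relative to the target center.
    r2 = x * x + y * y                                   # r squared
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xp, yp = x * radial, y * radial                      # radial step: (x', y')
    xpp = xp + (2 * p1 * xp * yp + p2 * (r2 + 2 * xp * xp))
    ypp = yp + (p1 * (r2 + 2 * yp * yp) + 2 * p2 * xp * yp)
    return xpp, ypp                                      # tangential step: (x'', y'')

print(distort(12.0, -7.5))        # example point on the target
```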
Step eight: randomly putting the generated new target into the background of the images in the training set.
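Finally, a sketch of step eight; shapes and names are illustrative, and the pasted region's corners give the new bounding-box label.

```python
import numpy as np

rng = np.random.default_rng(0)
background = rng.random((480, 640, 3))            # stand-in background image
new_target = rng.random((60, 80, 3))              # stand-in generated target

th, tw = new_target.shape[:2]
bh, bw = background.shape[:2]
y0 = rng.integers(0, bh - th)                     # random top-left corner
x0 = rng.integers(0, bw - tw)
background[y0:y0 + th, x0:x0 + tw] = new_target   # place the target into the background
# the new bounding-box label is (x0, y0) to (x0 + tw, y0 + th)
```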
The first embodiment is as follows: in practice, the data volumes of different classes in a data set differ, so a data enhancement (data augmentation) technique is used during training. Classes with fewer than 100 instances are simply ignored, leaving 45 classes that can be used for classification; classes with between 100 and 1000 instances are enhanced to 1000 instances by the present method, and classes that already have more than 1000 instances are left unchanged;
then, a traffic sign detection algorithm based on a deep neural network is trained by using a newly generated data set, the training effect is evaluated by using the average accuracy, and the accuracy of the traffic sign detection algorithm is compared with the accuracy of a neural network trained by using a traditional data enhancement method, as shown in table 1:
Table 1: Average accuracy comparison (%)
As can be seen from Table 1, the average accuracy of the network trained on data enhanced by the present method is generally higher than that of the neural network trained with the traditional data enhancement method.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (6)
1. A data enhancement method for training a target detection deep neural network model is characterized by comprising the following specific calculation steps:
step one: reading images in the training set, and performing normalization processing on the read image pixels;
step two: extracting all marked targets from the existing training data in the image read in step one, namely, manually marking part of the training data and cropping out all marked regions;
step three: randomly zooming the extracted target, namely generating with a computer a random number in the interval [a, b] as the zoom multiple and scaling the extracted target by cubic spline interpolation to generate the zoomed target;
step four: performing three-dimensional random rotation on the zoomed target, namely generating with a computer three random numbers θx, θy and θz in a preset interval, respectively representing the rotation angles of the target around the X, Y and Z axes; if the original coordinates of a point on the target in three-dimensional space are (x, y, z) and the coordinates after rotation around the coordinate axes are (x′, y′, z′), then (x, y, z) and (x′, y′, z′) satisfy:
(x′, y′, z′)ᵀ = Rz(θz) · Ry(θy) · Rx(θx) · (x, y, z)ᵀ
where Rx(θx), Ry(θy) and Rz(θz) are the rotation matrices about the X, Y and Z axes;
step five: mapping the three-dimensionally rotated target onto the XY plane in which the image lies, namely assigning the pixel value at the original target coordinate (x, y) to the pixel of the new target at (round(x′), round(y′));
step six: simulating illumination change with a simplified model and processing the mapped target to obtain new data with added illumination change;
step seven: adding lens distortion, namely taking the target coordinates after step five as (x, y), computing the radially distorted coordinates (x′, y′), and then applying tangential distortion to (x′, y′) to obtain (x″, y″), thereby generating the final new target;
step eight: randomly putting the generated new target into the background of the images in the training set.
2. The method as claimed in claim 1, wherein the step one of reading the images in the training set and normalizing the images comprises the specific steps of:
S1, reading an image in the training set, wherein the pixel value range of the image is [0, 255]; the image has three channels R, G and B and is regarded as a W × H × 3 three-dimensional matrix;
S2, normalizing the image pixel values, namely dividing them by 255 so that the value range of the pixel values becomes [0, 1];
wherein: w is the image width and H is the image height.
3. The method for enhancing data for target detection deep neural network model training as claimed in claim 1, wherein the specific process in step four of transforming the original three-dimensional coordinates (x, y, z) of a point on the target into (x′, y′, z′) by rotation around the coordinate axes is as follows: the point (x, y, z) on the target is first rotated around the X axis, then around the Y axis, and finally around the Z axis to obtain the new coordinate point (x′, y′, z′).
4. The method for enhancing data for training the deep neural network model for target detection according to claim 1, wherein the specific process in step six of obtaining the new data with added illumination change is as follows:
S1, randomly generating with a computer four numbers μ1, μ2 ∈ (aμ, bμ) and σ1, σ2 ∈ (aσ, bσ), used respectively as the means and standard deviations of a two-dimensional normal distribution probability density function:
f(x, y) = 1 / (2π σ1 σ2) · exp(−((x − μ1)² / (2σ1²) + (y − μ2)² / (2σ2²)))
S2, randomly generating with a computer four numbers xmin, xmax ∈ (0, w′) and ymin, ymax ∈ (0, h′);
Wherein: w 'and h' are the maximum width and the maximum height of the newly generated target matrix respectively;
S3, discretizing the intervals [xmin, xmax] and [ymin, ymax] into the sets {xmin, xmin+1, xmin+2, ..., xmax} and {ymin, ymin+1, ymin+2, ..., ymax}, where [xmin, xmax] contains M elements and [ymin, ymax] contains N elements;
S4, sampling the two-dimensional normal distribution probability density function on the two sets to obtain an M × N two-dimensional matrix S, whose element in row m and column n is f(xmin + m, ymin + n),
Wherein: f is a two-dimensional normal distribution probability density function;
S5, the newly generated target is also an image with three channels R, G and B, the pixel values of each channel forming a matrix; computing the dot product of each channel's matrix with S, i.e. multiplying elements at corresponding positions, yields the new target with added illumination change.
5. The method of claim 1, wherein the relationship between the target coordinates (x, y) and the radially distorted coordinates (x′, y′) in step seven satisfies:
x′ = x(1 + k1r² + k2r⁴ + k3r⁶)
y′ = y(1 + k1r² + k2r⁴ + k3r⁶)
wherein: r is the distance from the point to the center of the target, and k1, k2 and k3 are preset parameters that can be adjusted according to the device used to acquire the image.
6. The data enhancement method for training the target detection deep neural network model of claim 5, wherein the relationship between the radially distorted coordinates (x′, y′) and the tangentially distorted coordinates (x″, y″) satisfies:
x″ = x′ + (2p1x′y′ + p2(r² + 2x′²))
y″ = y′ + (p1(r² + 2y′²) + 2p2x′y′)
wherein: p1 and p2 are preset parameters that can be adjusted according to the device used to acquire the image.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201911007178.8A | 2019-10-22 | 2019-10-22 | Data enhancement method for training deep neural network model for target detection |
Publications (2)
| Publication Number | Publication Date |
| --- | --- |
| CN110852352A | 2020-02-28 |
| CN110852352B | 2022-07-29 |
Family
- Family ID: 69597191
- 2019-10-22: application CN201911007178.8A filed in China (CN); granted as CN110852352B, status active
Legal Events
| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |