CN115331078A - ESR-YOLOv5-based optical remote sensing image target detection method

Info

Publication number: CN115331078A
Application number: CN202211074246.4A
Authority: CN
Inventors: 方坤; 黄旭光
Original assignee: South China Normal University
Current assignee: South China Normal University
Priority date: 2022-09-02
Filing date: 2022-09-02
Publication date: 2022-11-11

Abstract

The invention discloses an ESR-YOLOv5-based optical remote sensing image target detection method, which comprises the following steps: acquiring an optical remote sensing image data set, inputting it to a super-resolution generative adversarial network model for preprocessing, and generating a preprocessed optical remote sensing image data set; constructing an improved YOLOv5 network model; training the improved YOLOv5 network model on the preprocessed optical remote sensing image data set to obtain a trained ESR-YOLOv5 model; and inputting the remote sensing image data set to be detected into the trained ESR-YOLOv5 model to obtain a target detection result for the remote sensing image. The method improves the quality of the optical remote sensing images and improves the overall detection accuracy and generality of the ESR-YOLOv5 network. The method can be widely applied in the technical field of optical remote sensing image processing.

Description

ESR-YOLOv5-based optical remote sensing image target detection method

Technical Field

The invention relates to the technical field of optical remote sensing image processing, in particular to an ESR-YOLOv5-based optical remote sensing image target detection method.

Background

Target detection is a fundamental problem in the understanding of optical remote sensing images, and it plays a prominent role and has significant application value in both the civil and the military fields. In the military field, remote sensing image technology is widely applied to strategic reconnaissance, weapon identification, land and sea surveying, and the like. In the civil field, it provides important information guidance for national production and convenient guidance for everyday life. Existing data-enhancement-based target detection methods mainly have the following defects. Most existing methods are suited to remote sensing image data sets with many samples, and few address the imbalance among data samples. Data enhancement at the input end of a target detection model relies mainly on manually designed rules, which makes it difficult to generate diverse samples that stay close to the original data. In addition, the backbone network of the target detection model, the anchor boxes at the input end, and the IoU loss function used by the network cannot accurately reflect the degree of overlap between the target box and the prediction box, so considerable room for optimization remains.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide an ESR-YOLOv5-based optical remote sensing image target detection method, which can improve the quality of the optical remote sensing image and can also improve the detection accuracy and generality of the whole ESR-YOLOv5 network.

The first technical scheme adopted by the invention is as follows: an ESR-YOLOv5-based optical remote sensing image target detection method comprises the following steps:

acquiring an optical remote sensing image data set, inputting the optical remote sensing image data set to a super-resolution generative adversarial network model for preprocessing, and generating a preprocessed optical remote sensing image data set;

constructing an improved YOLOv5 network model;

training the improved YOLOv5 network model based on the preprocessed optical remote sensing image data set to obtain a trained ESR-YOLOv5 model;

and inputting the remote sensing image data set to be detected into the trained ESR-YOLOv5 model to obtain a target detection result of the remote sensing image.

Further, the step of acquiring the optical remote sensing image data set, inputting it to the super-resolution generative adversarial network model for preprocessing, and generating the preprocessed optical remote sensing image data set specifically includes:

acquiring an optical remote sensing image data set;

downscaling the optical remote sensing image data set by bicubic interpolation to obtain a low-resolution image data set, and inputting the low-resolution image data set to the super-resolution generative adversarial network model;

the super-resolution generative adversarial network model comprises a generator and a discriminator;

upscaling the low-resolution image data set with the generator of the super-resolution generative adversarial network model to generate a preliminary super-resolution image data set;

discriminating the preliminary super-resolution image data set with the discriminator of the super-resolution generative adversarial network model to generate a super-resolution image data set;

and merging the low-resolution image data set and the super-resolution image data set to obtain a preprocessed remote sensing image data set.

Further, the step of discriminating the preliminary super-resolution image data set with the discriminator of the super-resolution generative adversarial network model to generate the super-resolution image data set specifically includes:

the discriminator of the super-resolution generative adversarial network model calculates and judges the similarity between the preliminary super-resolution image data set generated by the generator and the optical remote sensing image data set until a preset precision condition is met, and then outputs the super-resolution image data set.

Further, the step of constructing the improved YOLOv5 network model specifically includes:

based on the original YOLOv5 network model, resetting the aspect ratios of the anchor boxes of the original YOLOv5 network model to obtain the improved YOLOv5 network model.

Further, the step of training the improved YOLOv5 network model based on the preprocessed optical remote sensing image dataset to obtain a trained ESR-YOLOv5 model specifically includes:

dividing the preprocessed optical remote sensing image data set to obtain a training set, a validation set and a test set;

training the improved YOLOv5 network model based on the training set to obtain a remote sensing image weight model;

and tuning and testing the remote sensing image weight model based on the validation set and the test set to obtain the trained ESR-YOLOv5 model.

Further, the step of training the improved YOLOv5 network model based on the training set to obtain a remote sensing image weight model specifically includes:

inputting the training set into the improved YOLOv5 network model for training, wherein the improved YOLOv5 network model comprises an input layer, a downsampling layer, a convolutional layer, a CSPlayer layer, an SPPBottleneck layer, an upsampling layer and an output layer;

converting the images of the training set into matrices based on the input layer of the improved YOLOv5 network model to obtain a matrixed training set;

reducing the dimensionality of, and abstracting, the remote sensing images in the matrixed training set based on the downsampling layer of the improved YOLOv5 network model to obtain a dimensionality-reduced training set;

performing convolution kernel calculation on the dimensionality-reduced training set based on the convolutional layer of the improved YOLOv5 network model to obtain feature information of the training set;

performing convolution calculation and matrix normalization on the feature information of the training set based on the CSPlayer layer of the improved YOLOv5 network model to obtain a normalized training set;

performing convolution kernel calculation on the normalized training set based on the SPPBottleneck layer of the improved YOLOv5 network model to obtain a calculation result;

restoring the size of the remote sensing images in the calculation result based on the upsampling layer of the improved YOLOv5 network model to obtain matrixed remote sensing images;

and outputting the matrixed remote sensing images based on the output layer of the improved YOLOv5 network model, and constructing the remote sensing image weight model.

Further, the step of tuning and testing the remote sensing image weight model based on the validation set and the test set to obtain the trained ESR-YOLOv5 model specifically includes:

inputting the validation set into the remote sensing image weight model for validation to obtain a preliminary remote sensing image target detection result;

adjusting the hyper-parameters of the remote sensing image weight model according to the preliminary remote sensing image target detection result to obtain an adjusted remote sensing image weight model;

and inputting the test set into the adjusted remote sensing image weight model for testing to obtain the trained ESR-YOLOv5 model.

The method has the following beneficial effects. The optical remote sensing image data set is preprocessed by the super-resolution generative adversarial network model, which enhances the feature space of the remote sensing images, improves low-quality remote sensing images, and renders target textures and boundaries more clearly, so that the preprocessed remote sensing image data set is closer to real image samples. The initial adaptive anchor box scales of the YOLOv5 network model are improved, which avoids the possible loss of feature information caused by targets of different scales in the remote sensing images. The improved YOLOv5 network model is trained on the preprocessed remote sensing image data set, which effectively improves the overall detection precision and the generality of the target detection network.

Drawings

FIG. 1 is a flow chart of the steps of the ESR-YOLOv5-based optical remote sensing image target detection method of the invention;

FIG. 2 is a schematic diagram of an ESR-YOLOv5 network model according to the present invention;

FIG. 3 is a flow chart of an embodiment of the ESR-YOLOv5 network remote sensing image target detection of the present invention;

FIG. 4 is a schematic structural diagram of the ESR-YOLOv5 model after improvement.

Detailed Description

The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are set for convenience of illustration only; they do not limit the order of the steps in any way, and the execution order of the steps in the embodiments can be adapted according to the understanding of those skilled in the art.

Referring to FIG. 1 and FIG. 3, the invention provides an ESR-YOLOv5-based optical remote sensing image target detection method, which comprises the following steps:

S1, acquiring an optical remote sensing image data set, inputting the optical remote sensing image data set to a super-resolution generative adversarial network model for preprocessing, and generating a preprocessed optical remote sensing image data set;

S11, acquiring an optical remote sensing image data set;

Specifically, an original two-dimensional image data set, that is, an optical remote sensing image data set (the UCAS AOD and VEDAI optical remote sensing image data sets), is prepared as the input of the super-resolution generative adversarial network model.

S12, downscaling the optical remote sensing image data set by bicubic interpolation to obtain a low-resolution image data set, and inputting the low-resolution image data set to the super-resolution generative adversarial network model;

Specifically, bicubic interpolation is the most commonly used interpolation method in two-dimensional image space. Compared with other methods (such as bilinear filtering), its advantage is that it better preserves pixel-level detail when an image is scaled. Here, bicubic interpolation is used to downscale each two-dimensional image by a factor of 4: the value of each target pixel in the scaled image is obtained as a weighted combination of the 16 surrounding points (A, B, C, …, N, O and P) of the corresponding position in the original high-resolution (HR) image. A program applies this scaling in batch to the whole two-dimensional image data set.
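The 16-point weighting described above can be illustrated with a minimal pure-Python sketch. It assumes the Keys cubic convolution kernel with a = -0.5 and clamps coordinates at the image border; the function names are hypothetical, and the patent does not state which kernel or library was actually used. Production resizers typically also widen the kernel when downscaling to limit aliasing, which this sketch omits.

```python
import math

def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, commonly used setting."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0

def bicubic_sample(img, y, x):
    """Weighted sum over the 4x4 (16-point) neighbourhood around (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for j in range(-1, 3):
        wy = cubic_weight(y - (y0 + j))
        row = img[min(max(y0 + j, 0), h - 1)]  # clamp at the image border
        for i in range(-1, 3):
            wx = cubic_weight(x - (x0 + i))
            value += wy * wx * row[min(max(x0 + i, 0), w - 1)]
    return value

def downscale_bicubic(img, factor=4):
    """Shrink a 2-D grey image (list of rows) by an integer factor."""
    out_h, out_w = len(img) // factor, len(img[0]) // factor
    return [
        [bicubic_sample(img, (r + 0.5) * factor - 0.5, (c + 0.5) * factor - 0.5)
         for c in range(out_w)]
        for r in range(out_h)
    ]

# An 8x8 high-resolution ramp becomes a 2x2 low-resolution image.
hr = [[float(10 * r + c) for c in range(8)] for r in range(8)]
lr = downscale_bicubic(hr, factor=4)
```

Because the 16 weights sum to 1, a constant image stays constant after scaling, which is a quick sanity check for any implementation of this step.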

S13, the super-resolution generative adversarial network model comprises a generator and a discriminator;

S14, upscaling the low-resolution image data set with the generator of the super-resolution generative adversarial network model to generate a preliminary super-resolution image data set;

S15, discriminating the preliminary super-resolution image data set with the discriminator of the super-resolution generative adversarial network model to generate a super-resolution image data set;

Specifically, the discriminator of the super-resolution generative adversarial network model calculates and judges the similarity between the preliminary super-resolution image data set generated by the generator and the optical remote sensing image data set until a preset precision condition is met, and then outputs the super-resolution image data set. Regarding similarity, this scheme mainly compares the original image with the super-resolution image in order to measure the quality of the images generated by the network. SSIM (structural similarity) is an index of the similarity between two images; its value range is [0, 1], and a larger SSIM value indicates smaller image distortion. The SSIM formula is as follows:

SSIM(X, Y) = l(X, Y) * c(X, Y) * s(X, Y)

In the above formula, X represents the original image, Y represents the super-resolution image, l(·) represents the luminance comparison between the two images, c(·) represents the contrast comparison, and s(·) represents the structure comparison;

It can be seen that SSIM measures image similarity from three angles, namely luminance, contrast and structure, where:

l(X, Y) = (2 μ_X μ_Y + C₁) / (μ_X² + μ_Y² + C₁)

c(X, Y) = (2 σ_X σ_Y + C₂) / (σ_X² + σ_Y² + C₂)

s(X, Y) = (σ_XY + C₃) / (σ_X σ_Y + C₃)

In the above formulas, μ_X and μ_Y represent the means of images X and Y, σ_X and σ_Y represent their standard deviations (σ_X² and σ_Y² being the variances), σ_XY represents the covariance of images X and Y, and C₁, C₂ and C₃ are constants;

Further, to avoid a zero denominator, it is usual to take C₁ = (K₁ * L)², C₂ = (K₂ * L)² and C₃ = C₂ / 2, generally with K₁ = 0.01, K₂ = 0.03 and L = 255;
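As an illustration of how the three SSIM terms combine, the following minimal sketch computes a single global SSIM value over two equally sized grey images given as flat lists of pixel values. It is a sketch under stated assumptions: it uses population statistics over the whole image and the widely used convention C₃ = C₂ / 2, whereas practical SSIM implementations usually average the index over local sliding windows. The function name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, k1=0.01, k2=0.03, L=255.0):
    """Global SSIM of two equally long pixel lists (assumed grey levels in [0, L])."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = var_x ** 0.5, var_y ** 0.5

    c1 = (k1 * L) ** 2
    c2 = (k2 * L) ** 2
    c3 = c2 / 2.0  # assumed convention

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure

original = [10.0, 50.0, 90.0, 130.0]
brighter = [v + 40.0 for v in original]   # same structure, shifted luminance
same_score = ssim_global(original, original)
shifted_score = ssim_global(original, brighter)
```

An image compared with itself scores 1, while the brightness-shifted copy scores lower only through the luminance term, since its contrast and structure are unchanged.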

The discriminator network D(X) calculates the similarity between the super-resolution image generated by the generator network G(X) and the real label. D acts as a binary classifier that tries to judge whether the generated image is close to the real label, while G tries to deceive D by improving the quality of its synthesized output. Through this repeated game, G eventually synthesizes a clearer super-resolution image SR. Generally, once the discriminator output satisfies 0.5 ≤ D(G(X)) ≤ 1, the generator network G obtained at that point can be used to output the generated images;
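The game between G and D can be made concrete with the standard binary cross-entropy losses of a generative adversarial network. The sketch below is an assumption for illustration only: the patent does not give its loss functions, and super-resolution GANs usually add a content or perceptual term to the generator loss, which is omitted here. The inputs are discriminator outputs, that is, probabilities strictly between 0 and 1.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Mean BCE: real images should score near 1, generated images near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G is rewarded when D(G(x)) approaches 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# When D can no longer tell real from generated images it outputs 0.5 for both.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

At the point where the discriminator outputs 0.5 for every image, its loss equals 2 ln 2, which corresponds to the balance point of the game described above.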

S16, merging the low-resolution image data set and the super-resolution image data set to obtain a preprocessed remote sensing image data set.

Specifically, the preprocessing first produces low-resolution remote sensing target images: the original optical remote sensing images to be enhanced are cropped and reduced by a factor of 4 in BICUBIC mode to serve as the low-resolution image data, and each low-resolution image is then input to the generator for ×4 image super-resolution generation, namely:

G: SR = g(LR)

In the above formula, G represents the generator, SR represents the super-resolution image, LR represents the low-resolution image, and g represents the function g(X) of the generator G, where X here is the low-resolution image LR; the super-resolution remote sensing image SR is generated by the function g(LR);

The generated super-resolution remote sensing image has the same size as the original image. The super-resolution image produced by the generator is input to the discriminator, which compares it against the original optical remote sensing image data set: as a binary classifier, the discriminator tries to judge the similarity between the generated SR image and the original HR image, while the generator tries to deceive the discriminator by improving the quality of its synthesized output. Through this repeated game, the generator eventually synthesizes a clearer super-resolution image. The super-resolution image data set and the low-resolution image data are then merged into the preprocessed remote sensing image data set. In the merging process, the data of the low-resolution images are imported, in random shuffled order, into the database that stores the super-resolution images; the merged data then serve as the training input of the detection model, that is, the improved YOLOv5 network model;
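The merging step can be sketched as a concatenation followed by a random shuffle. This is a minimal sketch: the record layout (image file name paired with label file name), the file names, and the fixed seed are all hypothetical, since the patent does not describe how the database is stored.

```python
import random

def merge_datasets(sr_items, lr_items, seed=0):
    """Combine SR and LR records and shuffle them into a single training pool."""
    merged = list(sr_items) + list(lr_items)
    random.Random(seed).shuffle(merged)  # seeded so the order is reproducible
    return merged

# Hypothetical records: (image file, label file).
sr_records = [("sr_0001.png", "0001.txt"), ("sr_0002.png", "0002.txt")]
lr_records = [("lr_0001.png", "0001.txt"), ("lr_0002.png", "0002.txt")]
preprocessed = merge_datasets(sr_records, lr_records)
```

Shuffling before training keeps low-resolution and super-resolution samples interleaved within each batch rather than presenting them in two separate blocks.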

In the prior art, much traditional remote sensing image data was affected by factors such as hardware limitations and complex shooting environments, so it has low image resolution and small data-set sample sizes; as a result, target detection models identify the targets in such images poorly.

S2, constructing an improved YOLOv5 network model;

Specifically, the anchor boxes of different target detection algorithms differ greatly. The initial detection anchor boxes of the original YOLOv5 framework are set to suit the characteristics of the COCO data set and are divided into three layers of three sizes each: the first layer matches the largest feature map and is set to 10x13, 16x30 and 33x23; the second layer holds the anchor boxes on the intermediate feature map and is set to 30x61, 62x45 and 59x119; the third layer holds the anchor boxes on the smallest feature map and is set to 116x90, 156x198 and 373x326. In target detection, small targets are generally detected on the larger feature maps, because a large feature map contains more information about small targets; the anchor values on the larger feature maps are therefore usually set smaller, and the anchor values on the smaller feature maps are set relatively larger for detecting large targets. However, remote sensing data are special: the background field of view of the image is wide and distant, the targets are small relative to the whole image, and smaller targets such as motorcycles and cars are harder for the detector to identify effectively than larger targets such as trucks and airplanes. Therefore, based on an analysis of the foreground target sizes and of the remote sensing image sizes, the initial anchor boxes of the small/medium/large-scale target detection layers of the original YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x models are improved; YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x form one model series, and the improved models form the corresponding ESR-YOLOv5 series.

Following the idea of lightweight models, the four variants differ in the number of parameters used for model inference, which directly affects inference speed and detection precision: the larger the parameter count, the slower the inference, while the smaller models infer faster at a relative cost in detection precision. YOLOv5s has the fewest parameters, 7.2M, with an inference time of 6.4 ms on a V100 GPU at batch size 1; YOLOv5m has 21.2M parameters and takes 8.2 ms; YOLOv5l has 46.5M and takes 10.1 ms; YOLOv5x has the most, 86.7M, and takes 12.1 ms. In addition, each of the four variants contains small/medium/large-scale target detection layers, which frame and detect target objects at three scales.

Observation of further experimental data shows that remote sensing images differ from natural images: the image is large while the target objects in it are small, so the anchor boxes of the original method, which suit natural images, are too large to frame and detect targets in remote sensing images well. For this reason, the aspect ratios of the anchor boxes are further adjusted to suit the special application scenario of remote sensing images. The aspect ratio of the small/medium-scale anchor boxes is set to about 1/2, giving 15x35, 20x50 and 25x55 for the small scale and 55x100, 60x110 and 65x120 for the medium scale; the aspect ratio of the large-scale anchor boxes is set to about 1/4, giving 25x100, 30x130 and 35x140, so that remote sensing image targets are identified accurately. When target detection experiments on remote sensing scenes compare the original YOLOv5 method, with its original anchor box sizes, against the ESR-YOLOv5 method with the improved anchor box sizes, the improved method is found to be more precise; for example, on the UCAS AOD remote sensing image data set, the detection precision of the ESR-YOLOv5s method is 5.1-6.9% higher than that of the original method.
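The anchor values quoted above can be tabulated and their aspect ratios checked with a short sketch. It assumes each pair is written as width x height and that the first three improved values belong to the small scale and the next three to the medium scale; neither assumption is stated explicitly in the text, and the dictionary and function names are hypothetical.

```python
# Anchor sizes quoted in the description, assumed to be (width, height) in pixels.
ORIGINAL_ANCHORS = {
    "small": [(10, 13), (16, 30), (33, 23)],
    "medium": [(30, 61), (62, 45), (59, 119)],
    "large": [(116, 90), (156, 198), (373, 326)],
}
IMPROVED_ANCHORS = {
    "small": [(15, 35), (20, 50), (25, 55)],
    "medium": [(55, 100), (60, 110), (65, 120)],
    "large": [(25, 100), (30, 130), (35, 140)],
}

def mean_aspect_ratio(anchors):
    """Average width / height over the anchors of one detection layer."""
    return sum(w / h for w, h in anchors) / len(anchors)

for scale, anchors in IMPROVED_ANCHORS.items():
    print(scale, round(mean_aspect_ratio(anchors), 3))
```

Under these assumptions the small and medium layers average roughly 0.43 and 0.55 (about 1/2) and the large layer roughly 0.24 (about 1/4). In the public YOLOv5 code base such values are normally placed in the `anchors` section of the model configuration file, one row per detection layer.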

S3, training the improved YOLOv5 network model based on the preprocessed optical remote sensing image data set to obtain the trained ESR-YOLOv5 model.

S31, dividing the preprocessed optical remote sensing image data set;

Specifically, to meet the requirements of model training, the preprocessed data set is divided proportionally into a training set, a validation set and a test set, with the training set taking the largest share (6 parts);
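A proportional split of this kind can be sketched as follows. The default ratio of 6:2:2 is purely an illustrative assumption, because only the training share of 6 is legible in the text; the function name and the seed are likewise hypothetical.

```python
import random

def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle the records and cut them into training, validation and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the test set takes whatever remains
    return train, val, test

train_set, val_set, test_set = split_dataset(range(100))
```

With 100 records and the assumed ratio, the three parts contain 60, 20 and 20 records and do not overlap.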

S32, training the improved YOLOv5 network model based on the training set to obtain a remote sensing image weight model;

Specifically, FIG. 4 shows the concrete structure of the improved YOLOv5 model and the relations between its connected layers, which include Inputs (input layer), Focus (downsampling layer), Conv2D_BN (convolutional layer), CSPlayer (convolutional layer + normalization layer), SPPBottleneck (a combination of several convolutional layers), UpSampling2D (upsampling layer) and YoloHead (output layer). The layer names are abstractions; in the computer program each layer is actually a calculation on matrix parameters, so the connection between layers means that the matrix data output by one layer is passed as input to the next layer, and all connected layers are implemented in computer program code. The layers function as follows:

Inputs (input layer): converts the remote sensing image into a matrix;

Focus (downsampling layer): reduces the dimensionality of and abstracts the input image so that the image is represented by higher-level features; the image is divided into several rectangular regions and the maximum value of each sub-region is output, which reduces the input size of the next layer and hence the amount of calculation and the number of parameters;

Conv2D_BN (convolutional layer): performs convolution kernel calculation on the output of the previous layer and extracts different input feature information;

CSPlayer (convolutional layer + normalization layer): performs convolution calculation and normalizes the matrix after convolution;

SPPBottleneck (a combination of several convolutional layers): performs convolution kernel calculation with convolutional layers of different structures;

UpSampling2D (upsampling layer): restores the size that was reduced by the downsampling layer, enlarging the input feature image back to the size of the original image;

YoloHead (output layer): outputs the matrixed image;
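Two of the operations described above, taking the maximum of each rectangular sub-region and restoring the size by upsampling, can be illustrated on a small matrix. This is a sketch of the operations as worded in the description, not of the patent's actual layers: the 2x2 region size and the nearest-neighbour upsampling are assumptions, and the function names are hypothetical.

```python
def max_pool_2x2(matrix):
    """Output the maximum of each non-overlapping 2x2 region (even sizes assumed)."""
    return [
        [max(matrix[r][c], matrix[r][c + 1], matrix[r + 1][c], matrix[r + 1][c + 1])
         for c in range(0, len(matrix[0]), 2)]
        for r in range(0, len(matrix), 2)
    ]

def upsample_nearest_2x(matrix):
    """Double height and width by repeating every value (nearest neighbour)."""
    out = []
    for row in matrix:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

feature = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool_2x2(feature)           # 4x4 -> 2x2
restored = upsample_nearest_2x(pooled)   # 2x2 -> 4x4
```

The upsampled matrix has the original 4x4 size again, but the detail discarded by downsampling is not recovered, which is why detection networks typically combine upsampled features with earlier, higher-resolution feature maps.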

The preprocessed data set is imported into the detection model for training and first enters the Focus part of the Backbone of the detection model. Pixels extracted periodically from the remote sensing image are reassembled into low-resolution images, that is, each group of four adjacent positions of the image is stacked along the channel dimension, so that the w and h dimension information is concentrated into the conv channel space; this enlarges the receptive field of each pixel and reduces the loss of original information.
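The periodic pixel extraction can be sketched with plain nested lists. The sketch follows the slicing order used by the Focus module in the public YOLOv5 code base (even rows and even columns first, then odd rows with even columns, even rows with odd columns, and odd rows with odd columns); whether the patent's model uses exactly this order is an assumption, and the convolution that normally follows the slicing is omitted.

```python
def focus_slice(x):
    """Rearrange a [C][H][W] nested list into [4C][H/2][W/2] (H and W even).

    Every second pixel is taken in both directions at four phase offsets, and
    the four sub-images are stacked along the channel axis, so no pixel is lost.
    """
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset)
    out = []
    for dr, dc in offsets:
        for channel in x:
            out.append([row[dc::2] for row in channel[dr::2]])
    return out

# One channel, 4x4 image with pixel values 0..15.
image = [[[4 * r + c for c in range(4)] for r in range(4)]]
stacked = focus_slice(image)  # 4 channels of size 2x2
```

For the 4x4 example the first output channel is [[0, 2], [8, 10]]: width and height are halved while the channel count grows fourfold, so all 16 original values are still present.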

S33, inputting the validation set and the test set into the remote sensing image weight model for tuning and testing to obtain a target detection result for the remote sensing image;

Specifically, referring to FIG. 2, the validation set is mainly used to evaluate the model and adjust the hyper-parameters; it comprises a plurality of remote sensing images and the labels of the target objects in them. The remote sensing images of the training set are input to generate the remote sensing image weight model during initial model training. When training reaches 10 epochs, the remote sensing images in the validation set are input into the remote sensing image weight model for matrix calculation, and the model outputs a plurality of image labels; the output labels are then evaluated against the original labels of the target objects, and the mAP (mean average precision), PR (precision-recall) curve, precision and recall are calculated. Training then enters the next 10-epoch stage, and so on up to 100 epochs; the validation results are observed, and parameters such as the model structure, learning rate and optimizer are adjusted further. The test set is used to test the quality of the weight model produced by training and validation: the remote sensing images in the test set are input into the remote sensing image weight model for matrix calculation, the model outputs a plurality of image labels, the output labels are evaluated against the original labels of the target objects, and the mAP, PR curve, precision and recall are calculated. The evaluation results, such as the average precision, obtained from this single pass over the test set images represent the performance of the proposed model;
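The evaluation quantities named above can be sketched for a single class as follows. The sketch assumes boxes given as (x1, y1, x2, y2), an IoU matching threshold of 0.5, greedy matching in descending score order, and all-point interpolation of the PR curve; the patent does not specify these details, and the function names are hypothetical. mAP is then the mean of this average precision over all classes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class; detections are (score, box) pairs, ground_truth is a box list."""
    if not ground_truth:
        return 0.0
    matched, flags = set(), []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap > best:
                best, best_j = overlap, j
        hit = best >= iou_thr
        if hit:
            matched.add(best_j)
        flags.append(hit)

    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(flags, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))
    for i in range(len(precisions) - 2, -1, -1):  # make precision non-increasing
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # area under the interpolated PR curve
        prev_recall = r
    return ap

truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
ap = average_precision(preds, truth)
```

In this example the second-ranked detection is a false positive, so precision drops before the second target is found and the average precision is 5/6 rather than 1.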

The remote sensing image weight model is applied to the validation set to obtain a preliminary detection result for the remote sensing images automatically, and the hyper-parameters of the remote sensing image weight model are adjusted, including the model structure, learning rate, optimizer, anchor box aspect ratio and batch size. Parameters such as the number of epochs (training is set to 100 iterations), the batch size and the learning rate need to be tuned. One epoch is one pass of training over all training samples, and the batch size is the number of samples passed to the program for one training step. The batch size determines the direction of gradient descent. If the batch is too small, in the extreme case a batch size of 1, the gradient direction is corrected once per sample; the differences between samples are then large and training is difficult to converge. If the batch is too large, the gradient direction is essentially stable, and training easily falls into a local optimum, which reduces precision. The learning rate is an important hyper-parameter in supervised learning and deep learning; it determines whether and when the objective function converges to a local minimum. A suitable learning rate lets the objective function converge to a local minimum in a suitable time; the learning rate directly controls the magnitude of the network's gradient updates during training and directly affects the effective capacity of the model. Finally, the remote sensing image weight model is evaluated on the test set to test its detection performance and thereby verify that the proposed ESR-YOLOv5 model for automatic remote sensing image target detection is feasible.
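The influence of the learning rate on convergence can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x², whose minimum is at x = 0; the function, the step count and the three learning rates are illustrative assumptions and have nothing to do with the actual training configuration of the model.

```python
def gradient_descent(lr, x0=1.0, steps=50):
    """Minimise f(x) = x**2 starting from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x  # the gradient of x**2 is 2x
    return x

too_small = gradient_descent(lr=0.001)  # barely moves in 50 steps
suitable = gradient_descent(lr=0.1)     # close to the minimum at 0
too_large = gradient_descent(lr=1.1)    # overshoots further on every step
```

Each step multiplies x by (1 - 2 * lr), so a rate of 0.1 shrinks the error by a factor of 0.8 per step, a rate of 0.001 shrinks it far too slowly, and a rate of 1.1 makes it grow in magnitude, which mirrors the statement that the learning rate decides whether and when the objective converges.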

The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An ESR-YOLOv5-based optical remote sensing image target detection method, characterized by comprising the following steps:

acquiring an optical remote sensing image data set, inputting the optical remote sensing image data set to a super-resolution generative adversarial network model for preprocessing, and generating a preprocessed optical remote sensing image data set;

constructing an improved YOLOv5 network model;

2. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 1, wherein the step of acquiring the optical remote sensing image data set, inputting it into the super-resolution generative adversarial network model for preprocessing, and generating the preprocessed optical remote sensing image data set specifically comprises:

acquiring an optical remote sensing image data set;

3. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 2, wherein the step of discriminating the preliminary super-resolution image data set with the discriminator of the super-resolution generative adversarial network model to generate the super-resolution image data set specifically comprises:

4. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 1, wherein the step of constructing the improved YOLOv5 network model specifically comprises:

5. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 4, wherein the step of training the improved YOLOv5 network model based on the preprocessed optical remote sensing image data set to obtain the trained ESR-YOLOv5 model specifically comprises:

and tuning and testing the remote sensing image weight model based on the validation set and the test set to obtain the trained ESR-YOLOv5 model.

6. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 5, wherein the step of training the improved YOLOv5 network model based on the training set to obtain a remote sensing image weight model specifically comprises:

performing convolution kernel calculation on the dimensionality-reduced training set based on the convolutional layer of the improved YOLOv5 network model to obtain feature information of the training set;

7. The ESR-YOLOv5-based optical remote sensing image target detection method according to claim 5, wherein the step of tuning and testing the remote sensing image weight model based on the validation set and the test set to obtain the trained ESR-YOLOv5 model specifically comprises:

adjusting the hyper-parameters of the remote sensing image weight model according to the preliminary remote sensing image target detection result to obtain an adjusted remote sensing image weight model;