CN112508787A

CN112508787A - Target detection method based on image super-resolution

Info

Publication number: CN112508787A
Application number: CN202011470434.XA
Authority: CN
Inventors: 华尧; 梁涛
Original assignee: Panji Technology Co ltd
Current assignee: Panji Technology Co ltd
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2021-03-16

Abstract

A target detection method based on image super-resolution comprises the following steps: step 1, sending an original image into a target detection network to obtain the feature map or image region requiring super-resolution; step 2, performing super-resolution on the output feature map using a super-resolution network to obtain a larger feature map; step 3, cutting the feature map from step 2 into a plurality of small regions, and then searching for target frames in each small region; and step 4, comparing a feature map obtained from the corresponding high-resolution image with the feature map obtained in step 3 to obtain a loss function for the whole network. By applying the super-resolution network to the feature map or to part of the original image, the invention improves the target detection efficiency of the whole network, reduces both the complexity of super-resolution and the computation needed to process a super-resolved original image, and improves the real-time performance of the whole super-resolution-based target detection task.

Description

Target detection method based on image super-resolution

Technical Field

The invention belongs to the technical field of image target detection, and particularly relates to a target detection method based on image super-resolution.

Background

Super-resolution (SR) raises a low-resolution (LR) image to high resolution (HR) by means of an algorithm. A high-resolution image has higher pixel density, more detail, and finer image quality. The most direct way to obtain a high-resolution image is to use a high-resolution camera; in practice, however, manufacturing and engineering costs mean that high-resolution or ultra-high-resolution cameras are not used to acquire the image signal in many settings. There is therefore real demand for obtaining HR images through super-resolution techniques.

As the definition suggests, the main function of super-resolution is to recover missing information in an image to form a higher-resolution image; it is generally used in photo restoration, film restoration, compression of transmitted images, and similar fields.

Super-resolution can be implemented with traditional methods such as interpolation, or with a Deep Neural Network (DNN).

Another important application in image processing is target detection, whose task is to identify the category of each object in a given picture and mark the region where the object is located. The target detection technology most widely used at present is based on Deep Neural Network (DNN) algorithms: the DNN processes the input image to generate several feature maps, and the targets to be detected are then found in those feature maps by a bounding-box search. Note that a feature map is usually much smaller than the original image. Taking YOLOV3 as an example, the input image is 416 × 416 pixels, while the feature maps are only 52 × 52, 26 × 26, and 13 × 13.
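The relationship between the input size and the three feature-map sizes quoted above can be checked with a short sketch. It assumes the commonly cited YOLOv3 output strides of 8, 16, and 32, which the text does not state; the function name is hypothetical.

```python
# Sketch: feature-map side lengths for a square input image.
# Assumption: the detector downsamples by strides of 8, 16 and 32.
YOLOV3_STRIDES = (8, 16, 32)


def feature_map_sizes(input_side, strides=YOLOV3_STRIDES):
    """Return the side length of the feature map produced at each stride."""
    return [input_side // stride for stride in strides]


sizes = feature_map_sizes(416)
print(sizes)  # [52, 26, 13]
```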

One of the harder problems in target detection is that the features of small targets are difficult for a DNN to extract, so DNN-based detection algorithms generally have a low success rate on small targets. An intuitive idea is therefore to enlarge the original picture using image super-resolution and run target detection on the enlarged picture; because the small target is enlarged as well, the probability of recognizing it improves.

One existing super-resolution-based target detection algorithm is TDSR, a super-resolution algorithm driven by target detection. The algorithm first applies a super-resolution network to the original image to be detected, producing an enlarged image. A target detection DNN then runs on the enlarged image, so that smaller objects in the original image are finally detected.

An important concept in neural network systems is training, which refers to using a loss function to determine the parameter values of a neural network. The loss function measures how far the network's output, under the current parameters, deviates from the expected output. First, the loss of the output produced with the current parameters is computed; the parameters are then adjusted incrementally according to the loss value, and the output is computed again. These operations are repeated many times until the network parameters converge to stable values. This process is called training.

When training the TDSR network, because a super-resolution network has been introduced, the loss function based on the target detection result must be modified so that it also updates the parameters of the super-resolution network; that is, the super-resolution task is designed to serve the accuracy of target detection.
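The compute-loss, adjust, recompute cycle described above can be illustrated with a minimal sketch. It assumes a one-parameter model `w * x`, a squared-error loss, and plain gradient descent; these choices and all names are illustrative and are not taken from the patent.

```python
def loss(w, x, y):
    """Squared error between the model output w * x and the target y."""
    return (w * x - y) ** 2


def grad(w, x, y):
    """Derivative of the squared error with respect to the parameter w."""
    return 2.0 * (w * x - y) * x


def train(w, x, y, learning_rate=0.1, steps=100):
    """Repeatedly nudge w against the gradient until it settles."""
    for _ in range(steps):
        w -= learning_rate * grad(w, x, y)
    return w


# Start from w = 0 and fit a single sample whose ideal parameter is 3.
w_final = train(w=0.0, x=1.0, y=3.0)
print(round(w_final, 6))
```

Each iteration shrinks the remaining error by a constant factor here, so the parameter converges toward a stable value, which is the behavior the paragraph calls training.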

The prior art has two disadvantages.

First, for a target detection network, computation speed depends on the size of the input image: if the side length of the input image is doubled, the computation of the entire downstream detection DNN may grow several-fold, and super-resolution itself is also computationally intensive. Taking YOLOV3 as an example, when the input image grows from 320 × 320 to 608 × 608, the side length less than doubles, yet the DNN computation grows to nearly 4 times its original value (38.97 vs. 140.69 billion FLOPs). Raising the resolution of the original image therefore carries a very large computational cost, which greatly affects the real-time performance of the system.

Second, because the DNN performs object recognition on feature maps produced by its computation, and the original image differs from those feature maps, raising the resolution of the original image does not noticeably improve the features the system uses to recognize objects. Performing super-resolution on the original image therefore does little to improve the DNN's ability to extract target features and detect targets.
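The scaling in this example can be reproduced with a few lines of arithmetic. The sketch only uses the sizes and the two computation figures quoted above; the observation that the cost tracks the pixel count is an interpretation, not a statement from the source.

```python
# Side lengths and computation figures quoted in the text.
small_side, large_side = 320, 608
small_cost, large_cost = 38.97, 140.69

side_ratio = large_side / small_side   # growth of the side length (1.9)
pixel_ratio = side_ratio ** 2          # growth of the pixel count (about 3.61)
cost_ratio = large_cost / small_cost   # growth of the computation (about 3.61)

print(round(side_ratio, 2), round(pixel_ratio, 2), round(cost_ratio, 2))
```

Under these figures the computation grows almost exactly with the number of input pixels, that is, quadratically in the side length.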

Disclosure of Invention

The invention aims to provide a target detection method based on image super-resolution to solve the above problems.

In order to achieve the purpose, the invention adopts the following technical scheme:

A target detection method based on image super-resolution comprises the following steps:

step 1, sending a low-resolution original image into a target detection network to obtain a feature map requiring super-resolution;

step 2, performing super-resolution on the output feature map using a super-resolution network to obtain a larger feature map;

step 3, cutting the feature map from step 2 into a plurality of small regions, ensuring that the size of each small region matches the size expected by the target frame extraction network of the original network, and then searching for target frames in each small region;

step 4, sending the high-resolution original image corresponding to the low-resolution original image into the current network and, using the original target detection network without super-resolution, obtaining a feature map of the same size as the one produced by steps 1-3; and comparing this feature map with the feature map obtained in step 3 to obtain a loss function for the whole network.

Further, step 1 includes case 1: down-sampling the high-definition original image to the original image size, sending it into the target detection network, and obtaining a feature map from the target detection network.

Furthermore, the super-resolved feature map is obtained using a gradient map or a residual network.

Further, step 1 includes case 2: sending the original image into a two-stage target detection network and obtaining the candidate frame regions of the image after the first-stage network; then, from the obtained candidate frames, calculating the corresponding image region in the original image and using it as the image requiring super-resolution.

Further, if a plurality of candidate frames are found, the image regions corresponding to the candidate frames are extracted separately and their union is taken; a threshold condition, for example on the size, position, or overlap between candidate frames, may be used to filter valid candidate frames.

Further, in step 4, the first-order or second-order loss between that feature map and the feature map obtained in step 3 is used as the super-resolution loss, and the target detection loss and the super-resolution loss are combined to form the loss function of the whole network.

Further, in step 4, the resolution ratio between the low-resolution image and the high-resolution original image equals the magnification of the super-resolution network in step 2.

Compared with the prior art, the invention has the following technical effects:

By applying the super-resolution network to the feature map or to part of the original image, the invention improves the target detection efficiency of the whole network, reduces both the complexity of super-resolution and the computation needed to process a super-resolved original image, and improves the real-time performance of the whole super-resolution-based target detection task.

Because super-resolution is performed on the feature map, the features whose resolution is raised are closer to the features used for target detection, so the method improves target detection capability rather than merely raising the resolution of the image.

Drawings

FIG. 1 is a schematic diagram of a conventional target detection network;

FIG. 2 is a schematic diagram of the super-resolution part of the improved target detection network.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

Referring to FIG. 1 and FIG. 2, the scheme uses a super-resolution network to raise the resolution of the feature map, rather than using the feature map at its original resolution, and sends the resolution-enhanced feature map to the subsequent target frame extraction algorithm. Because super-resolution is applied to the feature map, the target detection network must be adapted accordingly so that it can perform target frame extraction on the larger, super-resolved feature map.

Step 1: down-sampling the high-definition original image (HR) to the original image size, sending it to the target detection network, and obtaining a feature map from the target detection network.

Taking the YOLOV3 network as an example, an image with 1664 × 1664 resolution is down-sampled by a factor of 4 to obtain a 416 × 416 original image, which is sent to the target detection network; the feature map output by the network is 13 × 13.

Step 2: performing super-resolution on the output feature map using a super-resolution network to obtain a larger feature map. The super-resolved feature map is obtained by means such as gradient maps.

Step 2a: optionally, step 2 may obtain the super-resolved feature map using a residual network.

Taking the YOLOV3 network as an example, the final output feature map is 13 × 13; applying 4× super-resolution to it yields a 52 × 52 feature map.
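The size change in this step can be illustrated with a deliberately simple stand-in. The sketch below uses nearest-neighbour repetition, which is not the learned super-resolution network the method calls for; it only shows how a 13 × 13 map becomes 52 × 52 at a magnification of 4. The function name and the list-of-lists representation are assumptions.

```python
def upsample_nearest(feature_map, scale):
    """Enlarge a 2-D map by repeating every value `scale` times per axis.

    Stand-in only: a real system would use a trained super-resolution network.
    """
    enlarged = []
    for row in feature_map:
        wide_row = [value for value in row for _ in range(scale)]
        enlarged.extend(list(wide_row) for _ in range(scale))
    return enlarged


# A 13 x 13 single-channel feature map filled with distinct values.
fmap = [[float(r * 13 + c) for c in range(13)] for r in range(13)]
up = upsample_nearest(fmap, 4)
print(len(up), len(up[0]))  # 52 52
```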

Step 3: cutting the feature map into a plurality of small regions, ensuring that the size of each small region matches the size expected by the target frame extraction network of the original network, and then searching for target frames in each small region, so that the subsequent target detection network does not need to be modified.

Taking the YOLOV3 network as an example, after a 52 × 52 feature map is obtained by super-resolution, it is cut into 16 regions of 13 × 13, and each region is then sent to the original 13 × 13 target frame extraction network for identification.
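The cutting step can be sketched as follows, assuming a single-channel feature map stored as a list of rows and non-overlapping tiles in row-major order. The source does not say whether the regions overlap or in which order they are processed, so both choices are assumptions, as is the function name.

```python
def split_into_tiles(feature_map, tile):
    """Cut a 2-D map into non-overlapping tile x tile regions (row-major)."""
    height, width = len(feature_map), len(feature_map[0])
    if height % tile or width % tile:
        raise ValueError("map size must be a multiple of the tile size")
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            tiles.append(
                [row[left:left + tile] for row in feature_map[top:top + tile]]
            )
    return tiles


# A 52 x 52 map yields 16 regions of 13 x 13.
big_map = [[r * 52 + c for c in range(52)] for r in range(52)]
regions = split_into_tiles(big_map, 13)
print(len(regions), len(regions[0]), len(regions[0][0]))  # 16 13 13
```

Each region can then be passed unchanged to a head that expects a 13 × 13 input, which is why the downstream network needs no modification.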

By comparison, if super-resolution were performed directly on the original image, the whole target detection network would have to run detection on a 1664 × 1664 picture, requiring far more computation than the present detection on a 52 × 52 feature map.
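A rough sense of the gap can be had by counting elements, taking the high-resolution side as 4 × 416 pixels (an assumption that follows from the 4× down-sampling to 416 × 416). Element counts are only a proxy: the actual cost also depends on channel depth and on which layers run, neither of which is given in the text.

```python
# Spatial element counts for the two alternatives.
hr_side = 4 * 416            # assumed side of the super-resolved input image
feature_side = 52            # side of the super-resolved feature map

hr_elements = hr_side ** 2
feature_elements = feature_side ** 2
ratio = hr_elements // feature_elements

print(hr_elements, feature_elements, ratio)  # 2768896 2704 1024
```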

Step 4: sending the high-definition original (HR) image into the current network and, using the same network, obtaining a feature map of the same size as the one produced by steps 1-3. The difference between this feature map and the feature map obtained in step 3 is then used as the super-resolution loss, and the target detection loss (such as the usual classification loss and bbox loss) is combined with the super-resolution loss (for example, by weighted average) to form the loss function of the whole network.
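One way to read this step is sketched below. It assumes the difference between the two feature maps is measured as a mean absolute (L1) or mean squared (L2) error and that the two losses are combined by a weighted average with a hypothetical weight `sr_weight`; the source names a weighted average only as one example and gives no weights.

```python
def l1_loss(map_a, map_b):
    """Mean absolute difference between two equally sized 2-D maps."""
    diffs = [abs(a - b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def l2_loss(map_a, map_b):
    """Mean squared difference between two equally sized 2-D maps."""
    diffs = [(a - b) ** 2 for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def total_loss(detection_loss, sr_loss, sr_weight=0.5):
    """Weighted average of the detection loss and the super-resolution loss."""
    return (1.0 - sr_weight) * detection_loss + sr_weight * sr_loss


sr_map = [[1.0, 2.0], [3.0, 4.0]]   # super-resolved feature map (toy values)
hr_map = [[1.0, 2.0], [3.0, 6.0]]   # feature map from the HR image (toy values)

sr_l1 = l1_loss(sr_map, hr_map)     # 0.5
sr_l2 = l2_loss(sr_map, hr_map)     # 1.0
combined = total_loss(detection_loss=2.0, sr_loss=sr_l1)
print(sr_l1, sr_l2, combined)       # 0.5 1.0 1.25
```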

In a second embodiment, targeting two-stage target detection networks (typically Faster R-CNN), feature extraction is done in two stages: the first stage extracts candidate frames that may contain targets, and the second stage performs target detection only on those candidate frame regions. The super-resolution network can therefore be applied to the candidate frame regions alone, so the complete feature map need not be super-resolved and the whole network runs faster.

Step 1: sending the original image into a two-stage target detection network and obtaining the candidate frame regions of the image after the first-stage network.

Taking the Faster R-CNN network as an example, after the original image passes through the backbone network and the RPN network, a plurality of candidate frame regions are output.

Step 2: from the obtained candidate frame, calculating the corresponding position in the original image and cropping the image at that position to serve as the image requiring super-resolution.

Step 2a: if multiple candidate frames are found, the image regions corresponding to the candidate frames are extracted and their union is taken.

Step 2b: a threshold condition, for example on the size, position, or overlap between candidate frames, may be used to filter valid candidate frames.
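Steps 2a and 2b can be sketched as below. The sketch filters candidate frames by a minimum area only and approximates the union by the smallest rectangle enclosing the surviving frames. Both are simplifications: the source lists size, position, and overlap as possible criteria without fixing any threshold, and it does not say how the union is represented. Boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixel coordinates.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def filter_boxes(boxes, min_area):
    """Keep only candidate frames whose area reaches the threshold."""
    return [box for box in boxes if box_area(box) >= min_area]


def enclosing_region(boxes):
    """Smallest rectangle covering all boxes, or None if there are none."""
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


candidates = [(10, 10, 50, 60), (40, 30, 90, 80), (5, 5, 8, 8)]
kept = filter_boxes(candidates, min_area=100)   # drops the 3 x 3 box
region = enclosing_region(kept)
print(kept, region)
```

The enclosing rectangle may include pixels outside every candidate frame, so a mask-based union would be tighter at the cost of a non-rectangular input.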

Steps 3-5: these are the same as steps 2-4 of embodiment 1, with the cropped original image obtained in step 2 taking the place of the feature map as the subsequent input. The target detection DNN used here may be the two-stage target detection network used in step 1 or any other target detection network.

In most cases no candidate frame is found in step 2, so the super-resolution network is not executed at all and the overall system is not overloaded.
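The early exit described above can be sketched as a small control-flow wrapper. The three callables `propose`, `super_resolve`, and `detect` are hypothetical placeholders for the first-stage network, the super-resolution network, and the downstream detector; processing each crop separately, rather than the union of regions, is a simplification.

```python
def run_pipeline(image, propose, super_resolve, detect):
    """Run super-resolution only when the first stage proposes regions."""
    boxes = propose(image)
    if not boxes:
        return []  # early exit: the super-resolution network never runs
    results = []
    for x1, y1, x2, y2 in boxes:
        crop = [row[x1:x2] for row in image[y1:y2]]
        results.extend(detect(super_resolve(crop)))
    return results


sr_calls = []  # records how often the stand-in SR function is invoked


def fake_sr(crop):
    """Stand-in for the super-resolution network: returns the crop as-is."""
    sr_calls.append(len(crop))
    return crop


image = [[0] * 4 for _ in range(4)]
empty = run_pipeline(image, lambda img: [], fake_sr, lambda crop: ["hit"])
found = run_pipeline(image, lambda img: [(0, 0, 2, 2)], fake_sr, lambda crop: ["hit"])
print(empty, found, sr_calls)  # [] ['hit'] [2]
```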

Claims

1. A target detection method based on image super-resolution is characterized by comprising the following steps:

step 1, sending a low-resolution image into a target detection network for processing to obtain a first feature map;

step 2, using a super-resolution network to perform super-resolution on the first feature map to obtain a second feature map with a larger size;

step 3, cutting the second feature map from step 2 into more than two small regions, ensuring that the size of each small region is consistent with that of an output feature map of a target frame extraction network of the target detection network, and then performing subsequent target processing on each small region;

step 4, sending the high-definition image into the target detection network in the step 1 to obtain a third feature map with the same size as the second feature map; the third feature map is then compared with the second feature map as a loss function of the target detection network and used for training of the target detection network.

2. The method for detecting the target based on the image super-resolution as claimed in claim 1, wherein step 1 comprises: down-sampling the high-definition original image to the original image size, sending it to the target detection network, and obtaining the first feature map from the target detection network.

3. The method for detecting the target based on the image super-resolution according to claim 1, wherein the super-resolution feature map is obtained by using a gradient map or a residual network.

4. The method of claim 1, wherein in step 4, the difference between the third feature map and the second feature map is used as a super-resolution loss, and the target detection loss and the super-resolution loss are combined as a loss function of the whole network.

5. The image super-resolution-based target detection method according to claim 1, wherein in step 1, the target detection network is a two-stage target detection network, and the target detection network obtains one or more candidate frame areas after passing through a first-stage network; and calculating a corresponding image area in the original image according to the candidate frame area, and taking the image area as a feature map needing super-resolution in the step 2.

6. The method according to claim 5, wherein if more than one candidate frame region is found, a union of image regions respectively calculated by a plurality of candidate frames is used as the feature map.

7. The method of claim 5, wherein a threshold condition, such as on the size, position, or overlapping portion between candidate frames, is used to screen valid candidate frames.

8. The method according to claim 5, wherein after obtaining the image region corresponding to the clipped candidate frame, the original target detection network or any other target detection network is used for subsequent target detection.