CN117475498B - Self-adaptive target detection method and device - Google Patents

Self-adaptive target detection method and device

Info

Publication number
CN117475498B
Authority
CN
China
Prior art keywords
convolution
module
image
target
layer
Prior art date
Legal status
Active
Application number
CN202311833244.3A
Other languages
Chinese (zh)
Other versions
CN117475498A (en)
Inventor
Yin Yunfeng (尹云峰)
Shi Hongzhi (史宏志)
Wen Dongchao (温东超)
Ren Zhixin (任智新)
Zhao Yaqian (赵雅倩)
Current Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202311833244.3A
Publication of CN117475498A
Application granted
Publication of CN117475498B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The invention provides an adaptive target detection method and device, relating to the technical field of artificial intelligence. The method comprises: acquiring a target image to be detected; and inputting the target image into a preset target detection model to obtain the target detection frame output by the model. The target detection model comprises a pre-decision module, a super-resolution module set and a detection module. The pre-decision module determines the sharpness level of the target image, a super-resolution module performs super-resolution processing on the image, and the detection module performs target detection on the super-resolved image to obtain the target detection frame. The super-resolution module set comprises N super-resolution modules with different magnification factors, and sharpness is divided into N levels from low to high; a target image with level-i sharpness is processed by the super-resolution module with magnification 2^(N-i+1). The method improves target detection accuracy without significantly reducing computational efficiency.

Description

Self-adaptive target detection method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a self-adaptive target detection method and device.
Background
Face recognition is the most widely known technology in the field of face analysis. It is a biometric technology that identifies a person based on facial feature information and is widely applied in security, education, finance and other fields. To recognize a face in an image, the face must first be detected quickly and accurately. If the detected face has significant defects, for example strong highlights and heavy shadow contrast caused by uneven ambient illumination, or a dark face caused by insufficient illumination or viewing angle, the matching precision of the face recognition system is directly affected, which may lead to inaccurate recognition results.
Traditional face detection techniques suffer from low detection accuracy, which deep-learning-based face detection can effectively improve. Existing deep-learning face detection methods typically pre-process the input image with an image enhancement technique such as a super-resolution generative adversarial network (SRGAN) before detection. Although this improves the final detection accuracy, the image is simply super-resolved into a high-definition image of a single size. A super-resolution module with a large single magnification improves detection accuracy but slows down face detection, whereas a module with a small single magnification speeds up detection but sacrifices accuracy, since local detail information of the image may be missed during detection. How to improve detection accuracy without significantly reducing detection speed, i.e. how to balance detection speed and detection accuracy, is therefore an urgent technical problem.
Disclosure of Invention
The invention aims to provide an adaptive target detection method and device to solve the problem that target detection methods in the prior art cannot balance detection speed and detection precision.
The invention provides a self-adaptive target detection method, which comprises the following steps:
acquiring a target image to be detected;
inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the target detection model comprises: a pre-decision module, a super-resolution module set and a detection module,
the pre-decision module is used for calculating the variance of the target image to be detected, determining the sharpness level of the target image to be detected based on the variance, and inputting the target image to be detected into the super-resolution module corresponding to that sharpness level in the super-resolution module set; the super-resolution module is used for performing super-resolution processing on the target image to be detected; and the detection module is used for performing target detection on the super-resolved target image to obtain the target detection frame; wherein the super-resolution module set comprises N super-resolution modules with different magnification factors, the sharpness level is divided into N levels from low to high, and a target image to be detected with level-i sharpness is super-resolved by the module with magnification 2^(N-i+1), where N ≥ 2 and i = 1, 2, …, N.
according to the adaptive target detection method provided by the invention, the pre-decision module is specifically used for:
filtering the target image to be detected, and calculating the variance of the filtered target image to be detected;
and determining the sharpness level of the target image to be detected based on the variance and a preset mapping between variance and sharpness level.
According to the adaptive target detection method provided by the invention, N = 3, and the super-resolution module set includes: an 8× super-resolution module, a 4× super-resolution module and a 2× super-resolution module; the sharpness levels include three levels from low to high: a first sharpness level, a second sharpness level and a third sharpness level, wherein the first sharpness level corresponds to the 8× super-resolution module, the second sharpness level corresponds to the 4× super-resolution module, and the third sharpness level corresponds to the 2× super-resolution module.
According to the adaptive target detection method provided by the invention, the 8× super-resolution module comprises: a first depth residual network module and a first sub-pixel convolution module;
the first depth residual network module comprises a plurality of cascaded first residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image;
the first sub-pixel convolution module is used for magnifying the feature image eight times to obtain a feature image with eight-fold sharpness magnification.
According to the adaptive target detection method provided by the invention, the first residual module comprises: a first convolutional network and a second convolutional network of identical structure, connected through a first activation function layer. Each of the first and second convolutional networks comprises a first convolution group consisting of, connected in sequence: a 1×1×16 convolutional layer, a 1×3×16 convolutional layer, a 3×1×16 convolutional layer and a 3×3×16 dilated convolutional layer with dilation rate 2; the dilated convolutional layer of the first convolution group in the first convolutional network is connected to the first activation function layer, and the first activation function layer is connected to the 1×1×16 convolutional layer of the first convolution group in the second convolutional network.
According to the adaptive target detection method provided by the invention, the first convolutional network and the second convolutional network each further comprise: a second convolution group, a third convolution group and a first aggregation layer;
the second convolution group comprises, connected in sequence: a 3×3 average pooling layer, a 1×1×16 convolutional layer and a 3×3×16 dilated convolutional layer with dilation rate 2;
the third convolution group comprises, connected in sequence: a 1×1×32 convolutional layer and a 3×3×32 dilated convolutional layer with dilation rate 2;
the 3×3×16 dilated convolutional layers (dilation rate 2) of the first and second convolution groups in the first convolutional network and the 3×3×32 dilated convolutional layer (dilation rate 2) of the third convolution group in the first convolutional network are connected to the first aggregation layer of the first convolutional network;
the first aggregation layer of the first convolutional network is connected to the first activation function layer, which is connected to the 1×1×16 convolutional layer of the first convolution group, the 3×3 average pooling layer of the second convolution group and the 1×1×32 convolutional layer of the third convolution group in the second convolutional network;
the 3×3×16 dilated convolutional layers (dilation rate 2) of the first and second convolution groups in the second convolutional network and the 3×3×32 dilated convolutional layer (dilation rate 2) of the third convolution group in the second convolutional network are connected to the first aggregation layer of the second convolutional network.
According to the adaptive target detection method provided by the invention, the first sub-pixel convolution module comprises three sequentially connected first sub-pixel convolution groups, each first sub-pixel convolution group comprising, connected in sequence: a convolutional layer, a 2× up-sampling (PixelShuffle) layer and an activation function layer.
According to the adaptive target detection method provided by the invention, the 4× super-resolution module comprises: a second depth residual network module and a second sub-pixel convolution module;
the second depth residual network module comprises a plurality of cascaded second residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image;
the second sub-pixel convolution module is used for magnifying the feature image four times to obtain a feature image with four-fold sharpness magnification.
According to the adaptive target detection method provided by the invention, the second residual module comprises: a third convolutional network and a fourth convolutional network of identical structure, connected through a second activation function layer. Each of the third and fourth convolutional networks comprises a fourth convolution group consisting of, connected in sequence: a 1×1×16 convolutional layer, a 1×3×16 convolutional layer and a 3×1×16 convolutional layer; the 3×1×16 convolutional layer of the fourth convolution group in the third convolutional network is connected to the second activation function layer, and the second activation function layer is connected to the 1×1×16 convolutional layer of the fourth convolution group in the fourth convolutional network.
According to the adaptive target detection method provided by the invention, the third convolutional network and the fourth convolutional network each further comprise: a fifth convolution group, a sixth convolution group and a second aggregation layer;
the fifth convolution group comprises, connected in sequence: a 3×3 average pooling layer and a 1×1×16 convolutional layer;
the sixth convolution group comprises: a 1×1×32 convolutional layer;
the 3×1×16 convolutional layer of the fourth convolution group, the 1×1×16 convolutional layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the third convolutional network are connected to the second aggregation layer of the third convolutional network;
the second aggregation layer of the third convolutional network is connected to the second activation function layer, which is connected to the 1×1×16 convolutional layer of the fourth convolution group, the 3×3 average pooling layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the fourth convolutional network;
the 3×1×16 convolutional layer of the fourth convolution group, the 1×1×16 convolutional layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the fourth convolutional network are connected to the second aggregation layer of the fourth convolutional network.
According to the adaptive target detection method provided by the invention, the second sub-pixel convolution module comprises two sequentially connected second sub-pixel convolution groups, each second sub-pixel convolution group comprising, connected in sequence: a convolutional layer, a 2× up-sampling (PixelShuffle) layer and an activation function layer.
According to the adaptive target detection method provided by the invention, the 2× super-resolution module comprises: a third depth residual network module and a third sub-pixel convolution module;
the third depth residual network module comprises a plurality of cascaded third residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image;
the third sub-pixel convolution module is used for magnifying the feature image two times to obtain a feature image with two-fold sharpness magnification.
According to the adaptive target detection method provided by the invention, the third residual module comprises: two 3×3×16 convolutional layers and a third activation function layer, the two 3×3×16 convolutional layers being connected through the third activation function layer.
According to the adaptive target detection method provided by the invention, the third sub-pixel convolution module comprises two sequentially connected third sub-pixel convolution groups, each third sub-pixel convolution group comprising, connected in sequence: a convolutional layer, a 1× up-sampling (PixelShuffle) layer and an activation function layer.
According to the adaptive target detection method provided by the invention, the target detection model is trained based on a target sample image and one of its N corresponding labels, where the N labels respectively correspond to the images obtained by enlarging the target sample image 2^(N-i+1) times (i = 1, 2, …, N), with a detection-frame ground truth annotated in each enlarged image.
According to the adaptive target detection method provided by the invention, the target detection model is trained as follows:
acquiring a training set comprising target sample images and corresponding labels;
determining the initial parameters of the target detection model, including the number of training epochs, the number of samples per batch for each epoch, and the network weight parameters;
and training the target detection model based on the target sample images of each batch and the corresponding labels so as to update the network weight parameters, until all training epochs are completed, to obtain the trained target detection model.
According to the adaptive target detection method provided by the invention, training the target detection model based on the target sample images of each batch and the corresponding labels so as to update the network weight parameters comprises:
inputting the target sample image into the target detection model, wherein the pre-decision module calculates the variance of the target sample image and determines the sharpness level i of the target sample image based on that variance;
inputting the target sample image into the 2^(N-i+1)× super-resolution module corresponding to sharpness level i in the super-resolution module set to obtain a super-resolved target sample image;
inputting the super-resolved target sample image into the detection module to obtain the predicted detection frame output by the detection module;
substituting the predicted detection frame and the detection-frame ground truth of the 2^(N-i+1)× enlarged image corresponding to the target sample image into a preset loss function; if the calculated loss value exceeds a preset threshold, updating the weight parameters of the detection module and of the 2^(N-i+1)× super-resolution module respectively; otherwise, keeping the weight parameters of the detection module and the 2^(N-i+1)× super-resolution module unchanged.
The present invention also provides an adaptive target detection apparatus, the apparatus comprising:
the target image acquisition unit is used for acquiring a target image to be detected;
the model running unit is used for inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the target detection model comprises: a pre-decision module, a super-resolution module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the super-resolution module corresponding to that sharpness level in the super-resolution module set; the super-resolution module is configured to perform super-resolution processing on the target image to be detected; and the detection module is configured to perform target detection on the super-resolved target image to obtain the target detection frame; wherein the super-resolution module set comprises N super-resolution modules with different magnification factors, the sharpness level is divided into N levels from low to high, and a target image to be detected with level-i sharpness is super-resolved by the module with magnification 2^(N-i+1), where N ≥ 2 and i = 1, 2, …, N.
the present invention also provides an electronic device including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the adaptive target detection method as described in any one of the above when the program is executed.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive target detection method as described in any of the above.
According to the adaptive target detection method and device of the invention, a target image to be detected is acquired and input into a preset target detection model to obtain the target detection frame output by the model. The pre-decision module calculates the variance of the target image and determines its sharpness level based on the variance, which guides the subsequent selection of the corresponding super-resolution module from the super-resolution module set for super-resolving the image. Matching the magnification to the sharpness level reduces excessive super-resolution, saving computational resources and improving overall detection performance, i.e. detection speed, while also avoiding insufficient super-resolution, which would cause faces to be missed; detection accuracy is thus improved. The adaptive target detection method of this embodiment therefore improves target detection accuracy without significantly reducing computational efficiency or detection speed.
Drawings
In order to illustrate the invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of the adaptive target detection method provided by the invention;
FIG. 2 is a schematic structural diagram of the target detection model in the adaptive target detection method provided by the invention;
FIG. 3 is a schematic structural diagram of the 8× super-resolution module of the target detection model in the adaptive target detection method provided by the invention;
FIG. 4 is a schematic structural diagram of the first residual module of the 8× super-resolution module in the adaptive target detection method provided by the invention;
FIG. 5 is a schematic structural diagram of the 4× super-resolution module of the target detection model in the adaptive target detection method provided by the invention;
FIG. 6 is a schematic structural diagram of the second residual module of the 4× super-resolution module in the adaptive target detection method provided by the invention;
FIG. 7 is a schematic structural diagram of the 2× super-resolution module of the target detection model in the adaptive target detection method provided by the invention;
FIG. 8 is a schematic structural diagram of the third residual module of the 2× super-resolution module in the adaptive target detection method provided by the invention;
FIG. 9 is a schematic structural diagram of the adaptive target detection apparatus provided by the invention;
FIG. 10 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiments of the present invention are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
The specific flow of the adaptive target detection method in the embodiment of the invention is shown in FIG. 1, and the method comprises the following steps:
Step S110: acquiring a target image to be detected. The target image to be detected may be a three-channel (R, G, B) color image of any size and may contain animals, vehicles or faces; different target images are obtained for different application scenarios. For example, in a face detection scenario the target image may be a face image captured by a surveillance camera, and in an access control scenario it may be a face image captured by an access control camera.
Step S120: inputting the target image to be detected into a preset target detection model to obtain the target detection frame output by the target detection model.
As shown in FIG. 2, the target detection model includes: a pre-decision module 210, a super-resolution module set 220 and a detection module 230. The detection module 230 may be of different types depending on the detection target; for a face detection scenario it may be a mainstream face detection model, for example RetinaFace or a multi-task cascaded convolutional network (MTCNN).
The pre-decision module 210 is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image based on the variance, and input the target image into the super-resolution module corresponding to that sharpness level in the super-resolution module set 220. The super-resolution module is configured to perform super-resolution processing on the target image, and the detection module 230 is configured to perform target detection on the super-resolved target image to obtain the target detection frame. The super-resolution module set 220 comprises N super-resolution modules with different magnification factors, the sharpness level is divided into N levels from low to high, and a target image with level-i sharpness is super-resolved by the module with magnification 2^(N-i+1), where N ≥ 2 and i = 1, 2, …, N.
The higher the sharpness level, the clearer the target image to be detected and the richer the image features it contains, so a super-resolution module with a lower magnification suffices to obtain a high-definition target image, saving super-resolution computation while still guaranteeing the final detection accuracy. The lower the sharpness level, the less clear the target image and the fewer image features it contains, so a super-resolution module with a higher magnification is selected to improve the detection accuracy for that image.
It should be noted that the sharpness level is related to the variance of the target image to be detected: the variance of an image is the mean of the squared differences between each pixel value and the pixel mean, and the larger the variance, the sharper the image. The variance range can be divided from small to large into N non-overlapping variance sections corresponding to the N magnifications 2^(N-i+1): the 1st section, with the smallest variances, corresponds to the 2^N× super-resolution module; the Nth section, with the largest variances, corresponds to the 2× super-resolution module; and in general the ith variance section corresponds to the 2^(N-i+1)× super-resolution module.
Each of the N magnification-specific super-resolution modules includes: a depth residual module and a sub-pixel convolution module, together with a convolutional module before the depth residual module, a convolutional module between the depth residual module and the sub-pixel convolution module, and a convolutional module after the sub-pixel convolution module; the depth residual modules and sub-pixel convolution modules differ between super-resolution modules of different magnifications. Processing the target image with the trained super-resolution module of the selected magnification yields a high-definition target image at that magnification which, compared with the low-definition input, has higher pixel density, richer texture details and higher reliability; texture detail features are then less likely to be missed during target detection, which improves target detection accuracy.
In the adaptive target detection method of this embodiment, the target image to be detected is acquired and input into the preset target detection model to obtain the target detection frame output by the model. The pre-decision module 210 calculates the variance of the target image and determines its sharpness level based on the variance, which guides the subsequent selection of the corresponding super-resolution module from the super-resolution module set 220 for super-resolving the image. This reduces excessive super-resolution, saving computational resources and improving overall detection performance, i.e. detection speed, while also avoiding insufficient super-resolution, which would cause faces to be missed; detection accuracy is thus improved. For a target image of any sharpness, the method therefore improves face detection accuracy without significantly reducing computational efficiency or detection speed.
In some embodiments, the pre-decision module 210 is specifically configured to filter the target image to be detected and calculate the variance of the filtered image, and to determine the sharpness level of the target image based on the variance and a preset mapping between variance and sharpness level.
Specifically, the pre-decision module 210 filters the target image to be detected by applying the Laplacian operator to compute its second derivative, and after filtering calculates the variance of the image as

σ² = (1 / (W·H)) · Σ_(x,y) (I(x, y) − μ)²

where I(x, y) is the pixel value at coordinates (x, y), μ is the mean pixel (gray) value of the whole target image to be detected, and W and H are the image width and height. The mapping between variance and sharpness level can be set in advance according to the actual situation, i.e. the ith variance section maps to the 2^(N-i+1)× super-resolution module.
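As an illustrative sketch (not part of the patent text), the pre-decision computation can be written in a few lines of Python; the 3×3 Laplacian kernel is the usual discrete approximation, and the function name is hypothetical:

```python
import numpy as np
from scipy.ndimage import convolve

# Discrete 3x3 Laplacian kernel: approximates the image's second derivative.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplacian_variance(gray: np.ndarray) -> float:
    """Filter the image with the Laplacian operator, then return the
    variance of the response: the mean squared deviation of each value
    from the mean, matching the formula above."""
    filtered = convolve(gray.astype(np.float64), LAPLACIAN, mode="reflect")
    return float(np.mean((filtered - filtered.mean()) ** 2))
```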
In some embodiments, N = 3, i.e. the super-resolution module set includes super-resolution modules of three magnifications. In this embodiment, the super-resolution module set 220 includes: an 8× super-resolution module 221, a 4× super-resolution module 222 and a 2× super-resolution module 223. The sharpness level includes three levels from low to high: a first sharpness level, a second sharpness level and a third sharpness level, the first being lower than the second and the second lower than the third. Since the lower the sharpness level, the larger the magnification needed to obtain a sufficiently high-definition target image, the first sharpness level corresponds to the 8× super-resolution module 221, the second to the 4× super-resolution module 222, and the third to the 2× super-resolution module 223.
The number of super-resolution modules of different magnifications in the super-resolution module set 220 (i.e. the value of N) can be set according to the actual situation. In theory, the larger N is, the more magnifications are available, the finer the division of sharpness levels, and the more accurate the final face detection result; however, the target detection model then has to train more super-resolution modules, requiring more target sample images and corresponding labels and raising training complexity. In this embodiment, N = 3 and the set 220 contains the 8× module 221, the 4× module 222 and the 2× module 223, dividing image sharpness into low (first sharpness level), medium (second sharpness level) and high (third sharpness level) corresponding to the 8×, 4× and 2× modules respectively, which preserves the accuracy of the target detection result while reducing the complexity of training the target detection model.
In practice, the 8× super-resolution module 221 may correspond to the variance section [0, 25), the 4× super-resolution module 222 to the section [25, 45), and the 2× super-resolution module 223 to the section [45, +∞).
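A minimal sketch of this dispatch, assuming the thresholds 25 and 45 from the preceding paragraph (the function name is illustrative):

```python
def select_magnification(variance: float, n_levels: int = 3) -> int:
    """Map an image variance to a sharpness level i in 1..N and return
    the magnification 2**(N - i + 1) of the selected module."""
    thresholds = [25.0, 45.0]            # sections [0,25), [25,45), [45,+inf)
    level = 1                            # level 1 = lowest sharpness
    for t in thresholds:
        if variance >= t:
            level += 1
    return 2 ** (n_levels - level + 1)

assert select_magnification(10.0) == 8   # low sharpness  -> 8x module
assert select_magnification(30.0) == 4   # medium         -> 4x module
assert select_magnification(90.0) == 2   # high sharpness -> 2x module
```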
FIG. 3 is a schematic structural diagram of the 8× super-resolution module 221. The 8× super-resolution module 221 includes: the first depth residual network module and the first sub-pixel convolution module, and further includes: a convolutional layer Conv31 and an activation function layer RELU31 before the first depth residual network module, a convolutional layer Conv32 and a normalization layer BN31 between the first depth residual network module and the first sub-pixel convolution module, and a convolutional layer Conv36 after the first sub-pixel convolution module.
The first depth residual network module comprises a plurality of cascaded first residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image.
The first sub-pixel convolution module is used for magnifying the feature image eight times to obtain a feature image with eight-fold sharpness magnification.
In this embodiment, as shown in FIG. 4, the first residual module includes: a first convolutional network and a second convolutional network of identical structure, connected through a first activation function layer. Each of the two networks comprises a first convolution group consisting of, connected in sequence: a 1×1×16 convolutional layer, a 1×3×16 convolutional layer, a 3×1×16 convolutional layer and a 3×3×16 dilated convolutional layer with dilation rate 2. The dilated convolutional layer of the first convolution group in the first convolutional network is connected to the first activation function layer RELU41, which is connected to the 1×1×16 convolutional layer of the first convolution group in the second convolutional network. Introducing dilated convolutional layers with dilation rate 2 into the two convolutional networks of the first residual module increases the network depth while reducing the amount of computation and yields richer image features, so that the 8× super-resolution module 221 ultimately generates a high-quality, high-definition target image.
Further, the first convolutional network and the second convolutional network each further comprise: a second convolution group, a third convolution group and a first aggregation layer (Depth Concat41 in the first convolutional network, Depth Concat42 in the second convolutional network); the first, second and third convolution groups are branches parallel to each other.
The second convolution group comprises, connected in sequence: a 3×3 average pooling layer, a 1×1×16 convolutional layer and a 3×3×16 dilated convolutional layer with dilation rate 2.
The third convolution group comprises, connected in sequence: a 1×1×32 convolutional layer and a 3×3×32 dilated convolutional layer with dilation rate 2.
The 3×3×16 dilated convolutional layers (dilation rate 2) of the first and second convolution groups in the first convolutional network, together with the 3×3×32 dilated convolutional layer (dilation rate 2) of the third convolution group in the first convolutional network, are connected to the first aggregation layer Depth Concat41 of the first convolutional network.
The first aggregation layer Depth Concat41 of the first convolutional network is connected to the first activation function layer, which is connected to the 1×1×16 convolutional layer of the first convolution group, the 3×3 average pooling layer of the second convolution group and the 1×1×32 convolutional layer of the third convolution group in the second convolutional network.
The 3×3×16 dilated convolutional layers (dilation rate 2) of the first and second convolution groups in the second convolutional network, together with the 3×3×32 dilated convolutional layer (dilation rate 2) of the third convolution group in the second convolutional network, are connected to the first aggregation layer Depth Concat42 of the second convolutional network.
In this embodiment, two parallel pooling/convolution branches, i.e. the second convolution group and the third convolution group, are introduced into the first residual module to increase the width of the residual network in the 8× super-resolution module 221 and prevent a feature bottleneck. Specifically, in FIG. 4, Conv1×1×32 denotes a convolutional layer with a 1×1 kernel, 32 channels and stride 1; Conv1×1×16 denotes a convolutional layer with a 1×1 kernel, 16 channels and stride 1; Conv1×3×16 denotes a convolutional layer with a 1×3 kernel, 16 channels and stride 1; Conv3×1×16 denotes a convolutional layer with a 3×1 kernel, 16 channels and stride 1; Conv3×3×16 Rate=2 denotes a dilated convolution with a 3×3 kernel, 16 channels and dilation rate 2; and Conv3×3×32 Rate=2 denotes a dilated convolution with a 3×3 kernel, 32 channels and dilation rate 2. Since the kernel sizes of the parallel branches in front of the first aggregation layer Depth Concat41 in the first convolutional network and the first aggregation layer Depth Concat42 in the second convolutional network are 1 and 3, the convolution stride is 1, and the padding of the feature maps can be set to 0 and 1 respectively, the branches produce features of the same spatial dimensions after convolution, so for both the first and the second convolutional network the feature maps can be concatenated directly by their respective first aggregation layers.
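A possible PyTorch rendering of one such convolutional network is sketched below. The kernel shapes, channel counts and dilation rate follow FIG. 4 as described above; the class name, the padding of the dilated layers and the skip-free composition of the two networks are assumptions of this sketch, not details fixed by the patent:

```python
import torch
import torch.nn as nn

class WideConvNetwork(nn.Module):
    """One of the two identical convolutional networks of the first
    residual module: three parallel branches whose outputs keep the same
    spatial size and are concatenated depth-wise (Depth Concat)."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Sequential(            # first convolution group
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 16, kernel_size=(1, 3), padding=(0, 1)),
            nn.Conv2d(16, 16, kernel_size=(3, 1), padding=(1, 0)),
            nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2),
        )
        self.branch2 = nn.Sequential(            # second convolution group
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2),
        )
        self.branch3 = nn.Sequential(            # third convolution group
            nn.Conv2d(in_ch, 32, kernel_size=1),
            nn.Conv2d(32, 32, kernel_size=3, dilation=2, padding=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equal spatial dims in all branches, so features concat directly.
        return torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)

# Two identical networks joined by the first activation function layer.
first_residual = nn.Sequential(WideConvNetwork(16), nn.ReLU(), WideConvNetwork(64))
print(first_residual(torch.randn(1, 16, 32, 32)).shape)  # [1, 64, 32, 32]
```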
In this embodiment, the first sub-pixel convolution module includes three sequentially connected first sub-pixel convolution groups, each comprising, connected in sequence: a convolutional layer, a 2× up-sampling layer PixelShuffle×2 and an activation function layer RELU. As shown in FIG. 3, the three first sub-pixel convolution groups form the hierarchy Conv33 → PixelShuffle×2 → RELU32 → Conv34 → PixelShuffle×2 → RELU33 → Conv35 → PixelShuffle×2 → RELU34. Cascading three 2× up-sampling PixelShuffle layers magnifies the feature image eight times, yielding a feature image with eight-fold sharpness magnification, which is finally output through the convolutional layer Conv36.
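The magnifying tail generalizes naturally: k stages of (conv → PixelShuffle(2) → ReLU) give 2^k× magnification, so three stages realize this 8× module and two stages the 4× module of FIG. 5. The sketch below assumes the standard sub-pixel bookkeeping (each conv produces 4× the channels a PixelShuffle(2) consumes); names and channel counts beyond FIG. 3 are illustrative:

```python
import torch
import torch.nn as nn

def subpixel_tail(channels: int, stages: int, out_ch: int = 3) -> nn.Sequential:
    """Cascade of (conv -> PixelShuffle(2) -> ReLU) groups giving a
    2**stages magnification, followed by a final output convolution
    (Conv36 in FIG. 3)."""
    layers = []
    for _ in range(stages):
        # The conv widens to 4x channels so PixelShuffle(2) can trade
        # depth for a 2x larger spatial resolution.
        layers += [nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
                   nn.PixelShuffle(2),
                   nn.ReLU()]
    layers.append(nn.Conv2d(channels, out_ch, kernel_size=3, padding=1))
    return nn.Sequential(*layers)

feat = torch.randn(1, 16, 40, 30)
print(subpixel_tail(16, stages=3)(feat).shape)  # 8x: [1, 3, 320, 240]
```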
FIG. 5 is a schematic structural diagram of the 4× super-resolution module 222. The 4× super-resolution module 222 includes: the second depth residual network module and the second sub-pixel convolution module, and further includes: a convolutional layer Conv51 and an activation function layer RELU51 before the second depth residual network module, a convolutional layer Conv52 and a normalization layer BN51 between the second depth residual network module and the second sub-pixel convolution module, and a convolutional layer Conv55 after the second sub-pixel convolution module.
The second depth residual network module comprises a plurality of cascaded second residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image.
The second sub-pixel convolution module is used for magnifying the feature image four times to obtain a feature image with four-fold sharpness magnification.
In this embodiment, as shown in FIG. 6, the second residual module includes: a third convolutional network and a fourth convolutional network of identical structure, connected through a second activation function layer. Each of the third and fourth convolutional networks comprises a fourth convolution group consisting of, connected in sequence: a 1×1×16 convolutional layer, a 1×3×16 convolutional layer and a 3×1×16 convolutional layer. The 3×1×16 convolutional layer of the fourth convolution group in the third convolutional network is connected to the second activation function layer RELU61, which is connected to the 1×1×16 convolutional layer of the fourth convolution group in the fourth convolutional network.
Further, the third convolutional network and the fourth convolutional network each further comprise: a fifth convolution group, a sixth convolution group and a second aggregation layer (Depth Concat61 in the third convolutional network, Depth Concat62 in the fourth convolutional network); the fourth, fifth and sixth convolution groups are branches parallel to each other.
The fifth convolution group comprises, connected in sequence: a 3×3 average pooling layer and a 1×1×16 convolutional layer.
The sixth convolution group comprises: a 1×1×32 convolutional layer.
The 3×1×16 convolutional layer of the fourth convolution group, the 1×1×16 convolutional layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the third convolutional network are connected to the second aggregation layer Depth Concat61 of the third convolutional network.
The second aggregation layer Depth Concat61 of the third convolutional network is connected to the second activation function layer, which is connected to the 1×1×16 convolutional layer of the fourth convolution group, the 3×3 average pooling layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the fourth convolutional network.
The 3×1×16 convolutional layer of the fourth convolution group, the 1×1×16 convolutional layer of the fifth convolution group and the 1×1×32 convolutional layer of the sixth convolution group in the fourth convolutional network are connected to the second aggregation layer Depth Concat62 of the fourth convolutional network.
In this embodiment, two parallel pooling/convolution branches, i.e. the fifth convolution group and the sixth convolution group, are introduced into the second residual module to increase the width of the residual network in the 4× super-resolution module 222 and prevent a feature bottleneck. Specifically, in FIG. 6, Conv1×1×32 denotes a convolutional layer with a 1×1 kernel, 32 channels and stride 1; Conv1×1×16 denotes a convolutional layer with a 1×1 kernel, 16 channels and stride 1; Conv1×3×16 denotes a convolutional layer with a 1×3 kernel, 16 channels and stride 1; and Conv3×1×16 denotes a convolutional layer with a 3×1 kernel, 16 channels and stride 1. Since the kernel sizes of the parallel branches in front of the second aggregation layer Depth Concat61 in the third convolutional network and the second aggregation layer Depth Concat62 in the fourth convolutional network are 1 and 3, the convolution stride is 1, and the padding of the feature maps can be set to 0 and 1 respectively, the branches produce features of the same spatial dimensions after convolution, so for both the third and the fourth convolutional network the feature maps can be concatenated directly by their respective second aggregation layers.
In this embodiment, the second sub-pixel convolution module includes two sequentially connected second sub-pixel convolution groups, each comprising, connected in sequence: a convolutional layer, a 2× up-sampling layer PixelShuffle×2 and an activation function layer RELU. As shown in FIG. 5, the two second sub-pixel convolution groups form the hierarchy Conv53 → PixelShuffle×2 → RELU52 → Conv54 → PixelShuffle×2 → RELU53. Cascading two 2× up-sampling PixelShuffle layers magnifies the feature image four times, yielding a feature image with four-fold sharpness magnification, which is finally output through the convolutional layer Conv55.
FIG. 7 is a schematic structural diagram of the 2× super-resolution module 223. The 2× super-resolution module 223 includes: the third depth residual network module and the third sub-pixel convolution module, and further includes: a convolutional layer Conv71 and an activation function layer RELU71 before the third depth residual network module, a convolutional layer Conv72 and a normalization layer BN71 between the third depth residual network module and the third sub-pixel convolution module, and a convolutional layer Conv75 after the third sub-pixel convolution module.
The third depth residual network module comprises a plurality of cascaded third residual modules and is used for extracting features of the target image to be detected stage by stage to obtain a feature image.
The third sub-pixel convolution module is used for magnifying the feature image two times to obtain a feature image with two-fold sharpness magnification.
In this embodiment, as shown in FIG. 8, the third residual module includes: two 3×3×16 convolutional layers connected through a third activation function layer RELU81. Since the target image routed to the 2× super-resolution module already has relatively high sharpness, the third residual module can use two 3×3×16 convolutional layers for efficient feature extraction, which also attenuates noise in the target image to a certain extent.
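A minimal sketch of this lighter module, assuming plain stacking with no skip connection (only the two convolutions and the activation between them are specified):

```python
import torch.nn as nn

# Third residual module of the 2x super-resolution module (FIG. 8):
# two 3x3, 16-channel convolutions joined by the third activation layer.
third_residual = nn.Sequential(
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),  # RELU81
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
)
```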
In this embodiment, the third sub-pixel convolution module includes two sequentially connected third sub-pixel convolution groups, each comprising, connected in sequence: a convolutional layer, a 1× up-sampling layer PixelShuffle×1 and an activation function layer RELU. As shown in FIG. 7, the two third sub-pixel convolution groups form the hierarchy Conv73 → PixelShuffle×1 → RELU72 → Conv74 → PixelShuffle×1 → RELU73. Cascading the two 1× up-sampling PixelShuffle layers magnifies the feature image two times, yielding a feature image with two-fold sharpness magnification, which is finally output through the convolutional layer Conv75.
It should be noted that in the expressions above each convolutional layer in FIGS. 3, 5 and 7, k denotes the kernel size, n the number of output channels and s the stride. In the above embodiments, one of three super-resolution modules with different magnifications is selected according to the sharpness level of the target image to be detected. Three parallel combinations of convolution, pooling and dilated convolution with dilation rate 2 are introduced in the first residual network of the 8× super-resolution module 221 in FIG. 4; three parallel combinations of convolution and pooling are introduced in the second residual network of the 4× super-resolution module 222 in FIG. 6; and two 3×3 convolutional layers are used in the third residual network of the 2× super-resolution module 223 in FIG. 8. These designs increase network depth and receptive field without increasing, and even while reducing, the amount of computation (the sharper the input, the lighter the module), output multi-scale features of the image, and then fuse features of different scales to obtain richer image features, so that the super-resolution module finally generates a higher-quality, high-definition target image and the target detection accuracy is improved.
In some embodiments, the target detection model is trained based on a target sample image and one of its N corresponding labels, where the N labels respectively correspond to the images obtained by enlarging the target sample image 2^(N-i+1) times (i = 1, 2, …, N), with a detection-frame ground truth annotated in each enlarged image. Since there are N sharpness levels corresponding to the super-resolution modules with different magnifications 2^(N-i+1), and the sharpness level of a target sample image is unknown before it enters the target detection model, N labels must be prepared for each target sample image; the N labels are the 2^(N-i+1)× enlarged images, each annotated with a detection-frame ground truth.
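One possible way to organize these N-fold labels for N = 3 is sketched below; the field and method names are illustrative, not from the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) detection-frame ground truth

@dataclass
class SampleLabels:
    """The three labels of one target sample image: the image enlarged
    2**(N-i+1) times for i = 1, 2, 3, each with its annotated frames."""
    image_8x: np.ndarray
    boxes_8x: List[Box]
    image_4x: np.ndarray
    boxes_4x: List[Box]
    image_2x: np.ndarray
    boxes_2x: List[Box]

    def for_level(self, i: int) -> Tuple[np.ndarray, List[Box]]:
        """Return the (enlarged image, ground-truth boxes) pair that the
        pre-decision module's sharpness level i selects during training."""
        pairs = [(self.image_8x, self.boxes_8x),
                 (self.image_4x, self.boxes_4x),
                 (self.image_2x, self.boxes_2x)]
        return pairs[i - 1]
```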
In this embodiment, the target detection model is trained through the following training steps:
Step one: acquiring a training set comprising target sample images and corresponding labels. For a face detection application scenario, the training set may be obtained from an existing face sample database. Specifically, the face sample images in the training set may be three-channel (R, G, B) color images of fixed size, for example 640 × 480 × 3 pixels, where 640 is the image width, 480 the image height and 3 the number of channels. The more face sample images the training set contains, the better; the images should come from a variety of scenes and cover factors such as illumination changes, face pose changes and age changes, as well as different ethnicities, genders, skin colors and so on. The more such factors the training set covers, i.e. the more varied the training samples, the higher the detection accuracy of the face detection model obtained through the training steps of this embodiment.
For the case where the super-resolution module set 220 includes the 8× super-resolution module 221, the 4× super-resolution module 222 and the 2× super-resolution module 223, the training set includes face sample images and labels, where the labels comprise: the 8× enlarged face sample image with its annotated face detection-frame ground truth (i.e. the coordinate information of the face detection frame), the 4× enlarged face sample image with its annotated ground truth, and the 2× enlarged face sample image with its annotated ground truth.
Step two, initial parameters of the target detection model are determined, including the number of training periods (epochs), the number of samples in the sample batch (batch) of each training period, and the network weight parameters. The network weight parameters are the weight parameters of each 2^(N-i+1) magnification superdivision module and of the detection module. For a face detection scenario, the weight parameters of each 2^(N-i+1) magnification superdivision module and of the face detection module of the face detection model are determined.
Step three, the target detection model is trained based on the target sample images of each batch and the corresponding labels so as to update the network weight parameters, until all training periods are completed, yielding the trained target detection model. Specifically, one training period (epoch) means using every training sample in the training set exactly once to update the model parameters of the neural network; each weight update consumes one batch of data, and one batch may contain, for example, 512 target sample images. For the face detection scenario, the face detection model is trained based on the face sample images of each batch and the corresponding labels to update the network weight parameters until all training periods are completed, yielding the trained face detection model.
The number of samples per sample batch may be set according to the memory limit of the GPU card; it may also be 128 or 256 target sample images.
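The epoch/batch organisation just described maps onto a standard PyTorch loop. In this sketch, train_set, model, optimizer and training_step are assumed placeholders, not names from the patent:

```python
from torch.utils.data import DataLoader

BATCH_SIZE = 512   # one batch per weight update; 128 or 256 also possible
NUM_EPOCHS = 50    # number of training periods; the value here is illustrative

# train_set is an assumed Dataset yielding (target_sample_image, labels) pairs
loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(NUM_EPOCHS):      # one epoch = every training sample used once
    for images, labels in loader:    # each iteration updates the weights once
        training_step(images, labels, model, optimizer)  # sketched below
```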
In this embodiment, the third step specifically includes:
inputting the target sample image into the target detection model, wherein the pre-decision module calculates the variance of the target sample image and determines the sharpness level i of the target sample image based on that variance;

inputting the target sample image into the 2^(N-i+1) magnification superdivision module corresponding to sharpness level i in the superdivision module set, to obtain the superdivision-processed target sample image;

inputting the superdivision-processed target sample image into the detection module, to obtain the prediction detection frame output by the detection module;

substituting the prediction detection frame and the detection frame true value of the 2^(N-i+1) amplified image corresponding to the target sample image into a preset loss function, and, when the calculated loss value exceeds a preset threshold, updating the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module, otherwise keeping the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module unchanged. Specifically, the loss function of the prediction detection frame B output by the whole target detection model adopts the DIoU loss L_DIoU, defined as follows:

L_DIoU = 1 - IoU + R_DIoU(B, B^gt)    (1)

R_DIoU(B, B^gt) = ρ²(b, b^gt) / c²    (2)

wherein B is the prediction detection frame output by the target detection model, B^gt is the detection frame true value corresponding to the target sample image, IoU is the intersection-over-union of the prediction detection frame and the detection frame true value, b is the center point of the prediction detection frame, b^gt is the center point of the detection frame true value, ρ(b, b^gt) is the Euclidean distance from the center point of the prediction detection frame to the center point of the detection frame true value, and c is the distance from the top-left corner to the bottom-right corner of the smallest box enclosing both the prediction detection frame and the detection frame true value.
If L_DIoU exceeds the preset threshold, the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module are updated; otherwise, the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module are kept unchanged. The preset threshold may be, for example, 0.003.
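A hedged PyTorch sketch of the DIoU loss of equations (1)-(2) and of the threshold-gated update just described. The box format [x1, y1, x2, y2] and the model/label interfaces are assumptions, not specified by the patent:

```python
import torch

def diou_loss(pred, gt):
    """DIoU loss of eqs. (1)-(2) for two boxes given as [x1, y1, x2, y2]."""
    # IoU term
    ix1, iy1 = torch.max(pred[0], gt[0]), torch.max(pred[1], gt[1])
    ix2, iy2 = torch.min(pred[2], gt[2]), torch.min(pred[3], gt[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # rho^2: squared Euclidean distance between the two box center points
    rho2 = ((pred[0] + pred[2] - gt[0] - gt[2]) ** 2
            + (pred[1] + pred[3] - gt[1] - gt[3]) ** 2) / 4
    # c^2: squared diagonal of the smallest box enclosing both boxes
    ex1, ey1 = torch.min(pred[0], gt[0]), torch.min(pred[1], gt[1])
    ex2, ey2 = torch.max(pred[2], gt[2]), torch.max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1 - iou + rho2 / c2

def training_step(images, labels, model, optimizer, threshold=0.003):
    """One update: weights change only if the loss exceeds the threshold."""
    # assumed interface: the model returns its prediction box and the
    # sharpness level i chosen by the pre-decision module
    pred_box, level = model(images)
    # labels[level] is assumed to be the true-value box of the matching
    # 2**(N-i+1) amplified image (cf. build_labels above)
    loss = diou_loss(pred_box, labels[level])
    if loss.item() > threshold:   # preset threshold from this embodiment
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss
```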
In this embodiment, a mini-batch stochastic gradient descent algorithm (mini-batch SGD) may be used to train the target detection model, i.e., to update the weight parameters of the detection module and of each 2^(N-i+1) magnification superdivision module. Variants of the SGD algorithm, such as Adam or AdaGrad, may also be employed. In addition, to speed up convergence of the target detection model to an optimum, engineers may utilize various parallel techniques (e.g., multi-machine multi-card training). In the mini-batch SGD algorithm, the initial learning rate may be 0.05, the weight decay coefficient may be 0.0005, the momentum may be 0.85, and the learning-rate decay strategy may be a cosine decay strategy.
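These hyperparameters map directly onto PyTorch's SGD optimiser and cosine-annealing scheduler; a sketch follows, with model and NUM_EPOCHS being the placeholders from the sketches above:

```python
import torch

# model is assumed to bundle the superdivision modules and the detection module
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.05,              # initial learning rate
    momentum=0.85,        # momentum
    weight_decay=0.0005,  # weight decay coefficient
)
# cosine decay of the learning rate over all training periods
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=NUM_EPOCHS)
# call scheduler.step() once per epoch, after the batch loop
```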
The adaptive target detection method of this embodiment can be widely used for different target detection tasks in different application scenarios. For example, in an automatic driving scenario, the method can detect target vehicles and obstacles in front of the vehicle, the target image to be detected being an image captured by a camera at the front of the vehicle. When the vehicle detection model is trained, the target sample images in the training set are driving-scene sample images, containing vehicles, captured during driving, and the corresponding labels are the 2^(N-i+1) amplified driving-scene sample images with the vehicle detection frame true values annotated in them. As another example, in a face recognition access control scenario, the method can detect the face of a person about to pass through the access control, the target image to be detected being a face image captured by a camera in front of the access control machine. When the face detection model is trained, the target sample images in the training set are face sample images, and the corresponding labels are the 2^(N-i+1) amplified face sample images with the face detection frame true values annotated in them.
The adaptive target detection apparatus provided by the present invention is described below; the apparatus described below and the adaptive target detection method described above may be referred to in correspondence with each other.
Fig. 9 is a schematic structural diagram of an adaptive target detection apparatus according to an embodiment of the present invention, as shown in fig. 9, where the apparatus specifically includes:
a target image acquisition unit 910, configured to acquire a target image to be detected.
The model running unit 920 is configured to input the target image to be detected to a preset target detection model, so as to obtain a target detection frame output by the target detection model.
Wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N.
The higher the sharpness level of the target image to be detected, the clearer the image and the richer the image features it contains, so a superdivision module with a lower magnification suffices to obtain a high-definition target image, saving superdivision computation while still guaranteeing the final detection accuracy; the lower the sharpness level, the less clear the image and the fewer the image features it contains, so a superdivision module with a higher magnification is selected to improve the detection accuracy for that image.
It should be noted that: the sharpness level is related to the variance of the target image to be detected; the variance of an image is the mean of the squared differences between the pixel values of the image and the mean pixel value, and the greater the variance of the image, the higher the sharpness of the image. The variance range of the image may be divided, from small to large, into N non-overlapping variance segments corresponding to the N magnifications 2^(N-i+1): the 1st variance segment has the smallest variances and corresponds to the 2^N magnification superdivision module, the N-th variance segment has the largest variances and corresponds to the two-times magnification superdivision module, and in general the i-th variance segment corresponds to the 2^(N-i+1) magnification superdivision module.
Each of the N magnification superdivision modules includes: a depth residual module and a sub-pixel convolution module, together with a convolution module before the depth residual module, a convolution module between the depth residual module and the sub-pixel convolution module, and a convolution module after the sub-pixel convolution module; the depth residual modules and sub-pixel convolution modules of superdivision modules with different magnifications are different. The N trained magnification superdivision modules process the target image to be detected to obtain high-definition target images at the N magnifications. Compared with a low-definition target image, a high-definition target image has higher pixel density, richer texture detail and higher reliability, so texture detail features of the image are less likely to be missed when the high-definition target image is used for target detection, thereby improving target detection accuracy.
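Purely as an editorial illustration of this ordering (convolution, depth residual module, convolution, sub-pixel convolution module, convolution), a PyTorch sketch follows; all layer widths, kernel sizes and the activation choice are assumptions, and only the ordering is taken from the text:

```python
import torch.nn as nn

class SuperdivisionModule(nn.Module):
    """Generic 2**scale_pow magnification module: conv -> residual blocks
    -> conv -> sub-pixel (PixelShuffle) upsampling -> conv."""
    def __init__(self, residual_block, n_blocks, scale_pow, channels=16):
        super().__init__()
        # residual_block: any block mapping `channels` -> `channels`
        self.head = nn.Conv2d(3, channels, 3, padding=1)   # conv before residuals
        self.body = nn.Sequential(*[residual_block() for _ in range(n_blocks)])
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)
        ups = []
        for _ in range(scale_pow):                         # one 2x stage per power
            ups += [nn.Conv2d(channels, channels * 4, 3, padding=1),
                    nn.PixelShuffle(2), nn.ReLU()]
        self.upsample = nn.Sequential(*ups)
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)   # conv after sub-pixel

    def forward(self, x):
        return self.tail(self.upsample(self.mid(self.body(self.head(x)))))
```

For example, the eight-times module would use scale_pow=3 (2³ = 8), the four-times module scale_pow=2, and the two-times module scale_pow=1, each with its own residual block design.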
According to the adaptive target detection apparatus provided by the present invention, the target image to be detected is input into a preset target detection model to obtain the target detection frame output by the target detection model. The pre-decision module calculates the variance of the target image to be detected and determines its sharpness level based on the variance, so that the superdivision module corresponding to that sharpness level can subsequently be selected from the superdivision module set to perform superdivision processing on the target image. This reduces excessive superdivision, saves computing resources and improves overall detection performance, i.e., detection speed; at the same time it avoids insufficient superdivision, which would cause faces to be missed, and thus improves detection accuracy. The adaptive target detection method of this embodiment can therefore improve target detection accuracy for target images of any sharpness without significantly reducing computational efficiency or detection speed.
Optionally, the pre-decision module is specifically configured to:
and filtering the target image to be detected, and calculating the variance of the filtered target image to be detected.
And determining the definition level of the target image to be detected based on the variance and a mapping relation between the preset variance and the definition level.
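As an editorial sketch of this pre-decision logic: the patent does not name the filter, so a Laplacian is assumed here, and the variance thresholds below are illustrative stand-ins for the preset mapping, shown for N = 3:

```python
import cv2
import numpy as np

# Illustrative, non-overlapping variance segments for N = 3; real thresholds
# would be calibrated on sample data.
SEGMENTS = [(0.0, 50.0, 1),      # smallest variances -> level 1 -> 8x module
            (50.0, 200.0, 2),    # -> level 2 -> 4x module
            (200.0, np.inf, 3)]  # largest variances -> level 3 -> 2x module

def sharpness_level(image):
    """Filter the image, take the variance, map it to a sharpness level."""
    filtered = cv2.Laplacian(image, cv2.CV_64F)  # assumed choice of filter
    var = filtered.var()                         # variance of the filtered image
    for low, high, level in SEGMENTS:
        if low <= var < high:
            return level
```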
Optionally, N = 3, and the superdivision module set includes: an eight-times magnification superdivision module, a four-times magnification superdivision module and a two-times magnification superdivision module. The sharpness levels include three grades from low to high: a first sharpness level, a second sharpness level and a third sharpness level, where the first sharpness level corresponds to the eight-times magnification superdivision module, the second sharpness level to the four-times magnification superdivision module, and the third sharpness level to the two-times magnification superdivision module.
Optionally, the eight-times magnification superdivision module includes: a first depth residual network module and a first sub-pixel convolution module;
the first depth residual error network module comprises a plurality of cascaded first residual error modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;
the first sub-pixel convolution module is used for amplifying the characteristic image by eight times to obtain a characteristic image with eight times of amplified definition.
Optionally, the first residual module includes: a first convolution network and a second convolution network with the same structure, connected through a first activation function layer, the first and second convolution networks each comprising: a first convolution group, the first convolution group comprising: a 1×1×16 convolution layer, a 1×3×16 convolution layer, a 3×1×16 convolution layer and a 3×3×16 hole convolution layer with an expansion rate of 2, connected in sequence, wherein the 3×3×16 hole convolution layer with expansion rate 2 of the first convolution group in the first convolution network is connected with the first activation function layer, and the first activation function layer is connected with the 1×1×16 convolution layer of the first convolution group in the second convolution network.

Optionally, the first convolution network and the second convolution network each further comprise: a second convolution group, a third convolution group and a first aggregation layer;

the second convolution group includes: a 3×3 average pooling layer, a 1×1×16 convolution layer and a 3×3×16 hole convolution layer with an expansion rate of 2, connected in sequence;

the third convolution group includes: a 1×1×32 convolution layer and a 3×3×32 hole convolution layer with an expansion rate of 2, connected in sequence;

the 3×3×16 hole convolution layers with expansion rate 2 of the first and second convolution groups in the first convolution network, and the 3×3×32 hole convolution layer with expansion rate 2 of the third convolution group in the first convolution network, are connected with the first aggregation layer in the first convolution network;

the first aggregation layer in the first convolution network is connected with the first activation function layer, and the first activation function layer is connected with the 1×1×16 convolution layer of the first convolution group, the 3×3 average pooling layer of the second convolution group and the 1×1×32 convolution layer of the third convolution group in the second convolution network;

the 3×3×16 hole convolution layers with expansion rate 2 of the first and second convolution groups in the second convolution network, and the 3×3×32 hole convolution layer with expansion rate 2 of the third convolution group in the second convolution network, are connected with the first aggregation layer in the second convolution network.
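Reading the three parallel convolution groups and the aggregation layer above as PyTorch code gives roughly the following sketch. The aggregation is assumed to be channel concatenation, the activation ReLU, and the trunk width 64; none of these details are fixed by the text:

```python
import torch
import torch.nn as nn

class ConvNetwork(nn.Module):
    """One of the two identical convolution networks in the first residual
    module: three parallel groups feeding an aggregation layer."""
    def __init__(self, in_ch):
        super().__init__()
        # first convolution group: 1x1x16, 1x3x16, 3x1x16, then a 3x3x16
        # hole (dilated) convolution with expansion rate 2
        self.group1 = nn.Sequential(
            nn.Conv2d(in_ch, 16, 1),
            nn.Conv2d(16, 16, (1, 3), padding=(0, 1)),
            nn.Conv2d(16, 16, (3, 1), padding=(1, 0)),
            nn.Conv2d(16, 16, 3, padding=2, dilation=2))
        # second convolution group: 3x3 average pooling, 1x1x16, dilated 3x3x16
        self.group2 = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, 1),
            nn.Conv2d(16, 16, 3, padding=2, dilation=2))
        # third convolution group: 1x1x32, dilated 3x3x32
        self.group3 = nn.Sequential(
            nn.Conv2d(in_ch, 32, 1),
            nn.Conv2d(32, 32, 3, padding=2, dilation=2))

    def forward(self, x):
        # aggregation layer, assumed to be channel concatenation (16+16+32 = 64)
        return torch.cat([self.group1(x), self.group2(x), self.group3(x)], dim=1)

class FirstResidualModule(nn.Module):
    """Two identical convolution networks joined by the activation layer.
    A skip connection and a 1x1 projection back to the trunk width could
    wrap this block; the text leaves those details open."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.net1, self.net2 = ConvNetwork(in_ch), ConvNetwork(64)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.net2(self.act(self.net1(x)))
```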
Optionally, the first subpixel convolution module includes: three first subpixel convolution groups connected in sequence, each first subpixel convolution group comprising: a sub-convolution layer, a double-magnification up-sampling layer and a sub-activation function layer connected in sequence.
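Three such groups, each doubling the spatial resolution, give the 8× total. A minimal PyTorch sketch, assuming PixelShuffle as the double-magnification up-sampling layer and ReLU as the sub-activation (both assumptions):

```python
import torch.nn as nn

def first_subpixel_module(channels=16):
    """Three (sub-convolution, 2x up-sampling, activation) groups -> 8x."""
    groups = []
    for _ in range(3):
        groups += [nn.Conv2d(channels, channels * 4, 3, padding=1),  # sub-convolution
                   nn.PixelShuffle(2),                               # 2x up-sampling
                   nn.ReLU()]                                        # sub-activation
    return nn.Sequential(*groups)
```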
Optionally, the four-times magnification superdivision module comprises: a second depth residual network module and a second sub-pixel convolution module;
the second depth residual error network module comprises a plurality of cascaded second residual error modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;
the second sub-pixel convolution module is used for amplifying the characteristic image by four times to obtain the characteristic image with four times of amplified definition.
Optionally, the second residual module includes: a third convolution network and a fourth convolution network with the same structure, connected through a second activation function layer, the third and fourth convolution networks each comprising: a fourth convolution group, the fourth convolution group comprising: a 1×1×16 convolution layer, a 1×3×16 convolution layer and a 3×1×16 convolution layer connected in sequence, wherein the 3×1×16 convolution layer of the fourth convolution group in the third convolution network is connected with the second activation function layer, and the second activation function layer is connected with the 1×1×16 convolution layer of the fourth convolution group in the fourth convolution network.
Optionally, the third convolution network and the fourth convolution network each further comprise: a fifth convolution group, a sixth convolution group and a second aggregation layer;

the fifth convolution group includes: a 3×3 average pooling layer and a 1×1×16 convolution layer connected in sequence;

the sixth convolution group includes: a 1×1×32 convolution layer;

the 3×1×16 convolution layer of the fourth convolution group, the 1×1×16 convolution layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the third convolution network are connected with the second aggregation layer in the third convolution network;

the second aggregation layer in the third convolution network is connected with the second activation function layer, and the second activation function layer is connected with the 1×1×16 convolution layer of the fourth convolution group, the 3×3 average pooling layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the fourth convolution network;

the 3×1×16 convolution layer of the fourth convolution group, the 1×1×16 convolution layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the fourth convolution network are connected with the second aggregation layer in the fourth convolution network.
Optionally, the second subpixel convolution module includes: two second subpixel convolution groups connected in sequence, each second subpixel convolution group comprising: a sub-convolution layer, a double-magnification up-sampling layer and a sub-activation function layer connected in sequence.
Optionally, the two-times magnification superdivision module includes: a third depth residual network module and a third sub-pixel convolution module;

the third depth residual network module comprises a plurality of cascaded third residual modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;

the third sub-pixel convolution module is used for amplifying the characteristic image by two times to obtain a characteristic image with two times amplified definition.
Optionally, the third residual module includes: two 3×3×16 convolutional layers and a third activation function layer, the two 3×3×16 convolutional layers being connected by the third activation function layer.
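Read as a PyTorch block, the third residual module is simply the following sketch; the skip connection and the ReLU are assumptions, as the text does not spell them out:

```python
import torch.nn as nn

class ThirdResidualModule(nn.Module):
    """Two 3x3x16 convolution layers joined by an activation layer."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(16, 16, 3, padding=1)
        self.act = nn.ReLU()                    # third activation function layer
        self.conv2 = nn.Conv2d(16, 16, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))  # residual skip assumed
```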
Optionally, the third subpixel convolution module includes: two third subpixel convolution groups connected in sequence, each third subpixel convolution group comprising: a sub-convolution layer, a single-magnification up-sampling layer and a sub-activation function layer connected in sequence.
Optionally, the target detection model is trained based on a target sample image and a corresponding one of its N labels, the N labels being the target sample image amplified by 2^(N-i+1) times, i = 1, 2, …, N, with a detection frame true value annotated in each amplified image.
Optionally, the training manner of the target detection model is as follows:
acquiring a training set comprising a target sample image and a corresponding label;
determining initial parameters of a target detection model, including determining training period numbers, the number of samples of a sample batch corresponding to each training period and network weight parameters;
and training the target detection model based on the target sample image of each batch and the corresponding label so as to update the network weight parameters until all training periods are completed, and obtaining the trained target detection model.
Optionally, training the target detection model based on the target sample image and the corresponding label for each batch to update the network weight parameters includes:
inputting the target sample image into the target detection model, wherein the pre-decision module calculates the variance of the target sample image and determines the sharpness level i of the target sample image based on that variance;

inputting the target sample image into the 2^(N-i+1) magnification superdivision module corresponding to sharpness level i in the superdivision module set, to obtain the superdivision-processed target sample image;

inputting the superdivision-processed target sample image into the detection module, to obtain the prediction detection frame output by the detection module;

substituting the prediction detection frame and the detection frame true value of the 2^(N-i+1) amplified image corresponding to the target sample image into a preset loss function, and, when the calculated loss value exceeds a preset threshold, updating the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module, otherwise keeping the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module unchanged.
Fig. 10 illustrates the physical structure of an electronic device. As shown in fig. 10, the electronic device may include: a processor (processor) 101, a communication interface (Communications Interface) 102, a memory (memory) 103 and a communication bus 104, where the processor 101, the communication interface 102 and the memory 103 communicate with each other through the communication bus 104. The processor 101 may invoke logic instructions in the memory 103 to perform an adaptive target detection method comprising:
acquiring a target image to be detected;
Inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N.
Further, the logic instructions in the memory 103 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the adaptive target detection method provided by the above methods, the method comprising:
acquiring a target image to be detected;
inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N.
in yet another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor is implemented to perform the above provided adaptive target detection methods, the method comprising:
acquiring a target image to be detected;
inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N.
the apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. An adaptive target detection method, comprising:
acquiring a target image to be detected;
inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
the pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N; wherein the sharpness level is related to the variance of the target image to be detected, the variance of an image is the mean of the squared differences between the pixel values of the image and the mean pixel value of the image, and the greater the variance of the image, the higher the sharpness of the image; the variance range of the image is divided, from small to large, into N non-overlapping variance segments, the i-th variance segment corresponding to the i-th sharpness level and to the 2^(N-i+1) magnification superdivision module.
2. The adaptive target detection method according to claim 1, wherein the pre-decision module is specifically configured to:
filtering the target image to be detected, and calculating the variance of the filtered target image to be detected;
and determining the definition level of the target image to be detected based on the variance and a mapping relation between the preset variance and the definition level.
3. The adaptive target detection method according to claim 1, wherein N = 3, and the superdivision module set includes: an eight-times magnification superdivision module, a four-times magnification superdivision module and a two-times magnification superdivision module, the sharpness levels comprising three grades from low to high: a first sharpness level, a second sharpness level and a third sharpness level, wherein the first sharpness level corresponds to the eight-times magnification superdivision module, the second sharpness level to the four-times magnification superdivision module, and the third sharpness level to the two-times magnification superdivision module.
4. The adaptive target detection method of claim 3, wherein the eight-times magnification superdivision module comprises: a first depth residual network module and a first sub-pixel convolution module;
the first depth residual error network module comprises a plurality of cascaded first residual error modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;
the first sub-pixel convolution module is used for amplifying the characteristic image by eight times to obtain a characteristic image with eight times of amplified definition.
5. The adaptive target detection method of claim 4, wherein the first residual module comprises: a first convolution network and a second convolution network with the same structure, connected through a first activation function layer, the first and second convolution networks each comprising: a first convolution group, the first convolution group comprising: a 1×1×16 convolution layer, a 1×3×16 convolution layer, a 3×1×16 convolution layer and a 3×3×16 hole convolution layer with an expansion rate of 2, connected in sequence, wherein the 3×3×16 hole convolution layer with expansion rate 2 of the first convolution group in the first convolution network is connected with the first activation function layer, and the first activation function layer is connected with the 1×1×16 convolution layer of the first convolution group in the second convolution network.
6. The adaptive target detection method of claim 5, wherein the first convolution network and the second convolution network each further comprise: a second convolution group, a third convolution group and a first aggregation layer;

the second convolution group includes: a 3×3 average pooling layer, a 1×1×16 convolution layer and a 3×3×16 hole convolution layer with an expansion rate of 2, connected in sequence;

the third convolution group includes: a 1×1×32 convolution layer and a 3×3×32 hole convolution layer with an expansion rate of 2, connected in sequence;

the 3×3×16 hole convolution layers with expansion rate 2 of the first and second convolution groups in the first convolution network, and the 3×3×32 hole convolution layer with expansion rate 2 of the third convolution group in the first convolution network, are connected with the first aggregation layer in the first convolution network;

the first aggregation layer in the first convolution network is connected with the first activation function layer, and the first activation function layer is connected with the 1×1×16 convolution layer of the first convolution group, the 3×3 average pooling layer of the second convolution group and the 1×1×32 convolution layer of the third convolution group in the second convolution network;

the 3×3×16 hole convolution layers with expansion rate 2 of the first and second convolution groups in the second convolution network, and the 3×3×32 hole convolution layer with expansion rate 2 of the third convolution group in the second convolution network, are connected with the first aggregation layer in the second convolution network.
7. The adaptive target detection method of claim 4, wherein the first sub-pixel convolution module comprises: three first subpixel convolution groups connected in sequence, each first subpixel convolution group comprising: a sub-convolution layer, a double-magnification up-sampling layer and a sub-activation function layer connected in sequence.
8. The adaptive target detection method of claim 3, wherein the four-times magnification superdivision module comprises: a second depth residual network module and a second sub-pixel convolution module;
the second depth residual error network module comprises a plurality of cascaded second residual error modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;
the second sub-pixel convolution module is used for amplifying the characteristic image by four times to obtain the characteristic image with four times of amplified definition.
9. The adaptive target detection method of claim 8, wherein the second residual module comprises: a third convolution network and a fourth convolution network with the same structure, connected through a second activation function layer, the third and fourth convolution networks each comprising: a fourth convolution group, the fourth convolution group comprising: a 1×1×16 convolution layer, a 1×3×16 convolution layer and a 3×1×16 convolution layer connected in sequence, wherein the 3×1×16 convolution layer of the fourth convolution group in the third convolution network is connected with the second activation function layer, and the second activation function layer is connected with the 1×1×16 convolution layer of the fourth convolution group in the fourth convolution network.
10. The adaptive target detection method of claim 9, wherein the third convolution network and the fourth convolution network each further comprise: a fifth convolution group, a sixth convolution group and a second aggregation layer;

the fifth convolution group includes: a 3×3 average pooling layer and a 1×1×16 convolution layer connected in sequence;

the sixth convolution group includes: a 1×1×32 convolution layer;

the 3×1×16 convolution layer of the fourth convolution group, the 1×1×16 convolution layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the third convolution network are connected with the second aggregation layer in the third convolution network;

the second aggregation layer in the third convolution network is connected with the second activation function layer, and the second activation function layer is connected with the 1×1×16 convolution layer of the fourth convolution group, the 3×3 average pooling layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the fourth convolution network;

the 3×1×16 convolution layer of the fourth convolution group, the 1×1×16 convolution layer of the fifth convolution group and the 1×1×32 convolution layer of the sixth convolution group in the fourth convolution network are connected with the second aggregation layer in the fourth convolution network.
11. The adaptive target detection method of claim 8, wherein the second sub-pixel convolution module comprises: two second subpixel convolution groups connected in sequence, each second subpixel convolution group comprising: a sub-convolution layer, a double-magnification up-sampling layer and a sub-activation function layer connected in sequence.
12. The adaptive target detection method of claim 3, wherein the two-times magnification superdivision module comprises: a third depth residual network module and a third sub-pixel convolution module;

the third depth residual network module comprises a plurality of cascaded third residual modules and is used for extracting the characteristics of the target image to be detected step by step to obtain a characteristic image;

the third sub-pixel convolution module is used for amplifying the characteristic image by two times to obtain a characteristic image with two times amplified definition.
13. The adaptive target detection method of claim 12, wherein the third residual module comprises: two 3×3×16 convolutional layers and a third activation function layer, the two 3×3×16 convolutional layers being connected by the third activation function layer.
14. The adaptive target detection method of claim 12, wherein the third sub-pixel convolution module comprises: two third subpixel convolution groups connected in sequence, each third subpixel convolution group comprising: a sub-convolution layer, a single-magnification up-sampling layer and a sub-activation function layer connected in sequence.
15. The adaptive target detection method of any one of claims 1 to 14, wherein the target detection model is trained based on a target sample image and a corresponding one of its N labels, the N labels being the target sample image amplified by 2^(N-i+1) times, i = 1, 2, …, N, with a detection frame true value annotated in each amplified image.
16. The adaptive target detection method of claim 15, wherein the training of the target detection model is as follows:
acquiring a training set comprising a target sample image and a corresponding label;
determining initial parameters of a target detection model, including determining training period numbers, the number of samples of a sample batch corresponding to each training period and network weight parameters;
and training the target detection model based on the target sample image of each batch and the corresponding label so as to update the network weight parameters until all training periods are completed, and obtaining the trained target detection model.
17. The adaptive target detection method of claim 16, wherein training the target detection model based on target sample images and corresponding labels for each batch to update the network weight parameters comprises:
inputting the target sample image into the target detection model, wherein the pre-decision module calculates the variance of the target sample image and determines the sharpness level i of the target sample image based on that variance;

inputting the target sample image into the 2^(N-i+1) magnification superdivision module corresponding to sharpness level i in the superdivision module set, to obtain the superdivision-processed target sample image;

inputting the superdivision-processed target sample image into the detection module, to obtain the prediction detection frame output by the detection module;

substituting the prediction detection frame and the detection frame true value of the 2^(N-i+1) amplified image corresponding to the target sample image into a preset loss function, and, when the calculated loss value exceeds a preset threshold, updating the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module, otherwise keeping the weight parameters of the detection module and of the 2^(N-i+1) magnification superdivision module unchanged.
18. An adaptive target detection apparatus, the apparatus comprising:
the target image acquisition unit is used for acquiring a target image to be detected;
the model running unit is used for inputting the target image to be detected into a preset target detection model to obtain a target detection frame output by the target detection model;
wherein the object detection model comprises: a pre-decision module, a superdivision module set and a detection module,
The pre-decision module is configured to calculate the variance of the target image to be detected, determine the sharpness level of the target image to be detected based on the variance, and input the target image to be detected into the superdivision module corresponding to that sharpness level in the superdivision module set; the superdivision module is configured to perform superdivision processing on the target image to be detected, and the detection module is configured to perform target detection on the superdivision-processed target image to be detected, so as to obtain the target detection frame; the superdivision module set includes N superdivision modules with different magnifications, the sharpness level is divided into N grades from low to high, and a target image to be detected with the i-th grade of sharpness is superdivision-processed by the 2^(N-i+1) magnification superdivision module, N ≥ 2, i = 1, 2, …, N; the sharpness level is related to the variance of the target image to be detected, the variance of an image is the mean of the squared differences between the pixel values of the image and the mean pixel value, and the greater the variance of the image, the higher the sharpness of the image; the variance range of the image is divided, from small to large, into N non-overlapping variance segments, the i-th variance segment corresponding to the i-th sharpness level and to the 2^(N-i+1) magnification superdivision module.
19. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the adaptive object detection method according to any one of claims 1 to 17 when the program is executed.
20. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive target detection method according to any of claims 1 to 17.