CN112001448A - Method for detecting small objects with regular shapes - Google Patents


Info

Publication number
CN112001448A
CN112001448A (application CN202010869854.9A)
Authority
CN
China
Prior art keywords
target
image
template
similarity
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010869854.9A
Other languages
Chinese (zh)
Inventor
王锡纲
李杨
赵育慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Xinwei Technology Co ltd
Original Assignee
Dalian Xinwei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Xinwei Technology Co ltd filed Critical Dalian Xinwei Technology Co ltd
Priority to CN202010869854.9A priority Critical patent/CN112001448A/en
Publication of CN112001448A publication Critical patent/CN112001448A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 — Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 — Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 — Image enhancement or restoration
    • G06T 5/70 — Denoising; Smoothing
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 — Image enhancement or restoration
    • G06T 5/80 — Geometric correction
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 — Target detection


Abstract

The invention relates to the technical field of target detection, and provides a method for detecting small objects with regular shapes, which comprises the following steps: step 100, collecting an image of a target; step 200, extracting the region containing the target in the image with a target detection algorithm to obtain classification and bounding box regression results for the target; step 300, denoising the image target region with an image denoising algorithm; step 400, extracting the target contour with a generative adversarial network to obtain a target contour image; step 500, carrying out distortion correction on the obtained target contour image with an image distortion correction algorithm; step 600, calculating the similarity between each region of the target contour image and the template image with a block-based target and template similarity calculation method; and step 700, adopting an overall target and template similarity probability algorithm to obtain the overall similarity probability of the target and the template. The invention improves the accuracy and reliability of the detection of small objects with regular shapes.

Description

Method for detecting small objects with regular shapes
Technical Field
The invention relates to the technical field of target detection, in particular to a method for detecting small objects with regular shapes.
Background
In recent years, target detection based on deep learning is receiving more and more attention as one of important research directions in the field of computer vision, and small object detection has been a difficult problem in deep learning convolutional neural network models.
Existing research generally detects medium or large objects in an image. Small objects in an image have low resolution and their extracted features are not distinctive, so the accuracy of small object detection results is low. Moreover, small object detection is a fine-grained image analysis task with finer class distinctions: the model must distinguish sub-class objects of the same large class that have extremely high visual similarity, and the differences between fine-grained images are often confined to a small local part. The small object detection model therefore needs to find the discriminative local area of the object, which makes detection more difficult and accuracy lower.
Disclosure of Invention
The invention mainly solves the technical problem of low detection accuracy of small objects in the prior art, and provides a method for detecting small objects with regular shapes, so as to achieve the purposes of improving the accuracy and reliability of the detection of small objects with regular shapes and improving the detection efficiency.
The invention provides a method for detecting small objects with regular shapes, which comprises the following steps:
step 100, collecting an image of a target;
step 200, extracting the region containing the target in the image by using a target detection algorithm to obtain the classification and bounding box regression results of the target in the image;
step 400, extracting the target contour by using a generative adversarial network to obtain a target contour image;
step 600, calculating the similarity between each region of the target contour image and the template image by a block-based target and template similarity calculation method;
and step 700, adopting an overall target and template similarity probability algorithm to obtain the overall similarity probability of the target and the template.
Further, step 200 includes the following process:
step 201, extracting features from the image with a convolutional neural network;
step 202, performing preliminary classification and regression with the region extraction network;
step 203, carrying out an alignment operation on the candidate frame feature map;
and step 204, classifying and regressing the target with the convolutional neural network to obtain the extraction result of the target object.
Further, between step 200 and step 400, the method further comprises:
and 300, denoising the image target region by using an image denoising algorithm.
Further, between step 400 and step 600, the method further includes:
and 500, carrying out distortion correction on the obtained target contour image by using an image distortion correction algorithm.
Further, the network structure constructed for the image distortion correction algorithm comprises: a decoder and an encoder;
the decoder comprises an input layer, four convolution blocks and four pooling layers, with the convolution blocks and pooling layers distributed alternately;
the encoder comprises four transposed convolutions, four convolution blocks and an output layer, with the transposed convolutions and convolution blocks distributed alternately;
wherein feature maps of the same size obtained by the decoder and the encoder are joined by skip connections in a superposed manner.
Further, step 600 includes the following process:
step 601, constructing a network structure of a block calculation target and template similarity algorithm;
step 602, dividing the distortion-corrected target contour image and the template image into a plurality of corresponding regions, and feeding the region pairs sequentially, block by block, into the target and template similarity network to obtain the similarity between the distortion-corrected target contour image and the template image;
step 603, calculating the Euclidean distance between the feature vectors of the distortion-corrected target contour image and of the template to obtain the similarity of each region of the distortion-corrected target contour image and the template image.
Further, the network structure of the block-based target and template similarity algorithm comprises an input layer, convolution block 1, convolution block 2, convolution block 3, convolution block 4, a convolution layer and an output layer; convolution blocks 1, 2 and 3 each comprise three convolution layers and a pooling layer; convolution block 4 comprises two convolution layers and a pooling layer.
Further, step 700 includes the following process:
step 701, constructing an overall calculation target and template similarity probability algorithm network structure;
step 702, taking the similarity of the distortion-corrected target contour image and each region of the template image together as input data, and transmitting the input data to the overall target and template similarity probability algorithm network structure;
and 703, multiplying the similarity between the areas by the corresponding weight, and calculating to obtain the overall similarity probability of the target and the template.
Furthermore, the overall calculation target and template similarity probability algorithm network structure comprises an input layer, a neural network layer and an output layer, wherein the neural network layer comprises a full connection layer.
The invention provides a method for detecting small objects with regular shapes. The region containing the target is extracted from the image by a target detection algorithm; the image target region is denoised by an image denoising algorithm; the target contour is completely extracted by a target contour extraction algorithm; the position, direction and size of the target shape are normalized by an image distortion correction algorithm; the corrected target and the template image are compared block by block by a block-based target and template similarity calculation method; and the influence weight of each region on the overall similarity judgment is calculated and combined with the per-region similarity results. The similarity of the two images is thereby evaluated comprehensively, whether the two small objects differ is judged, and small objects with regular shapes are detected, improving the accuracy and reliability of the detection. When detecting small objects with regular shapes, the method is little affected by the illumination intensity, shadows, background interference and so on of the images, adapts well to different images, and does not depend on a specific image scene. The invention can detect small objects of different shapes, with high accuracy and stability of the detection results.
Drawings
FIG. 1 is a flow chart of an implementation of a method for detecting a small object with a regular shape according to the present invention;
FIG. 2 is a schematic diagram of a feature pyramid network structure;
FIG. 3 is a schematic view of a bottom-up configuration;
FIG. 4 is a schematic diagram of the generation of a feature map for each stage in a bottom-up configuration;
FIG. 5 is a schematic diagram of a regional extraction network architecture;
FIG. 6 is an effect diagram of an alignment operation performed on a feature map;
FIG. 7 is a schematic diagram of a classification, regression network structure;
FIG. 8 is a schematic diagram of a network structure of an image denoising algorithm;
FIG. 9 is a schematic diagram of generating a countermeasure network structure;
FIG. 10 is a schematic diagram of a generating network structure;
FIG. 11 is a schematic diagram of a discrimination network architecture;
FIG. 12 is a schematic diagram of a network architecture of an image distortion correction algorithm;
FIG. 13 is a schematic diagram of a block computation target and template similarity algorithm network structure;
FIG. 14 is a schematic diagram of an overall calculation target and template similarity probability algorithm network structure.
Detailed Description
In order to make the technical problems solved, technical solutions adopted and technical effects achieved by the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.
Fig. 1 is a flowchart of an implementation of a method for detecting a small object with a regular shape according to an embodiment of the present invention. As shown in fig. 1, the method for detecting a small object with a regular shape provided by the embodiment of the present invention includes the following steps:
step 100, acquiring an image of a target.
This step may be performed by capturing an image of the target with a camera.
Step 200, extracting the region containing the target in the image by using a target detection algorithm to obtain the classification and bounding box regression results of the target in the image.
Extracting the region containing the target reduces the influence of the background or other objects on the result of the target-template similarity calculation. The step comprises the following process:
step 201, performing convolution neural network extraction on the image to extract features.
In this step, considering that targets in the image vary in size, a multi-scale feature extraction scheme, namely a feature pyramid network, is adopted. The feature pyramid network structure is shown in fig. 2.
The feature pyramid network is divided into two structures. The left structure is called the bottom-up structure; it yields feature maps of different scales, shown as C1 through C5. From bottom to top the feature map size becomes smaller, which also means the extracted features are of increasingly high level. The shape resembles a pyramid, hence the name feature pyramid network. The right structure is called the top-down structure and corresponds layer by layer to the feature pyramid; the arrows connecting feature processing at the same level between the two structures are lateral connections.
This is done because the smaller, higher-level features carry more semantic information, while the larger, lower-level features carry little semantic information but much positional information. Through such connections, the feature map of each layer fuses features of different resolutions and different semantic strengths, so the detection of objects at different resolutions is improved.
The bottom-up structure is shown in fig. 3. The network comprises five stages, each calculating a feature map of a different size, with a scaling step of 2. The principle of generating a feature map at each stage is shown in fig. 4. The feature maps C1, C2, C3, C4, C5 output by each stage are used to construct the feature pyramid network structure.
The top-down structure is shown on the right side of the feature pyramid network structure in fig. 2. First, the higher-level feature map with stronger semantic information is upsampled to the same size as the lower-level feature map. Then, the feature maps of the same size in the bottom-up and top-down structures are connected laterally, and the two feature maps are combined by element-wise addition. Finally, to reduce the aliasing effect caused by upsampling, a convolution layer is applied to each combined feature map to obtain the final feature maps P2, P3, P4 and P5.
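The top-down merge just described can be sketched as follows: 2× nearest-neighbour upsampling of the higher-level map, then element-wise addition with the laterally connected lower-level map. The channel counts and map sizes are illustrative, and the 1×1 lateral convolution and final 3×3 convolution are omitted for brevity:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def merge_top_down(higher, lateral):
    """Merge a higher-level map with the laterally connected lower-level
    map by element-wise addition, as in the top-down pathway."""
    return upsample2x(higher) + lateral

# Toy maps: C5 is half the spatial size of C4 (scaling step 2).
c5 = np.ones((256, 7, 7))
c4_lateral = np.ones((256, 14, 14))   # after its 1x1 lateral convolution
p4 = merge_top_down(c5, c4_lateral)   # a 3x3 conv would follow to reduce aliasing
print(p4.shape)  # (256, 14, 14)
```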
Step 202, the region extraction network performs preliminary classification and regression.
The region extraction network structure is shown in fig. 5. Based on the feature maps P2, P3, P4 and P5 obtained by the feature pyramid network, anchor frames on the original image are first generated for each point on the feature maps according to the anchor frame generation rule. The feature maps P2 to P5 are then input into the region extraction network, which comprises a convolution layer and a fully connected layer, to obtain classification and regression results for each anchor frame: specifically, the foreground/background classification score and the bounding box coordinate correction of each anchor frame. Finally, anchor frames whose foreground score meets the threshold are selected and their bounding boxes corrected; the corrected anchor frames are called candidate frames.
Step 203, performing the alignment operation on the candidate frame feature map.
Candidate frames meeting the score requirement are obtained from the region extraction network and mapped back onto the feature maps. The layer of the feature map corresponding to each candidate frame is obtained by the following formula:
k = ⌊k0 + log2(√(w·h) / 224)⌋

wherein w represents the width of the candidate frame, h represents the height of the candidate frame, k represents the feature layer corresponding to the candidate frame, and k0 represents the layer to which a frame with w = h = 224 is mapped, generally 4, i.e. the P4 layer. Then the feature map corresponding to each candidate frame is obtained by bilinear interpolation, and the resulting feature maps have a consistent size. The effect of the alignment operation on the feature map is shown in fig. 6.
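The level-assignment rule for candidate frames can be sketched in plain Python; the clamping to levels 2–5 is an assumption based on the feature maps P2–P5 named in the text:

```python
import math

def roi_feature_level(w, h, k0=4, k_min=2, k_max=5):
    """Map a candidate frame of width w and height h to a pyramid level:
    k = floor(k0 + log2(sqrt(w*h) / 224)), clamped to the available
    levels P2..P5 (the clamp range is an assumption)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return max(k_min, min(k_max, k))

print(roi_feature_level(224, 224))  # 4 -> a 224x224 frame maps to P4
print(roi_feature_level(112, 112))  # 3 -> smaller frames map to finer levels
print(roi_feature_level(448, 448))  # 5 -> larger frames map to coarser levels
```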
Step 204, classifying and regressing the target with the convolutional neural network to obtain the extraction result of the target object.
The classification and regression network structure is shown in fig. 7. Based on the fixed-size candidate frame feature maps obtained above, the classification score and coordinate offsets of each candidate frame are calculated through the classification and regression networks, and bounding box correction is applied to the candidate frames. The classification and bounding box regression results of the targets in the image are thus obtained through the target detection algorithm.
Step 300, denoising the image target region by using an image denoising algorithm.
Denoising the target region of the image with an image denoising algorithm improves the image quality, so that the subsequent algorithms can extract the target contour more completely and accurately.
The image denoising algorithm processes the image sequentially through a low-level feature extraction network, a dense residual block network and a feature fusion network. The network structure of the image denoising algorithm is shown in fig. 8. In the dense residual block network, each dense residual block makes full use of the hierarchical information of all its convolution layers, extracting rich convolution features through densely connected convolution layers and retaining the accumulated features. The feature fusion network uses a global residual, combining shallow and deep features to obtain global dense features. The detail information of the image is therefore not lost while removing noise, and the recovered image is of high quality.
The low-level feature extraction network extracts low-level features with two convolution layers; the output feature map of the second convolution layer serves as the input feature map of the dense residual block network.
The dense residual block network comprises 3 dense residual blocks, which extract features at 3 different levels. As shown in fig. 8, each dense residual block contains 4 convolution layers. Within a dense residual block, the input feature map is fed to every convolution layer, and the output of each convolution layer also serves as input to the subsequent convolution layers. Finally, the block input feature map and the output feature map of the last convolution layer are merged by element-wise addition for local feature fusion. The aim of the dense connections is to retain earlier features during feature extraction, so that detail features such as the inherent texture of the image do not vanish as the network deepens.
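The dense connectivity and local residual fusion described above can be sketched as follows. The random-weight 1×1 channel mixing is only a stand-in for the real convolution layers, used to show the wiring (every layer sees the block input plus all earlier outputs, and the block adds its input back at the end):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Stand-in for a convolution layer: random 1x1 channel mixing on a
    (C, H, W) map, enough to demonstrate the connectivity pattern."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

def dense_residual_block(x, n_layers=4):
    """Each layer receives the block input plus all previous layer outputs
    (dense connections); the block output adds the input back in (local
    feature fusion), so earlier features are never lost."""
    c = x.shape[0]
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=0)   # dense connection
        feats.append(conv1x1(inp, c))
    return x + feats[-1]                      # local residual fusion

out = dense_residual_block(np.ones((8, 16, 16)))
print(out.shape)  # (8, 16, 16): channel count and size are preserved
```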
The feature fusion network comprises three convolution layers. It fuses the deep features of different levels extracted by the three dense residual blocks with the low-level features extracted by the first convolution layer, obtaining global fused features, and finally restores a high-quality image free of noise.
Step 400, extracting the target contour by using a generative adversarial network to obtain a target contour image.
A high-quality image containing the target is obtained in step 300. Since the image background and the target may carry information such as various materials and textures, such complex scenes interfere greatly with extraction of the target contour. This step therefore extracts the complete contour of the target, eliminating the interference of material, texture and similar information with the similarity calculation.
In this step, a generative adversarial network extracts the target contour to obtain a target contour image. The generative adversarial network structure is shown in fig. 9. It comprises two parts: a generating network and a discrimination network. The generating network generates images, aiming to produce images so realistic that the discrimination network cannot judge whether they are real or fake. The discrimination network compares the image produced by the generating network with the real image and judges whether the generated image is real, aiming to distinguish generated from real images as far as possible. The generating network and the discrimination network thus form a dynamic game. The game iterates until neither network can improve itself further, i.e. the discrimination network can no longer tell whether a picture is generated or real; the generating network is then a well-trained generative model that can be used to produce pictures.
By approximating the real image through the discrimination network instead of using a loss function directly, global information can be grasped more accurately. The aim is to make the generating network produce a more accurate and clearer target contour image: the subsequent similarity comparison is based on the target contour image obtained here, so the more accurate the extracted contour, the more credible the comparison result.
The generating network structure is shown in fig. 10. It is a deep convolutional neural network comprising six convolution layers; the different convolution layers extract features of the image at different levels, and the target contour image is finally generated.
The discrimination network structure is shown in fig. 11. It comprises an input layer, five convolution layers, four pooling layers and an output layer, with the convolution and pooling layers distributed alternately. The discrimination network takes an image as input and outputs the probability that the image is real. The convolution layers extract features and the pooling layers downsample, finally yielding the real/fake probability of the image.
Step 500, carrying out distortion correction on the obtained target contour image by using an image distortion correction algorithm.
Since the target itself may be tilted, distorted and so on, after the target contour image is obtained the position, direction and size of the target shape are normalized by an image distortion correction algorithm, so that the similarity calculation is performed in the same scale space.
The network structure of the image distortion correction algorithm constructed in this step is shown in fig. 12. It comprises a decoder and an encoder: the decoder structure is on the left and the encoder structure on the right, and the arrows connecting feature processing at the same level between the two structures are skip connections. This ensures that the finally corrected image fuses both the low-level geometric detail information and features of different resolutions and semantic strengths, so that detail such as the edges of the corrected image is more complete.
The decoder comprises an input layer, four convolution blocks and four pooling layers, distributed alternately. The pooling layers obtain features at different scales by downsampling with a scaling step of 2; as shown in fig. 12, the decoder performs four downsamplings in total. The convolution blocks extract the features at the different scales. The data processed by the decoder passes through a convolution block and is then input into the encoder structure.
The encoder restores the feature map to the resolution of the original image mainly by transposed convolutions, convolution blocks and skip connections. It comprises four transposed convolutions, four convolution blocks and an output layer, distributed alternately. The transposed convolutions restore the feature map step by step to the original image size through upsampling with a scaling step of 2, achieving pixel-level regression; symmetric to the decoder, four upsamplings are performed in total. Feature maps of the same size obtained by the decoder and the encoder are joined by skip connections in a superposed manner, and the convolution blocks then fuse the features. Finally, the rectified image is obtained.
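A minimal sketch of the skip connection between same-size maps, assuming "superposed" means channel-wise concatenation (an assumption — element-wise addition is the other common choice), with nearest-neighbour repetition standing in for the stride-2 transposed convolution:

```python
import numpy as np

def transposed_upsample2x(x):
    """Stand-in for a stride-2 transposed convolution: doubles the
    spatial size of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_connect(up, skip):
    """Superpose (here: concatenate along the channel axis) the upsampled
    encoder map with the same-size decoder map; a convolution block would
    then fuse the stacked features."""
    assert up.shape[1:] == skip.shape[1:], "maps must have the same spatial size"
    return np.concatenate([up, skip], axis=0)

decoder_feat = np.ones((64, 32, 32))        # same-size map from the decoder side
bottleneck = np.ones((128, 16, 16))         # deepest map before upsampling
up = transposed_upsample2x(bottleneck)      # (128, 32, 32)
fused = skip_connect(up, decoder_feat)      # (192, 32, 32)
print(fused.shape)
```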
Step 600, calculating the similarity of each region of the distortion-corrected target contour image and the template image by a block-based target and template similarity calculation method.
Detection of small objects is fine-grained recognition: any area of the target and template images may differ. To improve the accuracy of this stage, the corrected target is compared block by block with a template image pre-stored in a database, and the similarity of each region is calculated separately.
Usually the similarity of two images is calculated and compared as a whole, but in the invention, to judge more accurately whether two small objects differ, a block-wise image comparison method is adopted, so that fine differences between small objects can be found and fine-grained recognition performed. The specific steps are as follows:
step 601, constructing a network structure of a block calculation target and template similarity algorithm.
The network structure of the block-based target and template similarity algorithm constructed in this step is shown in fig. 13. It comprises an input layer, convolution block 1, convolution block 2, convolution block 3, convolution block 4, a convolution layer and an output layer; convolution blocks 1, 2 and 3 each comprise three convolution layers and a pooling layer, and convolution block 4 comprises two convolution layers and a pooling layer.
Step 602, reasonably dividing the distortion-corrected target contour image and the template image into a plurality of regions, and feeding the region pairs block by block into the target and template similarity network structure to obtain the similarity between the distortion-corrected target contour image and the template image.
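The block division in step 602 can be illustrated as follows. This is a hypothetical example: the patent does not specify the grid size, so a 4x4 division and the `split_into_blocks` helper are assumptions for illustration only.

```python
# Divide target and template images into an equal grid and pair each target
# block with the template block at the same position for comparison.
import numpy as np

def split_into_blocks(image, rows, cols):
    """Divide an H x W image into rows*cols equally sized regions."""
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]

target = np.zeros((64, 64))    # stand-ins for the corrected target contour
template = np.ones((64, 64))   # and the prestored template image
pairs = list(zip(split_into_blocks(target, 4, 4),
                 split_into_blocks(template, 4, 4)))
print(len(pairs))         # 16 region pairs, each scored by the similarity network
print(pairs[0][0].shape)  # (16, 16)
```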
The first convolution block in the network structure of the block-wise target and template similarity algorithm mainly extracts low-level detail features of the image. As the number of convolutional layers increases, the network extracts higher-level semantic features. A final convolution layer then produces a discriminative feature vector that retains the spatial position relationships.
Step 603, calculating the Euclidean distance between the feature vectors of the distortion-corrected target contour image and the template, to obtain the similarity of each region of the distortion-corrected target contour image and the template image.
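One way to convert the Euclidean distance between two region feature vectors into a similarity score is sketched below. The `1 / (1 + d)` mapping is an assumption for illustration; the patent states only that the Euclidean distance yields the per-region similarity.

```python
# Per-region similarity from the Euclidean distance between feature vectors:
# identical vectors give distance 0 and hence similarity 1; larger distances
# give similarities approaching 0.
import numpy as np

def region_similarity(target_feat, template_feat):
    dist = np.linalg.norm(target_feat - template_feat)  # Euclidean distance
    return 1.0 / (1.0 + dist)

a = np.array([1.0, 2.0, 3.0])
print(region_similarity(a, a))                          # identical regions -> 1.0
print(region_similarity(a, np.array([1.0, 2.0, 7.0])))  # distance 4 -> 1/5 = 0.2
```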
Step 700, obtaining the overall similarity probability of the target and the template by means of the overall target and template similarity probability algorithm.
The similarity between the target and each region of the template image is obtained in step 600. The overall similarity probability could be computed with a simple mean; however, considering that the regions contribute different weights to judging the similarity between objects, a neural network is used to learn the weight of each region in the overall similarity, from which the overall similarity probability is obtained. Step 700 includes the following process:
step 701, constructing a network structure of the overall calculation target and template similarity probability algorithm.
The overall calculation target and template similarity probability algorithm network structure is shown in fig. 14. The overall calculation target and template similarity probability algorithm network structure comprises an input layer, a neural network layer and an output layer, wherein the neural network layer comprises a full connection layer.
Step 702, taking the similarities between the distortion-corrected target contour image and each region of the template image together as input data, and transmitting them to the network structure of the overall target and template similarity probability algorithm.
The neural network comprises a fully connected layer, which learns the weight of each region in judging the overall similarity.
Step 703, multiplying the similarity of each region by its corresponding weight and combining the results to obtain the overall similarity probability of the target and the template.
In this step, the influence weight of each region on the overall similarity judgment is applied to the per-region similarity results, so that the similarity of the two images is evaluated comprehensively and it is judged whether the two small objects differ. In this way, the invention detects small objects with regular shapes.
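The weighted combination of step 700 can be sketched as follows. The softmax normalization of the learned weights is an assumption added for illustration; the patent states only that a fully connected layer learns per-region weights which are multiplied with the region similarities.

```python
# Overall similarity probability as a learned weighted combination of the
# per-region similarities from step 600.
import numpy as np

def overall_similarity(region_sims, weights):
    w = np.exp(weights) / np.exp(weights).sum()  # softmax-normalized weights
    return float(np.dot(w, region_sims))         # weighted sum over regions

sims = np.array([0.9, 0.8, 0.95, 0.7])  # per-region similarities (step 600)
weights = np.zeros(4)                   # untrained weights: uniform after softmax
print(overall_similarity(sims, weights))  # equals the plain mean, 0.8375
```

With untrained (uniform) weights the result reduces to the mean mentioned above; training the fully connected layer shifts weight toward the regions most informative for telling two small objects apart.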
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: modifications of the technical solutions described in the embodiments or equivalent replacements of some or all technical features may be made without departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for detecting small objects with regular shapes is characterized by comprising the following steps:
step 100, collecting an image of a target;
step 200, extracting a region containing a target in the image by using a target detection algorithm to obtain the classification and bounding box regression results of the target in the image;
step 400, extracting the target contour by using a generative adversarial network to obtain a target contour image;
step 600, calculating the similarity of each region of the target contour image and the template image by adopting a block calculation target and template similarity calculation method;
and step 700, obtaining the overall similarity probability of the target and the template by means of the overall target and template similarity probability algorithm.
2. The method for detecting small regular-shaped objects according to claim 1, wherein the step 200 comprises the following steps:
step 201, performing convolution neural network extraction on the image to extract features;
step 202, performing primary classification and regression on the regional extraction network;
step 203, carrying out alignment operation on the candidate frame feature map;
and step 204, classifying and regressing the target by using the convolutional neural network to obtain an extraction result of the target object.
3. The method for detecting the small regular-shaped object according to claim 1, wherein between the step 200 and the step 400, the method further comprises the following steps:
and 300, denoising the image target region by using an image denoising algorithm.
4. The method for detecting the small regular-shaped object according to claim 1, wherein between the step 400 and the step 600, the method further comprises the following steps:
and 500, carrying out distortion correction on the obtained target contour image by using an image distortion correction algorithm.
5. The method for detecting the small object with the regular shape according to claim 4, wherein the network structure constructed by the image distortion correction algorithm comprises: a decoder and an encoder;
the decoder comprises an input layer, four convolution blocks and four pooling layers; the convolution blocks and the pooling layers are distributed alternately;
the encoder comprises four transposed convolutions, four convolution blocks and an output layer; the transposed convolutions and convolution blocks are distributed alternately;
wherein feature maps of the same size obtained by the decoder and the encoder are joined by skip connections in a superimposed manner.
6. The method for detecting small regular-shaped objects according to claim 1, wherein the step 600 comprises the following processes:
step 601, constructing a network structure of a block calculation target and template similarity algorithm;
step 602, reasonably dividing the distortion-corrected target contour image and the template image into a plurality of regions, and feeding the region pairs block by block into the target and template similarity algorithm network structure to obtain the similarity between the distortion-corrected target contour image and the template image;
step 603, calculating the Euclidean distance between the feature vectors of the distortion-corrected target contour image and the template to obtain the similarity of each region of the distortion-corrected target contour image and the template image.
7. The method for detecting small regular-shaped objects according to claim 6, wherein the network structure of the block-wise target and template similarity algorithm comprises an input layer, convolution block 1, convolution block 2, convolution block 3, convolution block 4, a convolution layer and an output layer; convolution blocks 1, 2 and 3 each comprise three convolution layers and a pooling layer; convolution block 4 comprises two convolution layers and a pooling layer.
8. The method for detecting a small regular-shaped object according to claim 1, wherein the step 700 comprises the following steps:
step 701, constructing an overall calculation target and template similarity probability algorithm network structure;
step 702, taking the similarities of the distortion-corrected target contour image and each region of the template image together as input data, and transmitting the input data to the overall target and template similarity probability algorithm network structure;
and 703, multiplying the similarity between the areas by the corresponding weight, and calculating to obtain the overall similarity probability of the target and the template.
9. The method according to claim 8, wherein the overall calculation target and template similarity probability algorithm network structure comprises an input layer, a neural network layer and an output layer, and the neural network layer comprises a fully connected layer.
CN202010869854.9A 2020-08-26 2020-08-26 Method for detecting small objects with regular shapes Pending CN112001448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010869854.9A CN112001448A (en) 2020-08-26 2020-08-26 Method for detecting small objects with regular shapes


Publications (1)

Publication Number Publication Date
CN112001448A true CN112001448A (en) 2020-11-27

Family

ID=73471540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010869854.9A Pending CN112001448A (en) 2020-08-26 2020-08-26 Method for detecting small objects with regular shapes

Country Status (1)

Country Link
CN (1) CN112001448A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766151A (en) * 2021-01-19 2021-05-07 北京深睿博联科技有限责任公司 Binocular target detection method and system for blind guiding glasses
CN114511820A (en) * 2022-04-14 2022-05-17 美宜佳控股有限公司 Goods shelf commodity detection method and device, computer equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08161503A (en) * 1994-12-09 1996-06-21 Canon Inc Method for extracting corresponding points of plural images and image processor
US20020136457A1 (en) * 2001-03-23 2002-09-26 Hiroyuki Onishi Method of and apparatus for searching corresponding points between images, and computer program
CN109522938A (en) * 2018-10-26 2019-03-26 华南理工大学 The recognition methods of target in a kind of image based on deep learning
CN109784261A (en) * 2019-01-09 2019-05-21 深圳市烨嘉为技术有限公司 Pedestrian's segmentation and recognition methods based on machine vision
CN109829448A (en) * 2019-03-07 2019-05-31 苏州市科远软件技术开发有限公司 Face identification method, device and storage medium
CN110321769A (en) * 2019-03-25 2019-10-11 浙江工业大学 A kind of more size commodity on shelf detection methods
CN110992311A (en) * 2019-11-13 2020-04-10 华南理工大学 Convolutional neural network flaw detection method based on feature fusion
CN111311486A (en) * 2018-12-12 2020-06-19 北京沃东天骏信息技术有限公司 Method and apparatus for processing image
CN111339891A (en) * 2020-02-20 2020-06-26 苏州浪潮智能科技有限公司 Target detection method of image data and related device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination