CN114529462A - Millimeter wave image target detection method and system based on improved YOLO V3-Tiny - Google Patents


Info

Publication number
CN114529462A
CN114529462A (application CN202210025035.5A)
Authority
CN
China
Prior art keywords
layer
feature
millimeter wave
multiplied
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210025035.5A
Other languages
Chinese (zh)
Inventor
陈国平
彭之玲
王馨
陈茹
冯志宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210025035.5A priority Critical patent/CN114529462A/en
Publication of CN114529462A publication Critical patent/CN114529462A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention discloses a millimeter wave image target detection method and system based on an improved YOLO V3-Tiny. The method comprises the following steps: acquiring a millimeter wave image to be detected; inputting the preprocessed image into an improved feature extraction network, which extracts features of the millimeter wave image and generates feature maps; sending the feature maps to a feature fusion layer module for feature fusion; inputting the fused feature maps into a prediction layer for decoding; and outputting the final result from the decoded information. The invention optimizes the feature extraction network, adding convolutional layers to strengthen its feature extraction capability, and introduces an attention mechanism into the feature fusion layer so that the network automatically learns effective features on the important feature layers, reducing the influence of redundant information such as background noise in the millimeter wave image on the result. The efficiency and quality of millimeter wave image target detection are thus improved without adding excessive parameters.

Description

Improved YOLO V3-Tiny-based millimeter wave image target detection method and system
Technical Field
The invention belongs to the technical field of item recognition in three-dimensional millimeter wave security inspection imaging systems, and relates to an item detection method based on YOLO V3-Tiny.
Background
In recent years, terrorist activity worldwide has persisted, making it important to ensure the personal and property safety of people in public places such as airports and stations. Efficient and humanized security inspection with millimeter waves is currently a research hotspot in the security inspection field. In a millimeter wave security inspection system, an efficient and fast target detection method can save cost, improve efficiency, and ensure the safety of the security inspection process.
In the field of target detection, traditional methods have many limitations, and their poor generalization capability cannot meet the detection requirements of complex targets in millimeter wave images. Deep learning models extract image features with convolutional neural networks and have greatly advanced target detection. However, millimeter wave images contain many small targets, have low resolution, and show unclear boundaries between target and background, which makes target detection in these images difficult. Current target detection algorithms are mainly two-stage or single-stage. A two-stage algorithm such as Faster R-CNN first extracts candidate regions from the image and then performs detection; although its accuracy is higher, its real-time performance is poor and the model places high demands on device computing power. A single-stage algorithm such as YOLO V3 obtains a result in a single pass, which is much faster than a two-stage algorithm but slightly less accurate. YOLO V3-Tiny is a lightweight single-stage algorithm with good real-time performance, few parameters, and a simple network model, but its ability to extract millimeter wave image features is insufficient and its accuracy remains to be improved.
Through retrieval, application publication No. CN112966700A discloses a millimeter wave image target detection method in the technical field of computer vision, comprising the following steps: acquiring original millimeter wave image data; recovering the three-dimensional spatial structure data of the millimeter wave image according to the data format of the original data, and compressing the three-dimensional structure data into two-dimensional plane data; denoising the two-dimensional plane data and standardizing the denoised data; producing a millimeter wave data set, analyzing the characteristics of its data, and selecting a deep learning model accordingly; training and testing the selected deep learning model with the millimeter wave data set to obtain test results; and optimizing the deep learning model according to the test results and evaluation indexes to obtain the optimal model. This scheme addresses the positioning and identification of dangerous goods in active millimeter wave images, so that the efficiency of human body security inspection in public places can be improved.
The above patent adopts a two-stage algorithm, such as a Faster R-CNN model, which has large model parameters and long detection time. In the millimeter wave image target detection method and system of the present invention, the improved YOLOv3-Tiny model is a lightweight model that can run on devices with limited computing power, greatly reducing detection time and meeting the real-time monitoring requirement of a security inspection environment.
Application publication No. CN105513035A discloses a method for detecting concealed objects in a passive millimeter wave image, comprising: an image preprocessing step, which performs image enhancement on the acquired original passive millimeter wave image using a triple iterative enhancement method after interpolation reconstruction, so that the target in the passive millimeter wave image is clearly distinguished from the background; a human body region detection step, which detects a human body target in the preprocessed image by first judging whether a human body is present, obtaining the human body region; and a hidden object detection step, which detects hidden objects in the human body region with a twice-iterative segmentation method and marks the detected hidden object regions.
The above patent detects objects hidden on the human body in passive millimeter wave images; such images are blurred, the imaged hidden objects lack detailed information, and it is difficult to distinguish what the objects are. The present invention discloses a target detection method for active millimeter wave imaging, which extracts more effective information from the image and marks the position and type of a dangerous item more accurately. Moreover, the invention uses a lightweight model whose parameters are greatly reduced compared with a two-stage algorithm, better satisfying the detection requirements of the security inspection environment.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art by providing a millimeter wave image target detection method and system based on an improved YOLO V3-Tiny. The technical scheme of the invention is as follows:
A millimeter wave image target detection method based on improved YOLO V3-Tiny comprises the following steps:
acquiring a millimeter wave image to be detected and performing preprocessing including denoising and normalization;
inputting the preprocessed millimeter wave image into the improved YOLO V3-Tiny model for detection; the improved feature extraction network adds several 3 × 3 and 1 × 1 convolutional layers, extracts features of the millimeter wave image, and generates feature maps; the feature maps are sent to the improved feature fusion layer module for feature fusion; the improved feature fusion layer is an FPN feature fusion layer into which a CBAM attention mechanism is introduced;
inputting the fused feature maps into the YOLO prediction layer and decoding to obtain the final prediction result; the predicted box information is decoded from the prior box coordinates, and the prior boxes are set by the K-means algorithm.
Further, acquiring the millimeter wave image to be detected and performing preprocessing including denoising and normalization specifically comprises:
obtaining the millimeter wave image to be detected and preprocessing it; the preprocessing includes denoising the image and resampling it by interpolation to a uniform size of 608 × 608 pixels.
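For concreteness, the preprocessing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not specify the denoising filter or the interpolation method, so a 3 × 3 mean filter and nearest-neighbour resampling are assumed here.

```python
import numpy as np

def mean_filter(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Simple k x k mean-filter denoising; edges are padded by reflection."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def preprocess(img: np.ndarray, size: int = 608) -> np.ndarray:
    """Denoise, resample to size x size (nearest neighbour), normalize to [0, 1]."""
    img = mean_filter(img.astype(np.float64))
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    resized = img[rows][:, cols]
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / (hi - lo + 1e-9)
```

Any real pipeline would substitute the system's actual denoiser and interpolation; only the output contract (a 608 × 608 array normalized to [0, 1]) is fixed by the text above.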
Further, the feature extraction network includes 15 convolutional layers and 6 pooling layers:
the first layer is a convolutional layer with 16 convolution kernels of size 3 × 3 and stride 1;
the second layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the third layer is a convolutional layer with 32 convolution kernels of size 3 × 3 and stride 1;
the fourth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the fifth layer is a convolutional layer with 64 convolution kernels of size 3 × 3 and stride 1;
the sixth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the seventh layer is a convolutional layer with 128 convolution kernels of size 3 × 3 and stride 1;
the eighth layer is a convolutional layer with 64 convolution kernels of size 3 × 3 and stride 1;
the ninth layer is a convolutional layer with 128 convolution kernels of size 3 × 3 and stride 1;
the tenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the eleventh layer is a convolutional layer with 256 convolution kernels of size 3 × 3 and stride 1;
the twelfth layer is a convolutional layer with 128 convolution kernels of size 3 × 3 and stride 1;
the thirteenth layer is a convolutional layer with 256 convolution kernels of size 3 × 3 and stride 1;
the fourteenth layer is a convolutional layer with 128 convolution kernels of size 3 × 3 and stride 1;
the fifteenth layer is a convolutional layer with 256 convolution kernels of size 3 × 3 and stride 1;
the sixteenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the seventeenth layer is a convolutional layer with 512 convolution kernels of size 3 × 3 and stride 1;
the eighteenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 1;
the nineteenth layer is a convolutional layer with 1024 convolution kernels of size 3 × 3 and stride 1;
the twentieth layer is a convolutional layer with 256 convolution kernels of size 1 × 1 and stride 1;
the twenty-first layer is a convolutional layer with 512 convolution kernels of size 3 × 3 and stride 1.
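The spatial sizes implied by the layer list above can be checked with a short trace, assuming (as is standard in YOLOv3-Tiny) that every convolution uses stride 1 with "same" padding and that the stride-1 max pooling layer preserves spatial size. Only the five stride-2 pooling layers halve the resolution, so a 608 × 608 input reaches 76 × 76 at layer seven, 38 × 38 at layer fifteen, and 19 × 19 at the final layer — the three scales used by the prediction layers.

```python
# Spatial-size trace of the backbone described above.
# ("pool", 2) halves the size; stride-1 layers with "same" padding keep it.
layers = [
    ("conv", 1), ("pool", 2), ("conv", 1), ("pool", 2), ("conv", 1), ("pool", 2),
    ("conv", 1), ("conv", 1), ("conv", 1), ("pool", 2),
    ("conv", 1), ("conv", 1), ("conv", 1), ("conv", 1), ("conv", 1), ("pool", 2),
    ("conv", 1), ("pool", 1), ("conv", 1), ("conv", 1), ("conv", 1),
]

def trace(size: int = 608) -> list:
    """Return the spatial size after each of the 21 layers."""
    sizes = []
    for kind, stride in layers:
        if kind == "pool" and stride == 2:
            size //= 2
        sizes.append(size)
    return sizes

sizes = trace()
# Layers 7, 15 and 21 feed the 76x76, 38x38 and 19x19 prediction branches.
```

The padding assumption is not stated in the patent; it is the only choice under which the listed strides reproduce the 76/38/19 prediction sizes given later.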
further, the improved feature fusion layer is an FPN feature fusion layer introducing a CBAM attention mechanism, and the feature fusion step comprises the following steps:
the feature extraction network outputs from the eleventh layer and passes through a CBAM attention mechanism module, then an up-sampling operation is carried out on the feature map, feature fusion is carried out on the feature map obtained by the feature extraction network outputs from the seventh layer and passes through the CBAM attention mechanism module, and a first prediction layer with the size of 76 multiplied by 76 is obtained after convolution operation;
the twenty-first layer output of the feature extraction network passes through a CBAM (cone beam amplitude modulation) attention mechanism module, then an up-sampling operation is carried out on the feature map, feature fusion is carried out on the feature map obtained by the feature extraction network output from the fifteenth layer through the CBAM attention mechanism module, and a second prediction layer with the size of 38 multiplied by 38 is obtained after convolution operation;
the twenty-first layer output of the feature extraction network passes through a CBAM attention mechanism module, and then a third prediction layer is obtained after convolution operation, wherein the size of the third prediction layer is 19 multiplied by 19;
the CBAM attention mechanism is an attention mechanism with mixed channel dimensionality and space dimensionality, and can be used for learning attention in channels and spaces for important feature image layers, the FPN feature fusion layer is a pyramid feature fusion layer, and the high-level feature image is subjected to up-sampling and then is subjected to same-dimensional splicing with the low-level feature image to obtain a feature image with high-level and bottom-level information fused.
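As an illustration of the mechanism just described (not the patent's trained network), CBAM's two stages can be sketched in NumPy: channel attention from average- and max-pooled descriptors through a shared MLP, then spatial attention from a 7 × 7 convolution over channel-wise average and max maps. The weights `w1`, `w2`, and `conv_w` are placeholders that would normally be learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared MLP (w1: C -> C/r, w2: C/r -> C) over avg and max descriptors."""
    avg = x.mean(axis=(1, 2))                       # (C,)
    mx = x.max(axis=(1, 2))                         # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # ReLU between the two layers
    return sigmoid(mlp(avg) + mlp(mx))              # (C,) channel weights in (0, 1)

def spatial_attention(x, conv_w):
    """x: (C, H, W). 7x7 conv over stacked [avg, max] channel maps; conv_w: (2, 7, 7)."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])            # (2, H, W)
    pad = np.pad(desc, ((0, 0), (3, 3), (3, 3)), mode="edge")
    H, W = x.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * conv_w)
    return sigmoid(out)                                         # (H, W) spatial weights

def cbam(x, w1, w2, conv_w):
    """Apply channel attention, then spatial attention, as in CBAM."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, conv_w)[None, :, :]
```

The output has the same shape as the input, so the module can be dropped in front of any fusion point, which is exactly how the text above uses it.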
Further, the prior boxes [x, y] are specifically 9 prior boxes, corresponding to the 3 feature maps of different sizes. Prior boxes of size [35,54], [48,42], [59,53] correspond to prediction layer one, of size 76 × 76; prior boxes of size [64,70], [78,92], [87,64] correspond to prediction layer two, of size 38 × 38; prior boxes of size [54,103], [102,48], [107,75] correspond to prediction layer three, of size 19 × 19.
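The K-means setting of prior boxes is conventionally done with 1 − IoU as the clustering distance, as introduced for YOLO; the patent does not spell out the distance metric, so that convention is assumed in this sketch (function names are illustrative):

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, with centers aligned."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
            + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster ground-truth box sizes with 1 - IoU distance; return k anchors by area."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)  # nearest anchor = highest IoU
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]    # sorted small to large
```

Run on the labeled millimeter wave training boxes, such a procedure yields 9 width–height pairs like those listed above, which are then split across the three prediction scales by area.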
A millimeter wave image target detection system based on YOLO V3-Tiny comprises: a millimeter wave image preprocessing module, a feature extraction network module, a feature fusion layer module, a classification prediction module, and a classification result output module;
the millimeter wave image preprocessing module acquires a millimeter wave image and preprocesses it; the preprocessing includes denoising the millimeter wave image and interpolating it to a size of 608 × 608 pixels; the preprocessed millimeter wave image is sent to the feature extraction backbone network module;
the feature extraction network module extracts features of the preprocessed millimeter wave image to obtain feature maps and sends them to the feature fusion layer module for feature fusion;
the feature fusion layer module fuses feature map information from different dimensions of the feature extraction network; the improved feature fusion layer is an FPN feature fusion layer into which a CBAM (Convolutional Block Attention Module) attention mechanism is introduced; the resulting feature maps carrying prediction information are sent to the classification prediction module;
the classification prediction module decodes the prediction layers, decoding the coordinate information against the preset prior boxes to obtain the final classification result, and sends the classification result to the classification result output module;
the classification result output module outputs the classification result.
The invention has the following advantages and beneficial effects:
the invention provides a millimeter wave image target detection method based on improved YOLO V3-Tiny. The YOLOv3-Tiny model is used as a basic network, the network is improved according to the characteristics of the millimeter wave image, and the target detection efficiency of the millimeter wave image is improved under the condition that excessive parameters are not increased. The modification of the feature extraction network comprises the step of adding a part of convolution layers to enhance the feature extraction capability of the network, the added convolution layers are convolution of 3 x 3 and 1 x 1, the added convolution layers of 3 x 3 can extract more features, the 1 x 1 convolution layers map the feature map to a lower dimension so as not to generate excessive parameters, and meanwhile, the feature fusion of a shallow layer is added to improve the detection effect of small targets. The improvement of the feature fusion layer comprises the steps of introducing an attention mechanism into the original FPN feature fusion, respectively passing input high-dimensional and low-dimensional feature graphs through a CBAM attention mechanism, then carrying out same-dimensional splicing, and then passing fused feature graphs through a CBAM attention mechanism. The novel feature fusion structure can enable a network to automatically learn the spatial relationship and the channel relationship on the important feature layer, and reduce the influence of redundant information such as background noise in the millimeter wave image on the result. 
The YOLO V3-Tiny model is improved in that excessive parameters are not increased while the detection precision is improved, the higher detection precision is obtained under the condition that the detection speed is kept, meanwhile, the lightweight model does not depend on GPU and other computing power equipment, the device has better adaptability, and more possibilities are provided for the miniaturization research of the millimeter wave security inspection system.
Drawings
FIG. 1 is a schematic diagram of a preferred embodiment of the improved target detection method provided by the present invention;
FIG. 2 is a flow chart of an improved target detection method of the present invention;
FIG. 3 is a schematic diagram of the network architecture of the present invention;
FIG. 4 is a diagram of a feature pyramid structure;
FIG. 5 is a schematic diagram of a CBAM attention mechanism;
FIG. 6 is a schematic diagram of the improved feature pyramid of the present invention;
FIG. 7 is a detection display diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
as shown in fig. 1, a detection process of the millimeter wave image target detection method includes inputting an image, generating a corresponding feature map through a feature extraction network, performing feature fusion, regression prediction, and finally obtaining a detection result and displaying the detection result in the image.
As shown in FIG. 2, a millimeter wave image target detection system based on YOLO V3-Tiny comprises: a millimeter wave image preprocessing module, a feature extraction network module, a feature fusion layer module, a classification prediction module, and a classification result output module;
the preprocessing module preprocesses the original millimeter wave image, including denoising and interpolation to a size of 608 × 608 pixels;
the feature extraction network module extracts features of the preprocessed millimeter wave image to obtain feature maps and sends them to the feature fusion layer module for feature fusion;
the feature fusion layer module fuses feature map information from different dimensions of the feature extraction network to obtain feature maps carrying prediction information, and sends them to the classification prediction module;
the classification prediction module decodes the prediction layers, decoding the coordinate information against the preset prior boxes to obtain the final classification result, and sends the classification result to the classification result output module;
the classification result output module outputs the classification result.
A millimeter wave image target detection method based on YOLO V3-Tiny, whose structure is shown in FIG. 3, comprises the following steps:
S1: acquiring the millimeter wave image to be detected and preprocessing it; the preprocessing includes denoising the image and resampling it by interpolation to a uniform size of 608 × 608 pixels;
S2: inputting the preprocessed millimeter wave image into the improved YOLO V3-Tiny model for detection, the structure of the improved model being shown in FIG. 3;
S21: inputting the preprocessed millimeter wave image into the improved YOLO V3-Tiny feature extraction network to extract image features and generate the corresponding feature maps. The improvement to the feature extraction network is to add a series of 3 × 3 and 1 × 1 convolutional layers on the basis of YOLOv3-Tiny. The 3 × 3 convolutional layers extract features, and the 1 × 1 convolutional layers map the feature maps to a lower dimension, avoiding the large number of parameters that too many convolutional layers would otherwise bring. This reduces the number of input channels in the network and effectively improves detection accuracy without adding excessive parameters. The final improved feature extraction network has 15 convolutional layers and 6 pooling layers; a 608 × 608 image passes through the feature extraction network to generate 4 feature maps, which are input into the next network module.
S22: and inputting the feature graphs generated by the feature extraction network into an improved feature fusion layer for feature fusion, and outputting 3 fused feature graphs with different sizes. Feature Pyramid (FPN) as shown in fig. 4, the high-level feature map is upsampled and then merged with the low-level feature map in the same dimension, which is only simple to merge in the channel dimension and cannot reflect the importance and correlation between different level feature maps. A convolutional attention module (CBAM), as shown in fig. 5, is a mixed attention mechanism of channel dimension and spatial dimension, which learns the spatial and channel information of the previous layer and obtains the corresponding weight. A CBAM attention mechanism is introduced into the FPN, and channel and spatial attention learning is performed on the important feature map layer, so that the accuracy of the network can be effectively improved under the condition that only a small number of calculation parameters are added, and the improved feature pyramid structure is shown in fig. 6.
S23: inputting the 3 fused feature maps into the YOLO prediction layer and decoding to obtain the final prediction result; the predicted box information is decoded from the prior box coordinates, and the prior boxes are set by the K-means algorithm.
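The decoding in S23 presumably follows the standard YOLO V3 formulas — sigmoid offsets within each grid cell plus exponential scaling of the prior box — since the patent names no variant. A sketch under that assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(pred, anchors, grid, img_size=608):
    """pred: (grid, grid, A, 4) raw (tx, ty, tw, th); anchors: (A, 2) in pixels.
    Returns (grid, grid, A, 4) boxes as (cx, cy, w, h) in image pixels."""
    cy, cx = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    stride = img_size / grid
    bx = (sigmoid(pred[..., 0]) + cx[..., None]) * stride   # sigmoid offset in the cell
    by = (sigmoid(pred[..., 1]) + cy[..., None]) * stride
    bw = anchors[:, 0] * np.exp(pred[..., 2])               # prior width scaled exponentially
    bh = anchors[:, 1] * np.exp(pred[..., 3])
    return np.stack([bx, by, bw, bh], axis=-1)
```

With the 19 × 19 prediction layer and its three prior boxes from above, an all-zero prediction decodes each box to its cell center with exactly the prior's width and height, which is a quick sanity check on the formulas.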
S3: according to the output of the improved YOLO V3-Tiny model, marking the positions of contraband on the image and displaying the type and position of each object on screen.
The detection results of the improved algorithm on dangerous goods in millimeter wave images are shown in fig. 7; the improved algorithm of this embodiment shows better detection performance.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the present invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (6)

1. A millimeter wave image target detection method based on improved YOLO V3-Tiny, characterized by comprising the following steps:
acquiring a millimeter wave image to be detected and performing preprocessing including denoising and normalization;
inputting the preprocessed millimeter wave image into the improved YOLO V3-Tiny model for detection; the improved feature extraction network adds several 3 × 3 and 1 × 1 convolutional layers, extracts features of the millimeter wave image, and generates feature maps; the feature maps are sent to the improved feature fusion layer module for feature fusion; the improved feature fusion layer is an FPN feature fusion layer into which a CBAM attention mechanism is introduced;
inputting the fused feature maps into the YOLO prediction layer and decoding to obtain the final prediction result; the predicted box information is decoded from the prior box coordinates, and the prior boxes are set by the K-means algorithm.
2. The millimeter wave image target detection method based on the improved YOLO V3-Tiny as claimed in claim 1, wherein acquiring the millimeter wave image to be detected and performing preprocessing including denoising and normalization specifically comprises:
obtaining the millimeter wave image to be detected and preprocessing it; the preprocessing includes denoising the image and resampling it by interpolation to a uniform size of 608 × 608 pixels.
3. The millimeter wave image target detection method based on the improved YOLO V3-Tiny as claimed in claim 1, wherein the feature extraction network comprises 15 convolutional layers and 6 pooling layers:
the first layer is a convolutional layer containing 16 convolution kernels of size 3 × 3 with stride 1;
the second layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the third layer is a convolutional layer containing 32 convolution kernels of size 3 × 3 with stride 1;
the fourth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the fifth layer is a convolutional layer containing 64 convolution kernels of size 3 × 3 with stride 1;
the sixth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the seventh layer is a convolutional layer containing 128 convolution kernels of size 3 × 3 with stride 1;
the eighth layer is a convolutional layer containing 64 convolution kernels of size 3 × 3 with stride 1;
the ninth layer is a convolutional layer containing 128 convolution kernels of size 3 × 3 with stride 1;
the tenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the eleventh layer is a convolutional layer containing 256 convolution kernels of size 3 × 3 with stride 1;
the twelfth layer is a convolutional layer containing 128 convolution kernels of size 3 × 3 with stride 1;
the thirteenth layer is a convolutional layer containing 256 convolution kernels of size 3 × 3 with stride 1;
the fourteenth layer is a convolutional layer containing 128 convolution kernels of size 3 × 3 with stride 1;
the fifteenth layer is a convolutional layer containing 256 convolution kernels of size 3 × 3 with stride 1;
the sixteenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 2;
the seventeenth layer is a convolutional layer containing 512 convolution kernels of size 3 × 3 with stride 1;
the eighteenth layer is a max pooling layer with a 2 × 2 pooling kernel and stride 1;
the nineteenth layer is a convolutional layer containing 1024 convolution kernels of size 3 × 3 with stride 1;
the twentieth layer is a convolutional layer containing 256 convolution kernels of size 1 × 1 with stride 1;
the twenty-first layer is a convolutional layer containing 512 convolution kernels of size 3 × 3 with stride 1.
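The spatial sizes implied by claim 3 can be checked by walking a 608 × 608 input through the 21 listed layers. This assumes the 3 × 3 convolutions are padded so that only stride-2 pooling halves the size, and that the stride-1 pooling at the eighteenth layer is padded to preserve size, as in standard YOLO V3-Tiny:

```python
# layer spec from claim 3: ('conv', stride) or ('pool', stride)
layers = [('conv', 1), ('pool', 2), ('conv', 1), ('pool', 2), ('conv', 1), ('pool', 2),
          ('conv', 1), ('conv', 1), ('conv', 1), ('pool', 2),
          ('conv', 1), ('conv', 1), ('conv', 1), ('conv', 1), ('conv', 1), ('pool', 2),
          ('conv', 1), ('pool', 1), ('conv', 1), ('conv', 1), ('conv', 1)]

size, sizes = 608, []
for kind, stride in layers:
    if kind == 'pool' and stride == 2:
        size //= 2          # only stride-2 pooling reduces the spatial size
    sizes.append(size)

# outputs feeding the three prediction layers (1-indexed layers 7, 15, 21)
print(sizes[6], sizes[14], sizes[20])   # -> 76 38 19
```

These match the 76 × 76, 38 × 38 and 19 × 19 prediction-layer sizes recited in claims 4 and 5.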
4. The millimeter wave image target detection method based on the improved YOLO V3-Tiny according to claim 3, wherein the improved feature fusion layer is an FPN feature fusion layer with a CBAM (Convolutional Block Attention Module) attention mechanism introduced, and the feature fusion step comprises:
the eleventh-layer output of the feature extraction network passes through a CBAM attention mechanism module and the resulting feature map is up-sampled; it is then fused with the feature map obtained by passing the seventh-layer output of the feature extraction network through a CBAM attention mechanism module, and a convolution operation yields the first prediction layer, of size 76 × 76;
the twenty-first-layer output of the feature extraction network passes through a CBAM attention mechanism module and the resulting feature map is up-sampled; it is then fused with the feature map obtained by passing the fifteenth-layer output of the feature extraction network through a CBAM attention mechanism module, and a convolution operation yields the second prediction layer, of size 38 × 38;
the twenty-first-layer output of the feature extraction network passes through a CBAM attention mechanism module, and a convolution operation then yields the third prediction layer, of size 19 × 19;
the CBAM attention mechanism mixes the channel and spatial dimensions, learning channel and spatial attention so that important feature layers are emphasized; the FPN feature fusion layer is a pyramid feature fusion layer in which the high-level feature map is up-sampled and concatenated with the low-level feature map along the same dimension, yielding a feature map that fuses high-level and low-level information.
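A NumPy sketch of the CBAM computation described above. The MLP weights are random stand-ins for learned parameters, and the spatial branch uses a simple 7 × 7 mean aggregation in place of the learned 7 × 7 convolution, so this illustrates only the data flow, not the trained module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, reduction=4, seed=0):
    """x: feature map of shape (C, H, W); returns a refined map of the same shape."""
    C, H, W = x.shape
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1  # shared MLP, layer 1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1  # shared MLP, layer 2

    # channel attention: avg- and max-pooled descriptors through the shared MLP
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    mc = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    x = x * mc[:, None, None]

    # spatial attention: channel-wise mean/max maps, 7x7 neighbourhood aggregation
    sm = np.stack([x.mean(axis=0), x.max(axis=0)]).mean(axis=0)
    p = np.pad(sm, 3, mode="edge")
    ms = sigmoid(np.stack([p[i:i + H, j:j + W]
                           for i in range(7) for j in range(7)]).mean(axis=0))
    return x * ms[None, :, :]
```

Because both attention maps pass through a sigmoid, the module rescales every activation by factors in (0, 1), which is how the "important feature layers" are emphasized relative to the rest.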
5. The millimeter wave image target detection method based on the improved YOLO V3-Tiny according to claim 4, wherein the plurality of prior boxes [x, y] are specifically 9 prior boxes, corresponding respectively to the 3 feature maps of different sizes: prior boxes of size [35,54], [48,42] and [59,53] correspond to the first prediction layer, of size 76 × 76; prior boxes of size [64,70], [78,92] and [87,64] correspond to the second prediction layer, of size 38 × 38; and prior boxes of size [54,103], [102,48] and [107,75] correspond to the third prediction layer, of size 19 × 19.
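The decoding relation between a prediction layer's raw outputs and these prior boxes can be sketched as in standard YOLO V3: the box centre is a sigmoid offset within a grid cell scaled by the cell stride, and the width/height scale the prior box exponentially. The raw offsets and cell indices below are illustrative values, not taken from the patent:

```python
import numpy as np

def decode_box(t, prior, cell, grid_size, img_size=608):
    """t = (tx, ty, tw, th) raw outputs; prior = (pw, ph) in pixels;
    cell = (cx, cy) grid-cell indices; returns (bx, by, bw, bh) in pixels."""
    stride = img_size / grid_size              # e.g. 608 / 19 = 32
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = (cell[0] + sig(t[0])) * stride        # centre x within the image
    by = (cell[1] + sig(t[1])) * stride        # centre y within the image
    bw = prior[0] * np.exp(t[2])               # width scaled from the prior box
    bh = prior[1] * np.exp(t[3])               # height scaled from the prior box
    return bx, by, bw, bh

# illustrative: zero offsets at cell (9, 9) of the 19 x 19 layer, prior [107, 75]
print(decode_box((0, 0, 0, 0), (107, 75), (9, 9), 19))  # -> (304.0, 304.0, 107.0, 75.0)
```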
6. A millimeter wave image target detection system based on YOLO V3-Tiny, characterized by comprising: a millimeter wave image preprocessing module, a feature extraction network module, a feature fusion layer module, a classification prediction module and a classification result output module;
the millimeter wave image preprocessing module is used for acquiring a millimeter wave image and preprocessing it; the preprocessing comprises denoising the millimeter wave image and interpolating it to 608 × 608 pixels; the preprocessed millimeter wave image is sent to the feature extraction network module;
the feature extraction network module is used for extracting features from the preprocessed millimeter wave image to obtain feature maps, which are sent to the feature fusion layer module for feature fusion;
the feature fusion layer module is used for fusing feature map information of different dimensions from the feature extraction network, the improved feature fusion layer being an FPN feature fusion layer with a CBAM attention mechanism introduced; the resulting feature maps with prediction information are sent to the classification prediction module;
the classification prediction module is used for decoding the prediction layers, decoding the coordinate information according to the preset prior boxes to obtain the final classification result, which is sent to the classification result output module;
the classification result output module is used for outputting the classification result.
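The module chain in the system claim can be sketched as a simple pipeline; the stage functions below are hypothetical stand-ins for the five modules, showing only how data flows between them:

```python
def run_pipeline(image, preprocess, extract_features, fuse_features, classify):
    """Chain the system's modules: preprocessing -> feature extraction
    -> feature fusion -> classification prediction -> result output."""
    x = preprocess(image)         # denoise + interpolate to 608 x 608
    feats = extract_features(x)   # backbone feature maps
    fused = fuse_features(feats)  # CBAM + FPN fusion
    return classify(fused)        # decode with the preset prior boxes

# illustrative stand-in stages that just tag the data as it passes through
result = run_pipeline("img", lambda i: i + ":pre", lambda x: x + ":feat",
                      lambda f: f + ":fused", lambda f: f + ":cls")
print(result)   # -> img:pre:feat:fused:cls
```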
CN202210025035.5A 2022-01-11 2022-01-11 Millimeter wave image target detection method and system based on improved YOLO V3-Tiny Pending CN114529462A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210025035.5A CN114529462A (en) 2022-01-11 2022-01-11 Millimeter wave image target detection method and system based on improved YOLO V3-Tiny

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210025035.5A CN114529462A (en) 2022-01-11 2022-01-11 Millimeter wave image target detection method and system based on improved YOLO V3-Tiny

Publications (1)

Publication Number Publication Date
CN114529462A true CN114529462A (en) 2022-05-24

Family

ID=81620884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210025035.5A Pending CN114529462A (en) 2022-01-11 2022-01-11 Millimeter wave image target detection method and system based on improved YOLO V3-Tiny

Country Status (1)

Country Link
CN (1) CN114529462A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049969A (en) * 2022-08-15 2022-09-13 山东百盟信息技术有限公司 Poor video detection method for improving YOLOv3 and BiConvLSTM
CN115171079A (en) * 2022-09-08 2022-10-11 松立控股集团股份有限公司 Vehicle detection method based on night scene
CN115171079B (en) * 2022-09-08 2023-04-07 松立控股集团股份有限公司 Vehicle detection method based on night scene
CN116778227A (en) * 2023-05-12 2023-09-19 昆明理工大学 Target detection method, system and equipment based on infrared image and visible light image
CN116778227B (en) * 2023-05-12 2024-05-10 昆明理工大学 Target detection method, system and equipment based on infrared image and visible light image

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
Li et al. Benchmarking single-image dehazing and beyond
CN114529462A (en) Millimeter wave image target detection method and system based on improved YOLO V3-Tiny
Fang et al. Infrared small UAV target detection based on residual image prediction via global and local dilated residual networks
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN112651978A (en) Sublingual microcirculation image segmentation method and device, electronic equipment and storage medium
CN105574848A (en) A method and an apparatus for automatic segmentation of an object
CN107958453A (en) Detection method, device and the computer-readable storage medium of galactophore image lesion region
CN114241003B (en) All-weather lightweight high-real-time sea surface ship detection and tracking method
CN110599455A (en) Display screen defect detection network model, method and device, electronic equipment and storage medium
CN114742799A (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
Guo et al. MDFN: Mask deep fusion network for visible and infrared image fusion without reference ground-truth
ur Rehman et al. DeepRPN-BIQA: Deep architectures with region proposal network for natural-scene and screen-content blind image quality assessment
CN112052730A (en) 3D dynamic portrait recognition monitoring device and method
Wang A survey on IQA
Zheng et al. Stripe segmentation of oceanic internal waves in SAR images based on SegNet
Venkatesvara Rao et al. Real-time video object detection and classification using hybrid texture feature extraction
CN114677377A (en) Display screen defect detection method, training method, device, equipment and medium
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
CN117409244A (en) SCKConv multi-scale feature fusion enhanced low-illumination small target detection method
CN116486393A (en) Scene text detection method based on image segmentation
Qin et al. Dense sampling and detail enhancement network: Improved small object detection based on dense sampling and detail enhancement
CN116977249A (en) Defect detection method, model training method and device
Shi et al. A welding defect detection method based on multiscale feature enhancement and aggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination