CN112651441B - Fine-grained non-motor vehicle feature detection method, storage medium and computer equipment - Google Patents


Info

Publication number
CN112651441B
CN112651441B (application CN202011562327.XA)
Authority
CN
China
Prior art keywords
motor vehicle
characteristic
grained
fine
driver
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011562327.XA
Other languages
Chinese (zh)
Other versions
CN112651441A (en)
Inventor
张伍聪
梁添才
赵清利
徐天适
岳许要
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Radio & Tv Xinyi Technology Co ltd
Original Assignee
Shenzhen Xinyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinyi Technology Co Ltd filed Critical Shenzhen Xinyi Technology Co Ltd
Priority to CN202011562327.XA priority Critical patent/CN112651441B/en
Publication of CN112651441A publication Critical patent/CN112651441A/en
Application granted granted Critical
Publication of CN112651441B publication Critical patent/CN112651441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention belongs to the field of intelligent transportation and relates to a fine-grained non-motor vehicle feature detection method, a storage medium and computer equipment. The method comprises the following steps: constructing a feature detection network comprising a convolutional neural network, a plurality of block fusion modules, a driver feature detector and a non-motor vehicle feature detector, where the connected block fusion modules form a feature pyramid network that performs block fusion from deep feature maps to shallow feature maps; selecting a plurality of network feature layers from the feature detection network and performing layer-by-layer blocking and fusion to obtain a driver fine-grained feature map and a non-motor vehicle fine-grained feature map; inputting these maps into the driver feature detector and the non-motor vehicle feature detector respectively to locate and classify regions of interest; and outputting them to upsampling layers so that the features are further block-fused into shallower feature maps, yielding driver and non-motor vehicle feature maps fused with finer-grained resolution information. The invention provides a detection network with block feature pyramid fusion, which improves target detection accuracy.

Description

Fine-grained non-motor vehicle feature detection method, storage medium and computer equipment
Technical Field
The invention belongs to the field of intelligent transportation, and particularly relates to a fine-grained non-motor vehicle characteristic detection method, a storage medium and computer equipment.
Background
Non-motor vehicles bring great convenience to people's travel, but illegal behaviors such as riding without a helmet, unlicensed driving and overloading also greatly increase the potential safety hazards of urban traffic. With the help of checkpoint and electronic-police camera systems, the traffic police department can perform intelligent traffic control of non-motor vehicles, such as video structured analysis and tracking of travel trajectories.
A non-motor vehicle target comprises two parts: the driver and the non-motor vehicle itself. It often contains several important regions of interest (ROIs), such as the license plate, the vehicle head, the driver's head and the driver's upper body. ROI target detection on the non-motor vehicle and its driver is the basis for attribute analysis and one of the important components of structured analysis of non-motor vehicles.
The existing deep-learning-based target detection methods mainly fall into two types:
The Faster Region-based Convolutional Network (Faster R-CNN) method is a two-stage detection method: the first stage extracts candidate regions, and the second stage fine-tunes them. Faster R-CNN achieves high target detection accuracy, but its two-stage design makes actual detection slow.
The Single Shot MultiBox Detector (SSD) method is a single-stage detection method, so it detects faster than two-stage methods, but its accuracy is lower.
At present, the SSD method is mainly used for non-motor vehicle target detection, but its precision drops when small targets appear in the image; the feature pyramid method can improve detection precision in such cases.
The input to non-motor vehicle detection is an image with an obvious upper/lower block structure: the upper half is the driver and the lower half is the non-motor vehicle. How to improve non-motor vehicle detection by fusing global and local information, based on the blocking and feature pyramid ideas, therefore becomes a key optimization direction for non-motor vehicle structuring.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a fine-grained non-motor vehicle feature detection method, a storage medium and computer equipment. It proposes a target detection network structure with block feature pyramid fusion, performs target detection on the driver and the non-motor vehicle, and improves target detection accuracy.
The method of the invention is realized by adopting the following technical scheme: the fine-grained non-motor vehicle characteristic detection method comprises the following steps:
s1, constructing a feature detection network, wherein the constructed feature detection network comprises a convolutional neural network, a plurality of block fusion modules, a driver feature detector and a non-motor vehicle feature detector, and the convolutional neural network comprises a plurality of CBR modules and a maximum pooling layer which are connected in series; the system comprises a plurality of block fusion modules, a driver feature detector, a non-motor vehicle feature detector, a deep feature map, a shallow feature map and a non-motor vehicle feature detector, wherein the block fusion modules are connected with each other, and each block fusion module is also connected with one CBR module, the driver feature detector and the non-motor vehicle feature detector of the convolutional neural network respectively, so that a feature pyramid network for realizing block fusion from the deep feature map to the shallow feature map is realized;
s2, selecting a plurality of network characteristic layers from different CBR modules of the characteristic detection network, and carrying out layer-by-layer blocking and fusion processing to obtain a fused fine-grained characteristic diagram of a driver and a fused fine-grained characteristic diagram of a non-motor vehicle;
s3, inputting the segmented and fused fine-grained characteristic diagram of the driver and the fine-grained characteristic diagram of the non-motor vehicle into a driver characteristic detector and a non-motor vehicle characteristic detector respectively, and positioning and classifying the region of interest;
and S4, respectively outputting the block-fused fine-grained characteristic diagram of the driver and the non-motor vehicle to an upper sampling layer, and further performing block-fusion on the shallow layer characteristic diagram to obtain the driver characteristic diagram and the non-motor vehicle characteristic diagram fused with finer-grained resolution information.
In a preferred embodiment, the detection method further comprises: S5, training the constructed feature detection network, adding a mutual exclusion loss function L_Mutex during training so that the feature detection network model can accurately distinguish overlapping detection boxes of targets of the same class.
The storage medium of the present invention stores computer instructions which, when executed by a processor, perform the steps of the fine-grained non-motor vehicle feature detection method described above.
The computer device of the invention comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor runs the computer program, the steps of the fine-grained non-motor vehicle feature detection method described above are executed.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides a target detection method for the driver and the non-motor vehicle based on blocking and fusing shallow and deep network features, namely a target detection network structure with block feature pyramid fusion, which improves target detection accuracy.
2. The invention provides a mutual exclusion loss function that optimizes the dense self-occlusion problem in target detection and reduces missed detections.
Drawings
FIG. 1 is a schematic diagram of a feature detection network in an embodiment of the invention;
FIG. 2 is a schematic diagram of the structure of the CBR module of FIG. 1;
FIG. 3 is a schematic view of the construction of the block fusion module of FIG. 1;
FIG. 4 is a diagram illustrating a specific process of network feature graph partitioning and merging according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a candidate box and a true box.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Examples
The invention provides a fine-grained non-motor vehicle feature detection method that detects key regions/regions of interest by fusing deep and shallow feature maps based on a blocking idea. After an image is processed by the neural network, shallow feature maps and deep feature maps are obtained. Shallow feature maps carry fine-grained, high-resolution information, which locates target boundaries more accurately; deep feature maps carry high-level semantic information, which determines target category information more accurately. By fusing shallow and deep feature maps, a fused feature map with both high semantic information and high resolution information is obtained, thereby improving target detection accuracy. The common fusion method is whole-map fusion: the deep feature map is directly upsampled and then added to the shallow feature map to obtain the fused map. In contrast, based on the block structure of the non-motor vehicle and its driver, the invention splits the shallow and deep feature maps separately into blocks and then fuses the corresponding blocks, realizing fine-grained feature extraction for the driver and for the non-motor vehicle; compared with whole-map fusion, this fine-grained feature fusion achieves higher target detection accuracy. The detailed steps are as follows:
step 1: building feature detection networks
In this embodiment, the overall network structure is as shown in Fig. 1. The backbone of the constructed feature detection network is the convolutional neural network VGG16, which comprises a plurality of CBR modules and maximum pooling layers connected in series; the CBR module, shown in Fig. 2, comprises a convolutional layer, a batch normalization layer and a ReLU activation layer connected in series. In addition to the VGG16 network, the constructed feature detection network further comprises a plurality of block fusion modules, as well as a driver feature detector and a non-motor vehicle feature detector; each block fusion module is also connected respectively with a CBR module of the VGG16 network, the driver feature detector and the non-motor vehicle feature detector, thereby forming a feature pyramid network that performs block fusion from deep feature maps to shallow feature maps.
The block fusion module comprises a splitting module and two output branches connected to it; each output branch comprises an addition module and a convolution module connected in series, and the addition module superimposes the feature map from the CBR module, after processing by the splitting module, with a feature map output by an upsampling layer. Each block fusion module has three inputs: one is the output of a CBR module, and the other two are outputs of upsampling layers. It has four outputs: two go to upsampling layers for upward feature pyramid fusion, and two go to the detection layers for classification and regression. The detection layers adopt SSD detectors; in this embodiment there are two kinds of detection layers, driver feature detection layers and non-motor vehicle feature detection layers.
Step 2: select a plurality of network feature layers from different CBR modules of the feature detection network, and perform layer-by-layer blocking and fusion to obtain the fused driver fine-grained feature map and the fused non-motor vehicle fine-grained feature map.
In this embodiment, based on the SSD detection framework, six feature layers (the outputs of CBR module 4_3, CBR module 7, CBR module 8_2, CBR module 9_2, CBR module 10_2 and CBR module 11_2) are selected from the feature detection network and used as inputs to the five block fusion modules for block fusion processing. After layer-by-layer block feature fusion through the five block fusion modules, five driver fine-grained feature maps and five non-motor vehicle fine-grained feature maps are output and fed into the corresponding detection layers for region localization and category identification. The feature detection results are then combined to obtain all detection results for the non-motor vehicle and its driver.
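The layer-by-layer pyramid fusion above can be sketched as a loop from the deepest feature map upward. In the sketch below, the shapes are illustrative only, nearest-neighbour resizing stands in for the bilinear interpolation used in the patent, and the per-level driver/vehicle block split is omitted for brevity:

```python
import numpy as np

def upsample_to(x, h, w):
    # Nearest-neighbour resize to (h, w); stand-in for bilinear interpolation.
    rows = np.arange(h) * x.shape[2] // h
    cols = np.arange(w) * x.shape[3] // w
    return x[:, :, rows][:, :, :, cols]

# Six feature layers (deepest last), e.g. from CBR 4_3 ... CBR 11_2;
# spatial sizes here are illustrative, not taken from the patent.
sizes = [38, 19, 10, 5, 3, 1]
feats = [np.random.rand(1, 4, s, s) for s in sizes]

# Five fusion steps, from the deepest pair up to the shallowest pair.
fused = []
deep = feats[-1]
for shallow in reversed(feats[:-1]):
    deep = shallow + upsample_to(deep, shallow.shape[2], shallow.shape[3])
    fused.append(deep)  # each fused map also feeds a detection layer

assert len(fused) == 5
assert fused[-1].shape == (1, 4, 38, 38)
```

Each iteration plays the role of one block fusion module: it consumes the previous (deeper) fused map and one backbone feature layer, and its output is both detected on and passed further up the pyramid.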
Steps 2 to 4 of this embodiment are explained by taking as an example the fusion of the block features of CBR module 10_2 and CBR module 11_2, followed by detection of the driver features and the non-motor vehicle features. The block fusion process is shown in Fig. 4, where the shallow feature map 21 denotes the feature map of CBR module 10_2 and the deep feature map 22 denotes the feature map of CBR module 11_2.
A feature map of a deep convolutional neural network contains four dimensions: sample number, channel number, height and width. The splitting module divides the shallow feature map from the CBR module into two sections along the height dimension, the upper section being the driver-related shallow block feature map 23 and the lower section being the non-motor-vehicle-related shallow block feature map 24; the deep feature map from the CBR module is divided into two sections along the channel dimension, one section being the driver-related deep feature map 25 and the other being the non-motor-vehicle-related deep feature map 26.
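For illustration, splitting a four-dimensional feature map along the height dimension might look like the following (the shapes are hypothetical; per the text above, the deep map would be split along the channel axis instead):

```python
import numpy as np

# Hypothetical shallow feature map: (samples, channels, height, width).
feat = np.random.rand(1, 8, 6, 6)

# Height split: the upper half relates to the driver, the lower half
# to the non-motor vehicle.
h = feat.shape[2]
driver_block = feat[:, :, : h // 2, :]
vehicle_block = feat[:, :, h // 2 :, :]

assert driver_block.shape == (1, 8, 3, 6)
assert vehicle_block.shape == (1, 8, 3, 6)
```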
The block fusion module performs bilinear interpolation upsampling on the driver-related deep feature map 25 to obtain a driver interpolation feature map 27 with the same width, height and channel number as the driver-related shallow block feature map 23. The shallow block feature map 23 is added to the driver interpolation feature map 27, and a 3×3 convolution is then applied to obtain feature map 28, which serves as the fine-grained feature for detecting the driver's regions of interest; feature map 28 is also called the driver fine-grained feature map.
Similarly, the block fusion module performs bilinear interpolation upsampling on the non-motor-vehicle-related deep feature map 26 to obtain a non-motor vehicle interpolation feature map 29 with the same width, height and channel number as the non-motor-vehicle-related shallow block feature map 24. The shallow block feature map 24 is added to the non-motor vehicle interpolation feature map 29, and a 3×3 convolution is then applied to obtain feature map 30, which serves as the fine-grained feature for detecting the non-motor vehicle's regions of interest; feature map 30 is also called the non-motor vehicle fine-grained feature map.
The above is the fine-grained block fusion process for two feature maps. When fusing fine-grained blocks across multiple feature maps, the fused driver fine-grained feature map 28 and the fused non-motor vehicle fine-grained feature map 30 are each upsampled by bilinear interpolation and used as inputs to the next fine-grained block fusion module.
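A minimal numpy sketch of one block fusion step, under simplifying assumptions: both maps are split along the height dimension (the patent splits the deep map along channels), nearest-neighbour upsampling stands in for bilinear interpolation, and a fixed 3×3 mean filter stands in for the learned 3×3 convolution:

```python
import numpy as np

def upsample_nearest(x, out_h, out_w):
    # Nearest-neighbour resize to (out_h, out_w); the patent uses bilinear.
    rows = np.arange(out_h) * x.shape[2] // out_h
    cols = np.arange(out_w) * x.shape[3] // out_w
    return x[:, :, rows][:, :, :, cols]

def conv3x3_mean(x):
    # 3x3 mean filter as an untrained stand-in for the learned 3x3 conv.
    n, c, h, w = x.shape
    p = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[:, :, dy:dy + h, dx:dx + w]
    return out / 9.0

def block_fuse(shallow, deep):
    # Split both maps into driver (top) and vehicle (bottom) blocks,
    # upsample each deep block to its shallow counterpart, add, then conv.
    hs = shallow.shape[2] // 2
    hd = deep.shape[2] // 2
    out = []
    for s_blk, d_blk in [(shallow[:, :, :hs], deep[:, :, :hd]),
                         (shallow[:, :, hs:], deep[:, :, hd:])]:
        up = upsample_nearest(d_blk, s_blk.shape[2], s_blk.shape[3])
        out.append(conv3x3_mean(s_blk + up))
    return out  # [driver fine-grained map, vehicle fine-grained map]

shallow = np.random.rand(1, 4, 8, 8)
deep = np.random.rand(1, 4, 4, 4)
driver_map, vehicle_map = block_fuse(shallow, deep)
assert driver_map.shape == (1, 4, 4, 8)
assert vehicle_map.shape == (1, 4, 4, 8)
```

Each returned map corresponds to feature map 28 or 30: it would feed both a detection layer and, after further upsampling, the next block fusion module.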
Step 3: input the block-fused driver fine-grained feature map and non-motor vehicle fine-grained feature map into the driver feature detector and the non-motor vehicle feature detector respectively, and locate and classify the regions of interest.
(1) The driver fine-grained feature map 28 is fed into a detection layer to locate and classify regions of interest; driver-related regions of interest, such as the driver's head and the driver's upper body, can be trained and learned in the detection layer.
(2) Similarly, feature map 30 is fed into a detection layer for region-of-interest localization and classification; non-motor-vehicle-related regions of interest, such as the headlight, the vehicle head and the vehicle face, can be trained and learned in the detection layer.
Step 4: output the block-fused driver fine-grained feature map and non-motor vehicle fine-grained feature map respectively to upsampling layers, and further block-fuse the features into shallower feature maps to obtain driver and non-motor vehicle feature maps fused with finer-grained resolution information.
(1) In step 3(1), one end of feature map 28 is connected to the detection layer for localization and classification. Following the feature pyramid principle, the other end of feature map 28 continues with upsampling interpolation and is fused with shallower block features; the resulting driver feature map fuses finer-grained resolution information, so that smaller targets on the driver can be identified.
(2) Similarly, in step 3(2), one end of feature map 30 is connected to the detection layer for localization and classification. Following the feature pyramid principle, the other end of feature map 30 continues with upsampling interpolation and is fused with shallower block features; the resulting non-motor vehicle feature map fuses finer-grained resolution information, so that smaller targets on the non-motor vehicle can be identified.
Step 5: train the constructed feature detection network. Throughout network training, a mutual exclusion loss function is adopted to optimize the dense self-occlusion detection problem.
Due to occlusion caused by multiple overlapping drivers in the input image, some prediction boxes are suppressed by non-maximum suppression, causing missed detections. To mitigate this, this embodiment performs model training on the whole network framework designed in step 1 and adds a mutual exclusion loss function L_Mutex during training, so that the feature detection network model can more accurately distinguish overlapping detection boxes of same-class targets. As shown in Fig. 5, the goal of L_Mutex is to keep the candidate box P as far as possible from the surrounding ground-truth box T2, where T2 is the ground-truth box with the largest overlap with the candidate box other than its matched ground-truth box T1; that is, T1 is the ground-truth box with the highest overlap with the candidate box P, and the mutual exclusion loss function L_Mutex is used to enlarge the distance between P and T2.
The function IoT(P, T2) represents the distance between the candidate box P and the surrounding ground-truth box T2:
IoT(P, T2) = area(P ∩ T2) / area(T2)
In the above formula, area(P ∩ T2) denotes the intersection area of the candidate box P and the surrounding ground-truth box T2, and area(T2) denotes the area of the surrounding ground-truth box T2. The mutual exclusion loss L_Mutex between the candidate boxes P and their surrounding ground-truth boxes T2 is expressed as:
L_Mutex = (1/N) · Σ_{P ∈ P+} IoT(P, T2)
where P+ denotes the set of N positive-sample candidate boxes, and the surrounding ground-truth box T2 is the ground-truth box with the second-highest overlap with the candidate box P; if no surrounding ground-truth box T2 exists, IoT is 0. The loss function used in the overall network training process can be expressed as:
L_all = L_conf + L_loc + γ·L_Mutex
L_conf and L_loc denote the classification loss and the localization loss of the feature detection network respectively; γ is the weight coefficient of the mutual exclusion loss, set to 0.5 in this embodiment. During network training, adding the mutual exclusion loss function and adjusting the overall network weight parameters enables the model to more accurately distinguish overlapping detection boxes of same-class targets at inference time, reducing missed detections of overlapping same-class targets.
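As an illustration, a minimal sketch computing IoT(P, T2) = area(P ∩ T2) / area(T2) and a mutual exclusion loss averaged over the positive candidates (the averaged form and the (x1, y1, x2, y2) box representation are assumptions for this sketch, not taken from the patent text):

```python
def area(box):
    # Axis-aligned box (x1, y1, x2, y2); hypothetical representation.
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def iot(p, t2):
    # IoT(P, T2) = area(P ∩ T2) / area(T2): the fraction of the
    # surrounding ground-truth box T2 covered by the candidate P.
    x1, y1 = max(p[0], t2[0]), max(p[1], t2[1])
    x2, y2 = min(p[2], t2[2]), min(p[3], t2[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / area(t2)

def mutex_loss(positives, surrounding):
    # Averaged mutual exclusion loss over the N positive candidates;
    # IoT is taken as 0 when a candidate has no surrounding box T2.
    terms = [iot(p, t2) if t2 is not None else 0.0
             for p, t2 in zip(positives, surrounding)]
    return sum(terms) / len(terms)

p = (0.0, 0.0, 4.0, 4.0)
t2 = (2.0, 0.0, 6.0, 4.0)   # the candidate covers half of t2
l_mutex = mutex_loss([p], [t2])
assert l_mutex == 0.5
```

Minimizing this term pushes each positive candidate away from its second-best ground-truth box, which is the repulsion effect the patent describes; the total loss would combine it with the classification and localization terms as L_conf + L_loc + γ·L_Mutex.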
The numbering of steps 1 to 5 does not imply a strict execution order in this embodiment. In fact, step 5 is executed after the feature detection network is constructed in step 1; once the weight parameters of the feature detection network are trained, they are used to detect the driver and non-motor vehicle features.
Based on the same inventive concept, this embodiment also provides a storage medium storing computer instructions which, when executed by a processor, implement steps S1-S5 of the above fine-grained non-motor vehicle feature detection method.
Based on the same inventive concept, this embodiment also provides a computer apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, steps S1-S5 of the above fine-grained non-motor vehicle feature detection method are performed.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principle of the present invention should be regarded as equivalents and are included within the scope of the present invention.

Claims (9)

1. A fine-grained non-motor vehicle characteristic detection method is characterized by comprising the following steps:
s1, constructing a feature detection network, wherein the constructed feature detection network comprises a convolutional neural network, a plurality of block fusion modules, a driver feature detector and a non-motor vehicle feature detector, and the convolutional neural network comprises a plurality of CBR modules and a maximum pooling layer which are connected in series; the system comprises a plurality of block fusion modules, a characteristic pyramid network and a characteristic pyramid network, wherein the block fusion modules are connected, and each block fusion module is also connected with a CBR module of a convolutional neural network, a driver characteristic detector and a non-motor vehicle characteristic detector respectively, so that the characteristic pyramid network is formed by block fusion from a deep characteristic diagram to a shallow characteristic diagram;
s2, selecting a plurality of network characteristic layers from different CBR modules of the characteristic detection network, and carrying out layer-by-layer blocking and fusion processing to obtain a fused fine-grained characteristic diagram of a driver and a fused fine-grained characteristic diagram of a non-motor vehicle;
s3, inputting the segmented and fused fine-grained characteristic diagram of the driver and the fine-grained characteristic diagram of the non-motor vehicle into a driver characteristic detector and a non-motor vehicle characteristic detector respectively, and positioning and classifying the region of interest;
s4, respectively outputting the block-fused fine-grained characteristic diagram of the driver and the non-motor vehicle to an upper sampling layer, and further performing block-fusion on the shallow layer characteristic diagram to obtain a driver characteristic diagram and a non-motor vehicle characteristic diagram fused with finer-grained resolution information;
each block fusion module comprises a splitting module and two output branches connected to the splitting module; each output branch comprises an addition module and a convolution module connected in series, and the addition module superimposes the feature map from the CBR module, after processing by the splitting module, with a feature map output by an upsampling layer;
each block fusion module has three inputs: one is the output of a CBR module, and the other two are outputs of upsampling layers; each block fusion module has four outputs: two go to upsampling layers for upward feature pyramid fusion, and two go to the driver feature detector and the non-motor vehicle feature detector for classification and regression.
2. The fine-grained non-motor-vehicle feature detection method of claim 1, further comprising:
S5, training the constructed feature detection network, with a mutual-exclusion loss function L_Mutex added during training so that the feature detection network model can accurately distinguish overlapping detection boxes of targets of the same class.
3. The fine-grained non-motor-vehicle feature detection method of claim 2, wherein the mutual-exclusion loss function L_Mutex is used to enlarge the distance between a candidate box P and a surrounding ground-truth box T2, the surrounding ground-truth box T2 being the ground-truth box with the maximum overlap with P other than the ground-truth box T1 matched to P; the distance between the candidate box P and the surrounding ground-truth box T2 is represented by the function IoT(P, T2):

IoT(P, T2) = area(P ∩ T2) / area(T2)

wherein area(P ∩ T2) represents the intersection area of the candidate box P and the surrounding ground-truth box T2, and area(T2) represents the area of the surrounding ground-truth box T2; the mutual-exclusion loss function L_Mutex between candidate boxes and their surrounding ground-truth boxes is expressed as:

L_Mutex = (1/N) · Σ_{P ∈ P+} IoT(P, T2)

wherein P+ represents the N positive-sample candidate boxes, and IoT is taken as 0 for a candidate box with no surrounding ground-truth box T2; the loss function adopted by the feature detection network during training is:

L_all = L_conf + L_loc + γ · L_Mutex

wherein L_conf and L_loc represent the classification loss and the localization loss of the feature detection network respectively, and γ is the weight coefficient of the mutual-exclusion loss.
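The IoT term and mutual-exclusion loss can be sketched in plain Python. This is a minimal illustration; averaging over the N positive candidates and the (x1, y1, x2, y2) box format are assumptions.

```python
def iot(p, t):
    """IoT(P, T2): intersection area of candidate box P with the surrounding
    ground-truth box T2, normalized by the area of T2. Boxes are (x1, y1, x2, y2)."""
    iw = max(0.0, min(p[2], t[2]) - max(p[0], t[0]))
    ih = max(0.0, min(p[3], t[3]) - max(p[1], t[1]))
    area_t2 = (t[2] - t[0]) * (t[3] - t[1])
    return iw * ih / area_t2 if area_t2 > 0 else 0.0

def mutex_loss(positives, surrounding):
    """L_Mutex over the N positive candidates; surrounding[i] is the ground-truth
    box with maximum overlap other than the matched one, or None (IoT then 0)."""
    n = len(positives)
    return sum(iot(p, t) for p, t in zip(positives, surrounding) if t is not None) / n
```

Minimizing this term drives candidates away from overlapping ground-truth boxes of neighboring same-class targets, which is what lets the trained network separate overlapped detections.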
4. The fine-grained non-motor-vehicle feature detection method of claim 1, wherein in step S2, network feature layers are selected from six different CBR modules of the feature detection network and used as inputs to five block fusion modules; after layer-by-layer block feature fusion by the five block fusion modules, five driver fine-grained feature maps and five non-motor-vehicle fine-grained feature maps are output and input to the corresponding detection layers for region localization and category recognition.
5. The fine-grained non-motor-vehicle feature detection method of claim 4, wherein the cutting module splits the shallow feature map from the CBR module along the height dimension into a driver-related shallow block feature map and a non-motor-vehicle-related shallow block feature map, or splits the deep feature map from the CBR module along the channel dimension into a driver-related deep feature map and a non-motor-vehicle-related deep feature map;
the block fusion module performs bilinear-interpolation up-sampling on the driver-related deep feature map to obtain a driver interpolated feature map with the same width, height and channel number as the driver-related shallow block feature map; the driver-related shallow block feature map and the driver interpolated feature map are added and then convolved to obtain the driver fine-grained feature map;
the block fusion module performs bilinear-interpolation up-sampling on the non-motor-vehicle-related deep feature map to obtain a non-motor-vehicle interpolated feature map with the same width, height and channel number as the non-motor-vehicle-related shallow block feature map; the non-motor-vehicle-related shallow block feature map and the non-motor-vehicle interpolated feature map are added and then convolved to obtain the non-motor-vehicle fine-grained feature map.
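One such fusion step can be sketched in NumPy. The shapes are illustrative assumptions (the deep map is taken to have twice the shallow map's channel count so that each channel half matches after the split), and the claim's final convolution is replaced by identity for brevity.

```python
import numpy as np

def bilinear_upsample(x, out_h, out_w):
    """Bilinear-interpolation up-sampling of a (C, H, W) feature map."""
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    # interpolate along width on the two neighboring rows, then along height
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def block_fuse(shallow, deep):
    """Split the shallow map along height (driver half, non-motor-vehicle half)
    and the deep map along channels, upsample each deep half to its shallow
    half's width/height, and add element-wise."""
    c, h, w = shallow.shape
    drv_s, veh_s = shallow[:, :h // 2], shallow[:, h // 2:]
    dc = deep.shape[0] // 2
    drv_d, veh_d = deep[:dc], deep[dc:]
    drv = drv_s + bilinear_upsample(drv_d, h // 2, w)
    veh = veh_s + bilinear_upsample(veh_d, h // 2, w)
    return drv, veh
```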
6. The fine-grained non-motor-vehicle feature detection method of claim 1, wherein the CBR module comprises a convolutional layer, a batch normalization layer and a ReLU activation layer connected in series.
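A CBR module of this shape might look as follows in PyTorch; kernel size, stride and padding here are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class CBR(nn.Sequential):
    """Conv -> BatchNorm -> ReLU, connected in series as in claim 6."""

    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
```

Stacking such modules (interleaved with max-pooling) yields the serial backbone of claim 1, with the ReLU guaranteeing non-negative activations at every stage.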
7. The fine-grained non-motor-vehicle feature detection method of claim 1, wherein the driver feature detector and the non-motor-vehicle feature detector both use SSD detectors.
8. A storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the steps of the fine-grained non-motor-vehicle feature detection method of any one of claims 1 to 7.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the fine-grained non-motor-vehicle feature detection method of any one of claims 1 to 7.
CN202011562327.XA 2020-12-25 2020-12-25 Fine-grained non-motor vehicle feature detection method, storage medium and computer equipment Active CN112651441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011562327.XA CN112651441B (en) 2020-12-25 2020-12-25 Fine-grained non-motor vehicle feature detection method, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN112651441A CN112651441A (en) 2021-04-13
CN112651441B true CN112651441B (en) 2022-08-16

Family

ID=75363009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011562327.XA Active CN112651441B (en) 2020-12-25 2020-12-25 Fine-grained non-motor vehicle feature detection method, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN112651441B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764812B (en) * 2022-03-14 2024-08-02 什维新智医疗科技(上海)有限公司 Focal region segmentation device
CN115294774B (en) * 2022-06-20 2023-12-29 桂林电子科技大学 Non-motor vehicle road stopping detection method and device based on deep learning
CN114882372A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target detection method and device
CN117173523B (en) * 2023-08-04 2024-04-09 山东大学 Camouflage target detection method and system based on frequency perception
CN117593746B (en) * 2024-01-18 2024-04-19 武汉互创联合科技有限公司 Cell division balance evaluation system and device based on target detection

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109711427A (en) * 2018-11-19 2019-05-03 深圳市华尊科技股份有限公司 Object detection method and Related product
CN110991560A (en) * 2019-12-19 2020-04-10 深圳大学 Target detection method and system in combination with context information
CN111414861A (en) * 2020-03-20 2020-07-14 赛特斯信息科技股份有限公司 Method for realizing detection processing of pedestrians and non-motor vehicles based on deep learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108399362B (en) * 2018-01-24 2022-01-07 中山大学 Rapid pedestrian detection method and device


Non-Patent Citations (1)

Title
Railway driver behavior recognition fusing object detection and human keypoint detection; Yao Weiwei et al.; Computer Measurement & Control; 2020-06-30; Vol. 28, No. 6; pp. 212-216 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518000 1001, building T3, Hualian Business Center, Nanshan community, Nanshan street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Radio & TV Xinyi Technology Co.,Ltd.

Address before: 518000 1001, building T3, Hualian Business Center, Nanshan community, Nanshan street, Shenzhen City, Guangdong Province

Patentee before: SHENZHEN XINYI TECHNOLOGY Co.,Ltd.
