CN113033720A

CN113033720A - Vehicle bottom picture foreign matter identification method and device based on sliding window and storage medium

Info

Publication number: CN113033720A
Application number: CN202110588934.1A
Authority: CN
Inventors: 赵荣; 成晓龙; 赵智玉; 徐梅娟
Original assignee: Nanjing Soan Electronics Co ltd
Current assignee: Nanjing Soan Electronics Co ltd
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-06-25
Anticipated expiration: 2041-05-28
Also published as: CN113033720B

Abstract

The invention discloses a sliding-window-based method, device and storage medium for identifying foreign matter in vehicle bottom pictures. The method comprises the following steps: step 1, training a foreign matter recognition network model M to obtain the parameters of the foreign matter recognition network; and step 2, segmenting the vehicle bottom picture into a plurality of window images with a sliding window, loading the image data of each window in multiple processes, preprocessing the window images and inputting them into the foreign matter recognition network M to obtain the foreign matter identification result. Processing the sliding-window segments of the vehicle bottom image in multiple processes, and combining an anchor-frame-based method, which is more stable for medium-sized targets, with a feature-point-based method, which is more flexible for small targets, effectively improves the efficiency and accuracy of foreign matter identification in high-resolution vehicle bottom pictures.

Description

Vehicle bottom picture foreign matter identification method and device based on sliding window and storage medium

Technical Field

The invention relates to an efficient sliding-window-based method for identifying small foreign matter targets in high-resolution vehicle bottom pictures, and belongs to the technical field of foreign matter identification.

Background

To prevent lawbreakers from hiding dangerous articles such as firearms, explosives and drugs in the vehicle chassis, foreign matter identification on the vehicle chassis is an important part of security inspection. The existing approach is to photograph the vehicle bottom in high definition and check the picture manually, which is inefficient, and the judgment accuracy of a security inspector declines after monitoring pictures for a long time. A more efficient inspection mode is to analyze the vehicle bottom picture automatically with deep-learning-based object recognition techniques from computer vision and identify the foreign matter present on the vehicle bottom.

Vehicle bottom pictures taken by the security inspection system have two characteristics. First, the picture resolution is high, so recognition is computationally expensive and slow: when a high-resolution picture is fed directly into an object recognition network, tens of seconds are often needed to obtain the result. Second, most vehicle bottom foreign matter consists of small targets, i.e., the foreign matter occupies only a small proportion of the whole vehicle bottom picture, which makes the recognition accuracy of general-purpose object recognition methods low.

Disclosure of Invention

The technical problem to be solved by the invention is to improve the efficiency and accuracy of vehicle bottom foreign matter identification when the vehicle bottom picture has a high resolution and the foreign matter consists of small targets.

In order to solve the problems, the invention adopts the following technical scheme:

A vehicle bottom picture foreign matter identification method based on a sliding window, characterized by comprising the following steps:

Step 1, training a foreign body recognition network model M to obtain parameters of a foreign body recognition network;

Step 2, segmenting the vehicle bottom picture into a plurality of window images with a sliding window, loading the image data of each window in multiple processes, preprocessing the window images and inputting them into the foreign matter identification network M to obtain the foreign matter identification result.

Step 1 comprises data set construction and division, vehicle bottom image preprocessing, forward propagation through the foreign matter identification network, and updating of the network parameters, specifically:

Step 1-1, collecting vehicle bottom picture data containing foreign matter, the data comprising vehicle bottom images and the corresponding labeled foreign matter bounding boxes and categories; dividing the data into a training set and a validation set, randomly shuffling the training data and dividing it into a plurality of mini-batches, and dividing the validation data directly into a plurality of mini-batches;

Step 1-2, preprocessing the input vehicle bottom image: randomly cropping a window of height h and width w from the input vehicle bottom image to obtain a window image, randomly flipping the window image horizontally and updating its foreign matter bounding boxes accordingly, normalizing the window image, converting it into a PyTorch tensor, and stacking the data of a mini-batch to obtain the input data of the foreign matter identification network M;

Step 1-3, inputting the data preprocessed in step 1-2 into the backbone network of the foreign matter identification network M to obtain multi-scale feature maps;

the backbone network may be a ResNet50 backbone, which yields four multi-scale feature maps C₂, C₃, C₄ and C₅.

Step 1-4, inputting the multi-scale feature maps obtained in step 1-3 into the feature pyramid network of the foreign matter identification network M to obtain multi-scale feature maps with fused features;

the fused feature maps are the six maps P₂, P₃, P₄, P₅, P₆ and P₇.

Step 1-5, assigning anchor frames of multiple scales and aspect ratios to each feature map P_i output in step 1-4, and inputting the feature maps into the anchor-frame-based prediction network and the feature-point-based prediction network of the foreign matter identification network M, respectively, to obtain the classification values and bounding box regression values of each anchor frame and of each feature point;

Step 1-6, first calculating a loss value from the classification and regression values of the anchor frames and feature points obtained in step 1-5 and the labeled data of the vehicle bottom images; then calculating the gradients of the network parameters and updating the parameter values;

Step 1-7, one epoch of training is complete once steps 1-2 to 1-6 have been executed on all mini-batches; steps 1-2 to 1-5 are then performed on each mini-batch of the validation set, followed by post-processing consisting of box filtering and non-maximum suppression to obtain the foreign matter identification results on the validation data, and the mean average precision over all classes (mAP) is calculated from these results and the validation labels;

Step 1-8, repeating steps 1-1 to 1-7 E times (E = 10 to 20), and selecting the model parameters of the epoch with the highest all-class mAP as the parameters of the foreign matter identification network M.

As a preferred technical solution of the invention, the anchor-frame-based prediction network and the feature-point-based prediction network in step 1-5 each consist of a classification branch and a regression branch, and each branch contains L convolution layers followed by a prediction layer. Every convolution layer uses group normalization: the C channels are divided into G groups, and normalization followed by a learned linear (affine) transformation is performed within each group of each vehicle bottom picture sample.
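The group normalization used in each convolution layer can be sketched in plain Python for a single sample. This is a minimal illustration, assuming each channel is given as a flat list of its spatial values and that the per-channel affine parameters `gamma` and `beta` are learned elsewhere; the function name `group_norm` is hypothetical, and a real implementation would use a framework layer such as PyTorch's `torch.nn.GroupNorm(num_groups, num_channels)`.

```python
import math
from typing import List


def group_norm(x: List[List[float]], num_groups: int,
               gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[List[float]]:
    """Group-normalize one sample.

    x holds C channels, each a flat list of spatial values. The channels are
    split into num_groups contiguous groups; each group is normalized to zero
    mean and unit variance over all of its values, then the per-channel affine
    transform gamma[c] * x_hat + beta[c] is applied.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("C must be divisible by G")
    per_group = num_channels // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        chans = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in chans for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for offset, ch in enumerate(chans):
            c = g * per_group + offset
            out.append([gamma[c] * (v - mean) * inv_std + beta[c] for v in ch])
    return out
```

In the embodiment, C = 256 and G = 32, so each group spans 8 channels. Because statistics are computed per sample, the result does not depend on the mini-batch size.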

Step 2 comprises dividing the vehicle bottom image into a plurality of window images with a sliding window, loading the image data of each window in multiple processes, preprocessing the window images and inputting them into the foreign matter recognition network M to obtain the foreign matter recognition result. Specifically:

Step 2-1, sliding a window of height h and width w with stride s over a vehicle bottom image of height H and width W yields N = [(H − h)//s + 1] × [(W − w)//s + 1] windows, where "//" denotes floor division (division rounded down);
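The window count formula can be checked with a small sketch that enumerates the window positions it implies. The function name `sliding_windows` and the (top, left, bottom, right) tuple layout are illustrative assumptions. Note that plugging the embodiment's sizes (a 1024 × 3750 image, an 800 × 1333 window and s = 400) into this formula gives (224//400 + 1) × (2417//400 + 1) = 1 × 7 = 7 windows, not the N = 14 reported in the embodiment; the text does not say how the additional windows are placed.

```python
from typing import List, Tuple


def sliding_windows(img_h: int, img_w: int, win_h: int, win_w: int,
                    stride: int) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) for every window position.

    Follows N = [(H - h)//s + 1] * [(W - w)//s + 1]; window positions that
    would extend past the image border are not generated.
    """
    if win_h > img_h or win_w > img_w:
        raise ValueError("window larger than image")
    rows = (img_h - win_h) // stride + 1
    cols = (img_w - win_w) // stride + 1
    return [(r * stride, c * stride, r * stride + win_h, c * stride + win_w)
            for r in range(rows) for c in range(cols)]
```

Each tuple can be used directly to slice the window out of the full image and, later, to shift the window's detections back into full-image coordinates.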

Step 2-2, distributing the N window images to P processes, where each of the first P − 1 processes handles N//P window images and the last process handles the remaining N − (P − 1)(N//P) images, "//" again denoting floor division; a foreign matter identification network M comprising the backbone network, the feature pyramid, the anchor-frame-based prediction network and the feature-point-based prediction network is loaded in each process;
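The distribution rule of step 2-2 amounts to a simple partition of the window list. The helper name `split_for_processes` is hypothetical. In a full pipeline each chunk would be handed to its own worker process (for example through Python's `multiprocessing` module), and each worker would load its own copy of M; that worker code is omitted here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_for_processes(items: Sequence[T], num_procs: int) -> List[List[T]]:
    """First P-1 chunks get N//P items each; the last chunk gets the rest."""
    if num_procs < 1:
        raise ValueError("need at least one process")
    base = len(items) // num_procs
    chunks = [list(items[i * base:(i + 1) * base]) for i in range(num_procs - 1)]
    chunks.append(list(items[(num_procs - 1) * base:]))  # N - (P-1)(N//P) items
    return chunks
```

With N = 14 and P = 4 this gives chunk sizes 3, 3, 3 and 5, matching the embodiment. The last process always absorbs the remainder, so it can receive up to P − 1 more images than the others.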

Step 2-3, preprocessing each input window image by normalizing it with the statistics (per-channel mean and standard deviation) of the ImageNet data set and converting it into the PyTorch tensor data type;
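The normalization of step 2-3 can be sketched without any framework. This assumes RGB channel order, 8-bit pixel values, and the commonly used ImageNet statistics mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225); the text itself does not list the values. The function name `normalize_window` is hypothetical; in PyTorch the same effect is usually obtained with `torchvision.transforms.ToTensor()` followed by `torchvision.transforms.Normalize(mean, std)`.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each in 0..255

# Assumption: the widely used ImageNet per-channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_window(img: List[List[Pixel]]) -> List[List[List[float]]]:
    """Convert an HWC image of 8-bit RGB pixels into a normalized CHW array.

    Each value is scaled to [0, 1], then (v - mean[c]) / std[c] is applied,
    mirroring ToTensor() followed by Normalize(mean, std).
    """
    height, width = len(img), len(img[0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for y in range(height):
        for x in range(width):
            for c in range(3):
                v = img[y][x][c] / 255.0
                out[c][y][x] = (v - IMAGENET_MEAN[c]) / IMAGENET_STD[c]
    return out
```

The channel-first (CHW) layout is what convolutional backbones such as ResNet50 expect as input.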

Step 2-4, inputting the preprocessed data into the foreign matter recognition network M, as in steps 1-3 to 1-5, to obtain the predicted values of the anchor frames and feature points. The adjusted box of each anchor frame is computed from its regression values and the anchor frame's position, scale and aspect ratio; likewise, the box of each feature point is computed from its regression values and the feature point's position;
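Box decoding for the two prediction networks can be sketched as follows, in the spirit of claim 5 (center offsets and size adjustments for anchor frames, distances to the four boundaries for feature points). The exact parameterization is an assumption: the anchor branch uses the common (dx, dy, dw, dh) encoding with offsets relative to the anchor size and log-space size adjustments, and the point branch assumes distances are already in pixels. Because boxes from different windows are later fused, a hypothetical `to_image_coords` helper shifts a window box back into full-image coordinates.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode_anchor(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply (dx, dy, dw, dh): center offsets scaled by anchor size,
    width/height adjusted in log space (assumed encoding)."""
    ax1, ay1, ax2, ay2 = anchor
    aw, ah = ax2 - ax1, ay2 - ay1
    acx, acy = ax1 + 0.5 * aw, ay1 + 0.5 * ah
    dx, dy, dw, dh = deltas
    cx, cy = acx + dx * aw, acy + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def decode_point(px: float, py: float,
                 dists: Tuple[float, float, float, float]) -> Box:
    """Distances (left, top, right, bottom) from the point to the box edges."""
    left, top, right, bottom = dists
    return (px - left, py - top, px + right, py + bottom)


def to_image_coords(box: Box, win_top: int, win_left: int) -> Box:
    """Shift a box from window coordinates into full-image coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + win_left, y1 + win_top, x2 + win_left, y2 + win_top)
```

Zero regression values reproduce the anchor frame unchanged, which is a convenient sanity check for any encoding of this kind.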

Step 2-5, among the predicted boxes of each feature layer output in step 2-4, keeping those whose confidence is greater than t₂, where t₂ is a manually set confidence threshold, typically between 0.3 and 0.6. The predicted boxes of all feature layers of the window images processed by all processes are then fused, and non-maximum suppression with an intersection-over-union (IoU) threshold t_iou is performed to obtain the final foreign matter recognition result; t_iou is a manually set IoU threshold, typically 0.5.
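Confidence filtering followed by greedy non-maximum suppression can be sketched in plain Python. The text does not say whether suppression is class-aware, so this sketch assumes per-class NMS (a box only suppresses boxes of the same label); the function names are hypothetical, and in PyTorch `torchvision.ops.batched_nms` provides an equivalent per-class operation. The boxes are assumed to be in full-image coordinates already.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]       # (box, confidence, label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def filter_and_nms(dets: List[Detection], t2: float = 0.5,
                   t_iou: float = 0.5) -> List[Detection]:
    """Keep detections with confidence > t2, then run greedy per-class NMS."""
    candidates = sorted((d for d in dets if d[1] > t2),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Drop det if a higher-scoring kept box of the same class overlaps it
        # by more than t_iou.
        if all(k[2] != det[2] or iou(k[0], det[0]) <= t_iou for k in kept):
            kept.append(det)
    return kept
```

Running NMS once over the fused set from all windows is what removes duplicate detections of an object that falls in the overlap between two windows.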

Compared with the prior art, the invention has the following technical effects:

1. Processing the sliding-window segments of the high-resolution vehicle bottom picture in multiple processes speeds up foreign matter identification, while the sliding-window segmentation itself increases the proportion of each window image occupied by small targets, which improves small-target identification accuracy.

2. During training, the anchor-frame-based method, which is more stable for medium-scale targets, is combined with the feature-point-based method, which is more flexible for small targets; training both jointly yields better network parameters and thus effectively improves the efficiency and accuracy of foreign matter identification in high-resolution vehicle bottom images.

3. During testing, combining the anchor-frame-based method, which is more stable for medium-scale targets, with the feature-point-based method, which is more flexible for small targets, benefits the identification of foreign objects of various scales.

Drawings

FIG. 1 is a flow chart of the foreign matter identification method employed in the present invention;

FIG. 2 is a visualization of a vehicle bottom image foreign matter recognition result according to the invention.

Detailed Description

The embodiments of the present invention will be described in further detail with reference to the drawings attached to the specification.

Example 1

As shown in FIG. 1, in the vehicle bottom image foreign matter identification method of the invention (the vehicle bottom includes, but is not limited to, the bottoms of cars, trains and other vehicles), the identification method comprises the following steps:

step 1, training a foreign body recognition network model M to obtain parameters of a foreign body recognition network.

Step 2, performing prediction on the vehicle bottom picture to obtain the vehicle bottom picture foreign matter identification result.

The data may be divided into a training set and a validation set at a ratio of 9:1 or another ratio. With 9:1, there are T_val = T//10 validation samples and T_train = T − T//10 training samples. The training data are then randomly shuffled and divided into T_train//bs mini-batches of bs samples each, and the validation data are directly divided into T_val//bs mini-batches, where T is the total number of vehicle bottom images and "//" denotes floor division.

In one embodiment, 10000 vehicle bottom images are collected together with the bounding boxes and categories of the foreign objects they contain; there are 10 foreign object categories. The data are divided at 9:1 into a training set and a validation set, i.e., T_val = 1000 validation samples and T_train = 9000 training samples. The training data are then randomly shuffled and divided into 281 mini-batches of up to 32 images, and the validation data are directly divided into 32 mini-batches of up to 32 images.
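The split and batching can be sketched as below. Two details are assumptions: the 9:1 split itself is drawn at random, and, to reproduce the embodiment's counts (281 training and 32 validation mini-batches for bs = 32), the incomplete final training batch is dropped while the incomplete final validation batch is kept. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_9_to_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Return (train, val) with T_val = T//10 and T_train = T - T//10."""
    order = list(samples)
    random.Random(seed).shuffle(order)  # assumption: the 9:1 split is random
    t_val = len(order) // 10
    return order[t_val:], order[:t_val]


def make_batches(data: Sequence[T], bs: int, shuffle: bool, drop_last: bool,
                 rng: Optional[random.Random] = None) -> List[List[T]]:
    """Cut data into mini-batches of bs items, optionally shuffling first."""
    items = list(data)
    if shuffle:
        (rng or random.Random()).shuffle(items)
    stop = len(items) - len(items) % bs if drop_last else len(items)
    return [items[i:i + bs] for i in range(0, stop, bs)]
```

For T = 10000 and bs = 32, the training batches come out as 9000//32 = 281 full batches (8 samples left over each epoch), and the validation data form 31 full batches plus one batch of 8. The training data would be reshuffled at the start of every epoch.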

Step 1-2, preprocessing the input vehicle bottom image: randomly cropping a window of height h and width w from the input vehicle bottom image to obtain a window image, randomly flipping the window image horizontally and updating its foreign matter bounding boxes accordingly, normalizing the window image, converting it into a PyTorch tensor, and stacking the data of a mini-batch to obtain the input data of the foreign matter identification network M.

In one embodiment, a mini-batch of input vehicle bottom images, each of height 1024 and width 3750, is preprocessed by randomly cropping from each image a window of height 800 and width 1333. Taking the upper left corner of the input image as the origin, the upper boundary coordinate of the window in the original image is selected uniformly at random from the range [0, 1024 − 800] = [0, 224], and the left boundary coordinate is selected uniformly at random from the range [0, 3750 − 1333] = [0, 2417]. The window image is then flipped horizontally with probability 0.5 and its foreign matter bounding boxes are updated accordingly; finally, the window image is normalized with the statistics of the ImageNet data set and converted into a PyTorch tensor, and the data of the mini-batch are stacked to obtain the input data of the foreign matter identification network M.
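The random crop and flip, including the bounding box update, can be sketched on box coordinates alone (the pixel slice itself would be `image[top:top + win_h, left:left + win_w]` on an array). This is a minimal sketch with a hypothetical function name; one assumption is that any box keeping a positive area after clipping to the window is retained, since the text does not state a minimum-overlap rule.

```python
import random
from typing import List, Optional, Tuple

LabeledBox = Tuple[float, float, float, float, str]  # (x1, y1, x2, y2, label)


def random_crop_flip(img_h: int, img_w: int, boxes: List[LabeledBox],
                     win_h: int = 800, win_w: int = 1333,
                     flip_p: float = 0.5,
                     rng: Optional[random.Random] = None):
    """Pick a random window, move boxes into it, then maybe flip horizontally.

    Returns (top, left, flipped, boxes_in_window).
    """
    rng = rng or random.Random()
    # Upper and left boundaries drawn uniformly from [0, H - h] and [0, W - w].
    top = rng.randint(0, img_h - win_h)
    left = rng.randint(0, img_w - win_w)
    kept: List[LabeledBox] = []
    for x1, y1, x2, y2, label in boxes:
        # Shift into window coordinates and clip to the window.
        nx1 = min(max(x1 - left, 0), win_w)
        ny1 = min(max(y1 - top, 0), win_h)
        nx2 = min(max(x2 - left, 0), win_w)
        ny2 = min(max(y2 - top, 0), win_h)
        if nx2 > nx1 and ny2 > ny1:  # assumption: keep any box with positive area
            kept.append((nx1, ny1, nx2, ny2, label))
    flipped = rng.random() < flip_p
    if flipped:
        # Horizontal flip maps x to win_w - x, so the left and right edges swap.
        kept = [(win_w - bx2, by1, win_w - bx1, by2, lab)
                for bx1, by1, bx2, by2, lab in kept]
    return top, left, flipped, kept
```

Cropping at training time exposes the network to window-sized inputs, the same size it later sees through the sliding window at test time.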

Step 1-3, inputting the data preprocessed in step 1-2 into the backbone network of the foreign matter identification network M to obtain multi-scale feature maps.

In one embodiment, the backbone network may be a ResNet50 backbone, which yields four multi-scale feature maps C₂, C₃, C₄ and C₅ with strides of 8, 16, 32 and 64, respectively, relative to the input window image, where C_i is the top-level feature map of the backbone stage with the corresponding stride.

Step 1-4, inputting the multi-scale feature maps obtained in step 1-3 into the feature pyramid network of the foreign matter identification network M to obtain multi-scale feature maps with fused features.

In one embodiment, six feature-fused multi-scale feature maps P₂, P₃, P₄, P₅, P₆ and P₇ are obtained, with strides of 8, 16, 32, 64, 128 and 256, respectively, relative to the input window image, where P_i is the feature map of the feature pyramid network with the corresponding stride.

Step 1-5, assigning anchor frames of multiple scales and aspect ratios to each feature map P_i output in step 1-4, and inputting the feature maps into the anchor-frame-based prediction network and the feature-point-based prediction network of the foreign matter identification network M, respectively, to obtain the classification values and bounding box regression values of each anchor frame and of each feature point.

In one embodiment, let S_i be the stride of the feature map P_i output in step 1-4. Each feature point on P_i is assigned 9 anchor frames, combining 3 scales (S_i, 2S_i and 4S_i) with 3 aspect ratios (0.5, 1.0 and 2.0). The feature map is input into the anchor-frame-based prediction network, which predicts 10 classification values (one per foreign object category) and 4 bounding box regression values for each anchor frame. At the same time, the feature map is input into the feature-point-based prediction network, which predicts 10 classification values and 4 bounding box regression values for each feature point.
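The anchor assignment above can be sketched for one feature point. Several conventions are assumptions rather than statements from the text: a feature point (x, y) on a map of stride S is centered at ((x + 0.5)S, (y + 0.5)S) in window pixels, a "scale" k·S means an anchor of area (k·S)², and the aspect ratio r is height divided by width. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def anchors_at_point(x: int, y: int, stride: int,
                     scales: Sequence[int] = (1, 2, 4),
                     ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Return the 3 x 3 = 9 anchor frames for feature point (x, y).

    Assumptions: the point sits at the center of its stride cell, each scale
    k gives an anchor of area (k * stride)^2, and ratio r = height / width.
    """
    cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
    out: List[Box] = []
    for k in scales:
        size = k * stride
        for r in ratios:
            w = size / math.sqrt(r)  # keeps w * h == size ** 2
            h = size * math.sqrt(r)
            out.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return out
```

With 9 anchors per point and 10 categories, the anchor branch outputs 9 × 10 classification values and 9 × 4 regression values per feature point, while the feature-point branch outputs 10 + 4.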

Let the predicted classification vector of an anchor frame, containing K values, be p^a, its labeled class value be y^a, the vector formed by its 4 predicted regression values be r^a, and the vector formed by the corresponding 4 labeled regression values be t^a. With the classification loss weight of positive samples set to 0.25, the loss value L^a of the anchor frame is:

[formula not reproduced in this text]

where the term for class i takes one form when i = y^a and another form otherwise [formulas not reproduced in this text].
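The formulas for L^a are not reproduced in this text. The weight 0.25 for positive samples and the per-class case split ("when i = y^a ... otherwise") match the α-balanced focal loss used by RetinaNet, so the sketch below assumes that form. The focusing parameter γ = 2, the smooth L1 regression term between r^a and t^a, and applying regression only to positive anchors are further assumptions; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def focal_cls_loss(probs: Sequence[float], label: Optional[int],
                   alpha: float = 0.25, gamma: float = 2.0,
                   eps: float = 1e-12) -> float:
    """Assumed alpha-balanced focal loss over K per-class probabilities.

    label is the index of the labeled class, or None for a background anchor.
    """
    loss = 0.0
    for i, p in enumerate(probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if i == label:   # term used when i == y^a
            loss += -alpha * (1.0 - p) ** gamma * math.log(p)
        else:            # term used otherwise
            loss += -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
    return loss


def smooth_l1(pred: Sequence[float], target: Sequence[float],
              beta: float = 1.0) -> float:
    """Assumed smooth L1 distance between r^a and t^a."""
    total = 0.0
    for r, t in zip(pred, target):
        d = abs(r - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def anchor_loss(probs: Sequence[float], label: Optional[int],
                r: Sequence[float], t: Sequence[float]) -> float:
    """Classification term plus regression term (positive anchors only)."""
    reg = smooth_l1(r, t) if label is not None else 0.0
    return focal_cls_loss(probs, label) + reg
```

The (1 − p)^γ and p^γ factors shrink the loss of already well-classified anchors, which keeps the many easy background anchors from dominating training.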

The loss value L^p of a feature point is calculated in the same way.

The total loss value L is: L = L^a + L^p

The gradient of each parameter is then calculated and each parameter value is updated, with a learning rate of 0.1 and stochastic gradient descent (SGD) as the optimization algorithm;

Steps 1-2 to 1-5 are performed on each mini-batch of the validation set. The adjusted box of each anchor frame is calculated from its regression values and the anchor frame's position, scale and aspect ratio, and the box of each feature point is calculated from its regression values and the feature point's position. Then, among the predicted boxes of each feature layer, the top-k boxes with the highest confidence whose confidence is also greater than t₁ are kept, and non-maximum suppression with IoU threshold t_iou is performed to obtain the foreign matter identification results on the validation data. Finally, the all-class mAP is calculated from these results and the validation labels.

In one embodiment, the 1000 boxes with the highest confidence whose confidence is greater than 0.05 are kept (k = 1000, t₁ = 0.05), and non-maximum suppression with an IoU threshold of 0.5 is performed to obtain the foreign matter identification results on the validation data. Finally, the all-class mAP is calculated from these results and the validation labels;
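The per-layer candidate selection (confidence above t₁, then at most k boxes per feature layer) can be sketched with the standard library's `heapq.nlargest`. The dictionary keyed by layer name and the function name are illustrative assumptions; the selected candidates from all layers are concatenated and would then go to non-maximum suppression.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Det = Tuple[float, Tuple[float, float, float, float]]  # (confidence, box)


def select_per_level(levels: Dict[str, Sequence[Det]], k: int = 1000,
                     t1: float = 0.05) -> List[Det]:
    """For each feature layer keep the top-k detections with confidence > t1,
    then concatenate the survivors of all layers."""
    selected: List[Det] = []
    for dets in levels.values():
        above = [d for d in dets if d[0] > t1]
        selected.extend(heapq.nlargest(k, above, key=lambda d: d[0]))
    return selected
```

Capping each layer at k boxes bounds the cost of the subsequent NMS, which grows with the number of candidates, while the low threshold t₁ = 0.05 keeps enough low-confidence boxes for an accurate mAP estimate.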

In one embodiment, E is set to 12, i.e., training runs for 12 epochs, and the learning rate is reduced to 0.1 times its previous value at the 8th and 11th epochs. Finally, the model parameters of the epoch with the highest all-class mAP are selected as the parameters of the foreign matter identification network M;
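The step learning-rate schedule and the final epoch selection can be written as two small functions. The interpretation that the reduced rate applies from the 8th and 11th epochs onward (1-indexed) is an assumption, and the function names are hypothetical; in PyTorch, `torch.optim.lr_scheduler.MultiStepLR` implements this kind of step schedule.

```python
from typing import Sequence


def lr_at_epoch(epoch: int, base_lr: float = 0.1,
                milestones: Sequence[int] = (8, 11),
                factor: float = 0.1) -> float:
    """Learning rate for a 1-indexed epoch under a step schedule."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:  # assumption: the drop applies from the milestone epoch on
            lr *= factor
    return lr


def best_epoch(map_per_epoch: Sequence[float]) -> int:
    """1-indexed epoch with the highest validation mAP (earliest on ties)."""
    return max(range(len(map_per_epoch)), key=lambda i: map_per_epoch[i]) + 1
```

With E = 12 this gives a rate of 0.1 for epochs 1 to 7, 0.01 for epochs 8 to 10 and 0.001 for epochs 11 and 12; the checkpoint returned by `best_epoch` is the one kept as the parameters of M.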

As a preferred technical solution of the present invention, the ResNet50 backbone network described in step 1-3 includes layer2, layer3, layer4 and layer5, composed of 3, 4, 6 and 3 bottleneck modules, respectively. Each bottleneck in layer2 and layer3 consists of a 1 × 1 convolution, a 3 × 3 convolution, a 1 × 1 convolution and a skip connection, while each bottleneck in layer4 and layer5 consists of a 1 × 1 convolution, a deformable convolution, a 1 × 1 convolution and a skip connection;

In one embodiment, the anchor-frame-based prediction network and the feature-point-based prediction network of step 1-5 each consist of a classification branch and a regression branch, each branch containing L convolution layers and a prediction layer. Every convolution layer uses group normalization: the 256 channels are divided into 32 groups, and normalization and a linear transformation are performed within each group of each vehicle bottom picture sample.

In one example, sliding windows of height 800 and width 1333 with a stride of 400 yield N = 14 windows on a vehicle bottom image of height 1024 and width 3750.

Step 2-2, distributing the N window images to P processes, where each of the first P − 1 processes handles N//P window images and the last process handles N − (P − 1)(N//P) images, "//" denoting floor division. A foreign matter identification network M comprising the backbone network, the feature pyramid, the anchor-frame-based prediction network and the feature-point-based prediction network is loaded in each process;

In one embodiment, the 14 window images are assigned to 4 processes: each of the first 3 processes handles 3 window images and the last process handles 5. A foreign matter identification network M comprising the backbone network, the feature pyramid, the anchor-frame-based prediction network and the feature-point-based prediction network is loaded in each process.

Step 2-5, among the predicted boxes of each feature layer output in step 2-4, keeping those with confidence greater than 0.5. The predicted boxes of all feature layers of the window images processed by all processes are fused, and non-maximum suppression with an IoU threshold of 0.5 is performed to obtain the final foreign matter identification result. As shown in FIG. 2, black boxes mark the positions of the recognized foreign objects, and the labels on the boxes give their types: "gun", "knife" and "ax" denote a gun, a knife and an axe, respectively.

The speed and the all-class mAP of the model were evaluated on the 1000 validation samples from step 1 and compared with RetinaNet, a commonly used object recognition method; the results are shown in Table 1:

TABLE 1

Method	Speed (frames/s)	All-class mAP (%)
RetinaNet	0.5	85.3
The invention	1.8	89.6

As Table 1 shows, the identification method of the invention is markedly faster, running at more than 3 times the speed of the existing method (1.8 vs. 0.5 frames/s), and despite this large speed-up its all-class mAP is not reduced but rises from 85.3% to 89.6%.

Example 2

The invention also provides a sliding-window-based vehicle bottom picture foreign matter recognition device comprising a processor and a memory; the memory stores a program or instructions which, when loaded and executed by the processor, implement the vehicle bottom picture foreign matter identification method of Embodiment 1.

Example 3

The present invention also provides a computer-readable storage medium, which may be non-volatile or volatile, storing instructions that, when run on a computer, cause the computer to execute the vehicle bottom picture foreign matter identification method of Embodiment 1.

Those skilled in the art will clearly understand that the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

There are many methods and approaches for implementing the technical solution of the sliding-window-based vehicle bottom picture foreign matter identification method, device and storage medium provided by the present invention, and the above description is only a preferred embodiment. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention. All components not specified in the present embodiment can be implemented with the prior art.

Claims

1. A vehicle bottom picture foreign matter identification method based on a sliding window is characterized by comprising the following steps:

step 2, segmenting the vehicle bottom picture into a plurality of window images with a sliding window, loading the image data of each window in multiple processes, preprocessing the window images and inputting them into a foreign matter identification network M to obtain a foreign matter identification result, step 2 comprising the following steps:

step 2-1, segmenting a high-resolution vehicle bottom image with a sliding window of height h and width w to obtain a plurality of window images;

step 2-2, evenly distributing the window image data among a plurality of processes and loading a foreign matter identification network M in each process, wherein M comprises a backbone network, a feature pyramid, an anchor-frame-based prediction network and a feature-point-based prediction network;

step 2-3, normalizing each input window image and converting it into the PyTorch tensor data type;

step 2-4, inputting the preprocessed data into a foreign matter recognition network M to respectively obtain classification values and frame regression values of the anchor frame and the feature points;

step 2-5, obtaining predicted boxes from the classification values and box regression values output in step 2-4 and keeping the boxes whose confidence is greater than t₂, t₂ being a manually set confidence threshold; adding the anchor-frame-based and feature-point-based predicted boxes of each feature layer of the window pictures processed by all processes to a box set, and performing non-maximum suppression on the box set to obtain the final foreign matter identification result.

2. The method of claim 1, wherein the confidence threshold t₂ is set between 0.3 and 0.6.

3. The method of claim 1, wherein step 1 comprises:

step 1-4, inputting the multi-scale feature maps obtained in step 1-3 into a feature pyramid network in the foreign matter identification network M to obtain feature-fused multi-scale feature maps;

step 1-5, assigning anchor frames of multiple scales and aspect ratios to each feature map output in step 1-4, and inputting the feature maps into the anchor-frame-based prediction network and the feature-point-based prediction network of the foreign matter identification network M, respectively, to obtain the classification value and box regression value of each anchor frame and of each feature point;

step 1-7, completing one epoch of training after steps 1-2 to 1-6 have been executed on all mini-batches; performing steps 1-2 to 1-5 on each mini-batch of the validation set, then performing post-processing consisting of box filtering and non-maximum suppression to obtain the foreign matter identification results on the validation data, and calculating the all-class mean average precision from these results and the validation labels;

step 1-8, repeating steps 1-1 to 1-7 E times, and selecting the model parameters of the epoch with the highest all-class mean average precision as the parameters of the foreign matter identification network M, wherein E = 10 to 20.

4. The method of claim 2, wherein the backbone network uses deformable convolution, and the anchor-frame-based prediction network and the feature-point-based prediction network both use group normalization.

5. The method of claim 2, wherein the box regression values of the anchor-frame-based prediction network are the relative offsets of the box center coordinates and the adjustment values of the box height and width, and the box regression values of the feature-point-based prediction network are the distances from the feature point to the four boundaries of the box.

6. A vehicle bottom picture foreign matter recognition device based on a sliding window is characterized by comprising a processor and a memory; the memory stores programs or instructions which are loaded and executed by the processor to realize the vehicle bottom picture foreign matter identification method as claimed in any one of claims 1 to 5.

7. A computer-readable storage medium on which a program or instructions are stored, the program or instructions, when executed by a processor, implementing the underbody picture foreign matter identification method according to any one of claims 1 to 5.