CN111078946A - Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation - Google Patents

Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation Download PDF

Info

Publication number
CN111078946A
Authority
CN
China
Prior art keywords
vehicle
image
target
bayonet
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911228964.0A
Other languages
Chinese (zh)
Inventor
王培�
The other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Pico Pico Technology Co Ltd
Original Assignee
Hangzhou Pico Pico Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Pico Pico Technology Co Ltd filed Critical Hangzhou Pico Pico Technology Co Ltd
Priority to CN201911228964.0A priority Critical patent/CN111078946A/en
Publication of CN111078946A publication Critical patent/CN111078946A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/738 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Abstract

The invention discloses a bayonet vehicle retrieval method and system based on multi-target regional feature aggregation, comprising the following steps: constructing a bayonet vehicle target detection database; constructing a vehicle head target detection database; training three vehicle type recognition models on the full image, the lower-half-region image and the head image of the bayonet vehicle, respectively; training vehicle identity recognition models on the full image and the upper-half-region image of the bayonet vehicle, respectively, extracting the convolutional feature map of the corresponding input image with each model, performing feature aggregation to obtain the final feature vector of each region, normalizing the vectors, and concatenating them into a unified feature vector as the final feature representation of the bayonet vehicle; and extracting features of the target vehicle image to be retrieved and of the candidate query images in the database with the above steps, calculating the feature similarity matrix between the two, and obtaining the final vehicle retrieval ordering and grading result with a reordering algorithm. The method and system improve the feature representation effect and the retrieval precision.

Description

Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation
Technical Field
The invention relates to the technical field of image and video processing, in particular to a bayonet vehicle retrieval method and system based on multi-target region feature aggregation.
Background
Vehicle retrieval at urban road checkpoints has important value in application fields such as road traffic management and public security criminal investigation, and is a core technology in the construction of smart cities. For example, at the gates of expressway toll stations, urban road intersections and similar checkpoints, dense camera arrays are generally deployed to capture passing motor vehicles, and the around-the-clock surveillance video is stored locally for violation evidence queries, criminal case investigation and the like. According to statistics, a city with 250,000 security cameras generates about 150 PB of video data per month. Efficiently retrieving a target vehicle from such massive surveillance data is a difficult problem of great application value.
Vehicle retrieval is difficult for three reasons. First, vehicles of the same model in a video may belong to different owners, which poses a great challenge for vehicle identity recognition. Second, the same vehicle can present a completely different appearance under different cameras due to varying illumination conditions and shooting angles, causing serious misjudgments by a vehicle retrieval system. Third, the total number of vehicles in a modern city is huge, so a city-scale vehicle feature database imposes severe efficiency requirements on large-scale retrieval tasks. Existing vehicle retrieval methods generally involve the detection and localization of the vehicle target, the extraction of vehicle appearance features, the recognition of the vehicle type and license plate, the reordering of retrieval results, and similar key techniques. Vehicle detection and localization can be achieved by several methods with reliable performance, such as deep-learning-based object detection algorithms like Faster RCNN, YOLO and SSD. In surveillance video scenes, however, license plate recognition plays a very limited role due to long shooting distances, serious mutual occlusion between vehicles, low image quality and other factors.
A review of prior vehicle retrieval techniques shows that several earlier invention patents also aim to improve accurate vehicle retrieval. The Chinese invention patent with application number 201610671729.0, entitled "Vehicle retrieval method and device based on big data", depends on a probability threshold for each marker and therefore cannot adapt well to retrieving vehicle pictures with different appearance characteristics. The Chinese invention patent with application number 201410381921.7, entitled "Vehicle detection method and device robust to illumination change", copes well with illumination changes but cannot distinguish different vehicles of the same type under similar illumination conditions. The Chinese invention patent with application number 201510744990.4, entitled "Vehicle detection method based on similarity learning", relies on SIFT features, whose computational cost is very high. The Chinese invention patent with application number 201711135951.X is entitled "Improved locality-sensitive hashing vehicle retrieval method based on multi-task deep learning". The Chinese invention patent with application number 201510666534.2, entitled "Bayonet vehicle retrieval method based on camera perspective transformation", depends on the camera's extrinsic parameters to build an imaging geometry model; in real large-scale networked camera deployments these parameters are complicated, inconsistent or even missing, which limits the method's practicality.
The Chinese invention patent with application number 201510081237.1, entitled "Vehicle retrieval method based on multiple viewing angles", uses a vehicle pose estimation method; however, because vehicle feature points are very few, the accuracy of the pose estimation model is usually very limited.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a bayonet vehicle retrieval method and system based on multi-target region feature aggregation, which make full use of existing large-scale urban surveillance video data and construct multi-target region detection and feature representation models based on deep convolutional neural networks, effectively improving the feature representation of bayonet vehicle images; meanwhile, the performance of the vehicle retrieval system is improved by reordering its similarity matrix.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention provides a bayonet vehicle retrieval method based on multi-target region feature aggregation, which comprises the following steps:
s11: constructing a bayonet vehicle target detection database by using monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by using vehicle marking data in the database for detecting and positioning a vehicle;
s12: according to the vehicle detection result in the S11, a bayonet vehicle area in the image is segmented, and a vehicle head target detection database is further constructed and used for training a vehicle head target area detection model to realize vehicle head positioning;
s13: respectively training three vehicle type recognition models based on a deep convolutional neural network according to a full image, a lower half area image and a head image of a vehicle at a gate, and extracting vehicle type category characteristics of corresponding images;
s14: respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint for extracting the vehicle identity characteristics of the corresponding images;
s15: extracting convolution characteristic diagrams of corresponding input images according to the models obtained in the S13 and the S14, respectively carrying out characteristic aggregation by adopting a selective convolution characteristic descriptor aggregation algorithm to be used as final characteristic vector representation, finally respectively carrying out normalization, and then connecting the characteristic vectors into a unified characteristic vector to be used as final characteristic representation of the bayonet vehicle;
s16: and respectively performing feature extraction on the target vehicle image to be retrieved and the candidate query image in the database by using S11-S15 to obtain feature vectors of the target vehicle image to be retrieved and the candidate query image, then calculating a feature similarity matrix of the target vehicle image to be retrieved and the candidate query image, and obtaining a final vehicle retrieval ordering and grading result by adopting a reordering algorithm.
Preferably, in the step S14, two loss functions are used for training in the vehicle identification model based on the deep convolutional neural network trained according to the full image of the bayonet vehicle, so as to obtain two vehicle identification models.
Preferably, the bayonet vehicle object detection database in S11 includes: the method comprises the steps of obtaining video frame images from a monitoring camera, and carrying out manual labeling on all vehicle targets in each image to form a rectangular surrounding frame, wherein all the video frame images and labeling samples are divided into a training set and a testing set and are used for training a target detection model.
Preferably, the vehicle target detection model in S11 is specifically: extracting a feature map of the input image by using a deep convolutional neural network, and establishing a regression model for training and testing model parameters by using the feature map and the artificially labeled vehicle target rectangular surrounding frame;
the parameters of the regression model comprise the network parameters of each layer of the deep convolutional neural network and the loss function parameters guiding the training process; during training, the loss function is continuously minimized by a mathematical optimization algorithm based on stochastic gradient descent until it converges, yielding the network parameters of each layer in the converged state; during testing, a video frame image is input into the trained regression model, which outputs the rectangular bounding box coordinates and confidence scores of the vehicle targets in the image.
Preferably, the vehicle detection and location result in S12 includes a rectangular bounding box of all vehicle targets appearing in all video frame images captured by the camera.
Preferably, the locomotive object detection database in S12 includes: and the full images of the vehicles at the bayonet, which are obtained from the vehicle detection and positioning results and the video frame images obtained in the step S11, and the rectangular surrounding frame for manually labeling all the vehicle head targets in each image, wherein all the full images of the vehicles at the bayonet and the labeling samples are divided into a training set and a testing set for training a target detection model.
Preferably, the vehicle head target detection model in S12 specifically is: extracting a feature map of the input image by using a deep convolutional neural network, and establishing a regression model for training and testing model parameters by using the feature map and the artificially labeled vehicle head target rectangular surrounding frame;
the parameters of the regression model comprise the network parameters of each layer of the deep convolutional neural network and the loss function parameters guiding the training process; during training, the loss function is continuously minimized by a mathematical optimization algorithm based on stochastic gradient descent until it converges, yielding the network parameters of each layer in the converged state; during testing, a full image of the bayonet vehicle is input into the trained regression model, which outputs the rectangular bounding box coordinates of the vehicle head target in the image.
Preferably, the similarity calculation in the feature similarity matrix of S16 adopts the cosine similarity, computed as:

sim(F1, F2) = (F1 · F2) / (||F1|| · ||F2||)

where F1 and F2 are the feature vectors of the target vehicle image to be retrieved and of a candidate query image, respectively.
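As a minimal NumPy sketch of the formula above (function name is the editor's illustration, not from the patent):

```python
import numpy as np

def cosine_similarity(f1, f2):
    # sim(F1, F2) = F1 . F2 / (||F1|| * ||F2||)
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

a = np.array([1.0, 2.0, 3.0])
assert abs(cosine_similarity(a, 2 * a) - 1.0) < 1e-9   # collinear vectors score 1
```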
Preferably, the reordering algorithm in S16 is specifically the k-reciprocal re-ranking algorithm from the field of pedestrian re-identification.
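The full k-reciprocal re-ranking algorithm additionally builds a Jaccard distance over expanded neighbor sets; the sketch below (editor's illustration, not the patent's implementation) shows only its core notion of a k-reciprocal neighbor — a gallery item that ranks the query among its own top-k:

```python
import numpy as np

def k_reciprocal_neighbors(dist, q, k):
    """Indices i such that i is in q's top-k AND q is in i's top-k (by distance).

    dist: symmetric (n, n) pairwise distance matrix; q: query index.
    """
    topk = lambda idx: set(np.argsort(dist[idx])[:k + 1])  # +1 because self is nearest
    forward = topk(q)
    return sorted(i for i in forward if i != q and q in topk(i))

dist = np.array([[0.0, 1.0, 10.0],
                 [1.0, 0.0, 9.0],
                 [10.0, 9.0, 0.0]])
assert k_reciprocal_neighbors(dist, 0, k=1) == [1]  # 0 and 1 are mutual nearest neighbors
```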
The invention also provides a bayonet vehicle retrieval system based on multi-target region feature aggregation, which comprises:
the vehicle target detection database construction module is used for constructing a bayonet vehicle target detection database by utilizing monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by utilizing vehicle marking data in the database for detecting and positioning a vehicle;
the vehicle head target detection database construction module is used for segmenting a bayonet vehicle area in the image according to vehicle detection and positioning results, further constructing a vehicle head target detection database and training a vehicle head target area detection model to realize vehicle head positioning;
the vehicle type classification characteristic extraction module is used for respectively training three vehicle type recognition models based on the deep convolutional neural network according to the full image, the lower half area image and the head image of the vehicle at the checkpoint and extracting vehicle type classification characteristics of corresponding images;
the vehicle identity characteristic extraction module is used for respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint and extracting the vehicle identity characteristics of the corresponding images;
the characteristic vector construction module of the bayonet vehicle is used for extracting a convolution characteristic diagram of a corresponding input image according to the models obtained in the vehicle type characteristic extraction module and the vehicle identity characteristic extraction module, respectively carrying out characteristic aggregation by adopting a selective convolution characteristic descriptor aggregation algorithm to be used as final characteristic vector representation, finally respectively carrying out normalization, and then connecting the characteristic vectors into a unified characteristic vector to be used as the final characteristic representation of the bayonet vehicle;
and the vehicle retrieval ordering and grading result obtaining module is used for performing feature extraction on the target vehicle image to be retrieved and the candidate query images in the database, using the vehicle target detection database construction module, the vehicle head target detection database construction module, the vehicle type category feature extraction module, the vehicle identity feature extraction module and the feature vector construction module of the bayonet vehicle, to obtain their feature vectors; it then calculates the feature similarity matrix between the two and obtains the final vehicle retrieval ordering and grading result with a reordering algorithm.
Compared with the prior art, the invention has the following advantages:
(1) according to the multi-target regional characteristic aggregation-based checkpoint vehicle retrieval method and system, the influence of different local regional characteristics and global visual characteristics of a checkpoint vehicle image on vehicle type identification and vehicle identification is considered, and the expression capacity of the vehicle visual characteristics is effectively improved;
(2) according to the bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation, the similarity matrix of vehicle retrieval is reordered, so that the performance of the vehicle retrieval system is improved, and the precision level of the vehicle retrieval system is improved;
(3) according to the bayonet vehicle retrieval method and system based on multi-target region feature aggregation, the image features are extracted through the deep convolutional neural network, the vehicle feature expression capability is effectively improved, and the accuracy and robustness of vehicle retrieval application based on deep learning are guaranteed;
(4) the bayonet vehicle retrieval method and system based on multi-target regional feature aggregation can well alleviate the challenge posed to vehicle retrieval systems by the large appearance variation of the same vehicle and the high appearance similarity between different vehicles, and have strong practicality and reliability;
(5) the bayonet vehicle retrieval method and system based on multi-target region feature aggregation have strong expandability, can construct specific data sets aiming at different task requirements, and provide practical solutions under specific scenes for users;
(6) the bayonet vehicle retrieval method and system based on multi-target region feature aggregation further improve the expression capability of vehicle visual features by considering the influence of different loss functions in a deep learning method on model training.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of a bayonet vehicle retrieval method based on multi-target region feature aggregation according to an embodiment of the present invention;
FIG. 2 is a flowchart of a bayonet vehicle retrieval method based on multi-target regional feature aggregation according to an embodiment of the present invention;
fig. 3 is a processing diagram of feature representation based on multi-target region feature aggregation algorithm according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Fig. 1 is a schematic diagram of a bayonet vehicle retrieval method based on multi-target region feature aggregation according to an embodiment of the present invention, and fig. 2 is a flowchart thereof.
Referring to fig. 1 and fig. 2, the method for retrieving a vehicle at a gate of the present embodiment includes the following steps:
s11: constructing a bayonet vehicle target detection database by using monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by using vehicle marking data in the database for detecting and positioning a vehicle;
s12: according to the vehicle detection and positioning result in the S11, a bayonet vehicle area in the image is segmented, and a vehicle head target detection database is further constructed and used for training a vehicle head target area detection model to realize vehicle head positioning;
s13: respectively training three vehicle type recognition models based on a deep convolutional neural network according to a full image, a lower half area image and a head image of a vehicle at a gate, and extracting vehicle type category characteristics of corresponding images;
s14: respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint for extracting the vehicle identity characteristics of the corresponding images;
s15: extracting convolution characteristic diagrams of corresponding input images according to the models obtained in S13 and S14, respectively performing characteristic aggregation by adopting a selective convolution characteristic descriptor aggregation algorithm to serve as final characteristic vector representation, finally respectively performing normalization, and then connecting the characteristic vectors into a unified characteristic vector to serve as the final characteristic representation of the bayonet vehicle;
s16: and respectively performing feature extraction on the target vehicle image to be retrieved and the candidate query image in the database by using S11-S15 to obtain feature vectors of the target vehicle image to be retrieved and the candidate query image, then calculating a feature similarity matrix of the target vehicle image to be retrieved and the candidate query image, and obtaining a final vehicle retrieval ordering and grading result by adopting a reordering algorithm.
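As a concrete illustration of S15 and S16, the following NumPy sketch normalizes and concatenates per-region feature vectors and ranks gallery candidates by cosine similarity (function names and shapes are the editor's illustration, not taken from the patent):

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def fuse_features(parts):
    """S15: L2-normalize each per-region/per-model feature vector, then
    concatenate them into one unified feature vector."""
    return np.concatenate([l2_normalize(p) for p in parts])

def rank_gallery(query_feat, gallery_feats):
    """S16: cosine similarity of the query against every candidate image,
    returned as (indices best-first, sorted similarity scores)."""
    q = l2_normalize(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                  # one row of the feature similarity matrix
    order = np.argsort(-sims)     # descending similarity
    return order, sims[order]

fused = fuse_features([np.array([3.0, 4.0]), np.array([0.0, 5.0])])
assert fused.shape == (4,)        # two normalized 2-d parts, concatenated
```

In the full method this ranking would then be refined by the reordering algorithm of S16 rather than used directly.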
In a preferred embodiment, the bayonet vehicle target detection database in S11 includes video frame images acquired from monitoring cameras and manually labeled rectangular bounding boxes for all vehicle targets in each image, where each bounding box is represented by the two-dimensional coordinates of its upper-left and lower-right corners, e.g. [x1, y1, x2, y2]; all video frame images and labeled samples are divided into a training set and a test set for training the target detection model.
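A minimal NumPy illustration of this annotation convention (array shapes and values are the editor's example):

```python
import numpy as np

def crop_bbox(frame, box):
    """Crop a vehicle region from a video frame.

    box: [x1, y1, x2, y2] - upper-left and lower-right corners, as in the
    annotation format described above (x is the column axis, y the row axis).
    """
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # one H x W x 3 video frame
vehicle = crop_bbox(frame, [100, 200, 500, 600])
assert vehicle.shape == (400, 400, 3)
```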
In the preferred embodiment, the vehicle target detection model in S11 is specifically: a deep convolutional neural network extracts a feature map of the input image, and a regression model for training and testing the model parameters is established from the feature map and the manually labeled vehicle target rectangular bounding boxes. The parameters of the regression model comprise the network parameters of each layer of the deep convolutional neural network and the loss function parameters guiding the training process. During training, the loss function is continuously minimized by a mathematical optimization algorithm based on stochastic gradient descent until it converges, yielding the network parameters of each layer in the converged state. During testing, a video frame image is input into the trained regression model, which outputs the rectangular bounding box coordinates and confidence scores of the vehicle targets in the image.
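The training loop described here can be illustrated with a toy stand-in (editor's sketch: a linear regressor on random features replaces the deep CNN, and all numbers are arbitrary; the point is only the minimize-by-SGD-until-convergence loop):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))              # stand-in image features
w_true = rng.normal(size=(8, 4))
Y = X @ w_true                             # stand-in box coordinates [x1, y1, x2, y2]

w = np.zeros((8, 4))
lr = 0.05
for step in range(2000):
    i = rng.integers(0, len(X))            # one random sample per step
    err = X[i] @ w - Y[i]
    w -= lr * np.outer(X[i], err)          # gradient of 0.5 * ||err||^2

loss = float(np.mean((X @ w - Y) ** 2))
assert loss < 1e-3                         # the loss has converged near zero
```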
In the preferred embodiment, the vehicle detection result in S12 includes a rectangular bounding box of all vehicle targets appearing in all video frame images captured by the camera.
In a preferred embodiment, the vehicle head target detection database in S12 includes the full images of bayonet vehicles obtained from the vehicle detection and positioning results of S11 and the video frame images, together with manually labeled rectangular bounding boxes for all vehicle head targets in each image, where each bounding box is represented by the two-dimensional coordinates of its upper-left and lower-right corners, e.g. [x1, y1, x2, y2]; all full images of bayonet vehicles and labeled samples are divided into a training set and a test set for training the target detection model.
In a preferred embodiment, the vehicle head target detection model in S12 is specifically: a deep convolutional neural network extracts a feature map of the input image, and a regression model for training and testing the model parameters is established from the feature map and the manually labeled vehicle head target rectangular bounding boxes. The parameters of the regression model comprise the network parameters of each layer of the deep convolutional neural network and the loss function parameters guiding the training process. During training, the loss function is continuously minimized by a mathematical optimization algorithm based on stochastic gradient descent until it converges, yielding the network parameters of each layer in the converged state. During testing, a full image of the bayonet vehicle is input into the trained regression model, which outputs the rectangular bounding box coordinates of the vehicle head target in the image.
In one embodiment, the full image of the bayonet vehicle in S13 comes from the rectangular box region of the vehicle detection result and contains only one vehicle target. The lower-half-region image is the lower half of the full bayonet vehicle image, and the head image comes from the rectangular box region of the vehicle head detection result. A corresponding vehicle type classification database is built from each of the three kinds of images. The vehicle type classification database contains a manually labeled vehicle type label for each image, used to distinguish vehicles of different types, for example: a blue BMW 5 Series.
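The half-region crops used in S13 and S14 can be sketched with plain array slicing (editor's illustration; the even split at mid-height is the simplest reading of "upper/lower half"):

```python
import numpy as np

def split_halves(vehicle_img):
    """Upper and lower halves of a bayonet vehicle crop (H x W x 3 array).

    The lower half (bumper/grille region) feeds a vehicle type model (S13);
    the upper half (windshield region) feeds a vehicle identity model (S14).
    """
    h = vehicle_img.shape[0] // 2
    return vehicle_img[:h], vehicle_img[h:]

img = np.zeros((300, 200, 3), dtype=np.uint8)
upper, lower = split_halves(img)
assert upper.shape == (150, 200, 3) and lower.shape == (150, 200, 3)
```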
In one embodiment, the vehicle type recognition model based on a deep convolutional neural network in S13 adopts the GoogLeNet convolutional network structure commonly used in the industry. Specifically, the number of neurons in the last fully connected layer of the network is set to the total number of manually labeled vehicle type categories. The vehicle type feature is output by the fifth pooling layer of the GoogLeNet network, with a feature size of 7 × 7 × 1024. During training, the Softmax loss function commonly used in the industry guides the gradient descent algorithm. The Softmax loss function has the form:
L_softmax = -Σ_i y_i · log( exp(x_i) / Σ_j exp(x_j) )

where x is the network output (logit vector) for a training image sample and y is its one-hot class label, i.e. y_i = 1 for the correct class and 0 otherwise.
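A minimal NumPy version of this loss (editor's sketch; the numerically stabilized log-sum-exp is an implementation detail, not from the patent):

```python
import numpy as np

def softmax_loss(logits, onehot):
    """Cross-entropy over softmax probabilities for a one-hot label."""
    z = logits - logits.max()                   # stabilize the exponentials
    log_probs = z - np.log(np.exp(z).sum())     # log softmax
    return float(-(onehot * log_probs).sum())

# A confident correct prediction costs less than a uniform one.
label = np.array([1.0, 0.0, 0.0])
assert softmax_loss(np.array([10.0, 0.0, 0.0]), label) < softmax_loss(np.zeros(3), label)
```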
In one embodiment, the upper-half-region image in S14 is the upper half of the full bayonet vehicle image. The vehicle identity recognition model, also based on the GoogLeNet convolutional network structure commonly used in the industry, outputs the vehicle identity feature from the fifth pooling layer of the network, with a feature size of 7 × 7 × 1024. For the upper-half-region image, the Softmax loss function guides the gradient descent algorithm during training.
In the preferred embodiment, the vehicle identity recognition model trained on the full bayonet vehicle image in S14 is trained with two loss functions, yielding two vehicle identity recognition models. In one embodiment, the gradient descent algorithm is guided by a Softmax loss function and a Triplet loss function, respectively. The Triplet loss function has the form:
L_triplet = Σ max( 0, ||f(x_a) − f(x_p)||² − ||f(x_a) − f(x_n)||² + α )

where x_a, x_p and x_n are the reference (anchor), positive and negative sample images respectively, f(·) denotes the feature embedding, and α is a constant margin. The reference and positive images belong to the same class, while the reference and negative images belong to different classes.
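A minimal NumPy version of one triplet term (editor's sketch; the margin value 0.2 is an arbitrary example, not from the patent):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(0, d(a,p) - d(a,n) + alpha) with squared Euclidean distances."""
    d_ap = float(np.sum((anchor - positive) ** 2))
    d_an = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_ap - d_an + alpha)

a = np.array([0.0, 0.0])
# Well-separated triplet: positive at the anchor, negative far away -> zero loss.
assert triplet_loss(a, np.array([0.0, 0.0]), np.array([3.0, 0.0])) == 0.0
```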
In the above embodiment, both the Softmax and Triplet loss functions are commonly used for training neural network models in the field of face recognition. The objective of the Softmax loss is to maximize the probability of correct classification, i.e., the vehicle type or vehicle identity of each image should be classified as correctly as possible. The Triplet loss is based on a similarity measure; its objective is to make the intra-class distance smaller than the inter-class distance, i.e., the distance between image samples of the same vehicle type or vehicle should be as small as possible, and smaller than the distance between image samples of different vehicle types or vehicles. The two loss functions guide the GoogLeNet network model to output features with different representation capabilities, thereby improving the robustness of the vehicle retrieval system.
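The margin-based Triplet objective described above can be sketched per triplet as follows; a minimal numpy illustration with an assumed default margin (the patent does not specify a value for α):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(||a - p||^2 - ||a - n||^2 + alpha, 0) for one (a, p, n) triplet."""
    a, p, n = (np.asarray(v, dtype=float) for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2)   # squared intra-class (anchor-positive) distance
    d_an = np.sum((a - n) ** 2)   # squared inter-class (anchor-negative) distance
    return max(d_ap - d_an + alpha, 0.0)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin α, so only violating triplets contribute gradients.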
In one embodiment, the convolution feature maps in S15 all have the size 7 × 7 × 1024. To strengthen the discriminative power of the feature maps, a selective convolutional feature descriptor aggregation algorithm is adopted. The algorithm proceeds as follows: first, the 7 × 7 × 1024 convolution feature map is summed along the channel direction to obtain a 7 × 7 total feature map; the mean value of this total feature map is then taken as a threshold and used to binarize it into a 7 × 7 binary mask image. For each position where the mask value is 1, the corresponding feature descriptor, a 1024-dimensional feature vector, is extracted from the convolution feature map. Mean aggregation and maximum aggregation are then applied to all extracted descriptors, yielding two 1024-dimensional aggregation vectors, which are finally concatenated into a 2048-dimensional feature vector for output.
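The aggregation steps above can be sketched directly in numpy; this is an illustrative implementation of the described procedure (the function name and the strict `>` threshold comparison are our assumptions):

```python
import numpy as np

def scda_aggregate(feature_map):
    """Selective convolutional descriptor aggregation for a 7x7x1024 map.

    Channel-wise sum -> mean threshold -> binary mask -> mean- and
    max-aggregate the selected 1024-d descriptors -> 2048-d output.
    """
    total = feature_map.sum(axis=2)          # 7x7 total feature map
    mask = total > total.mean()              # binarize with the mean as threshold
    if not mask.any():                       # degenerate case: keep everything
        mask = np.ones_like(mask, dtype=bool)
    descriptors = feature_map[mask]          # (k, 1024) selected descriptors
    return np.concatenate([descriptors.mean(axis=0), descriptors.max(axis=0)])
```

The mask suppresses background positions whose activations fall below the average response, so the aggregated vector is dominated by the vehicle region.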
In one embodiment, the normalization in S15 may be performed using L2-norm normalization, which is commonly used in the art. The six L2-normalized vectors are concatenated into the final bayonet vehicle feature vector representation with dimension 12288. Fig. 3 shows the processing procedure of the feature representation based on the multi-target region feature aggregation algorithm in S15.
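The normalize-then-concatenate step can be sketched as follows; a minimal numpy illustration (the function name is hypothetical, and the 12288 dimension arises from the six 2048-dimensional branch vectors described in the text):

```python
import numpy as np

def fuse_features(vectors):
    """L2-normalize each aggregated branch vector, then concatenate.

    With the six 2048-d branch vectors of the embodiment this yields the
    12288-d bayonet vehicle representation; the function itself accepts
    any number of branches.
    """
    normed = [v / np.linalg.norm(v) for v in vectors]
    return np.concatenate(normed)
```

Normalizing each branch before concatenation keeps every model's contribution on the same scale, so no single branch dominates the cosine similarity.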
In the preferred embodiment, the target vehicle image to be retrieved in S16 and the candidate query images in the database are all full bayonet vehicle images output by the vehicle target detection model, each containing exactly one vehicle target. When there are M target vehicle images to be retrieved and N candidate query images in the database, the similarity matrix has size M × N; each entry is a similarity score obtained as the cosine similarity between the 12288-dimensional feature vector of the corresponding target vehicle image and that of the candidate query image. Similarity scores range from 0 to 1, with values closer to 1 indicating greater similarity between the two images. The cosine similarity score is calculated as:
cos(F1, F2) = (F1 · F2) / (‖F1‖ ‖F2‖)

where F1 and F2 are the feature vectors of the image to be retrieved and the query image, respectively.
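Computing the full M × N cosine similarity matrix can be sketched as one normalized matrix product; a minimal numpy illustration (the function name is our own):

```python
import numpy as np

def similarity_matrix(queries, gallery):
    """M x N cosine similarity between query and gallery feature vectors.

    Rows of `queries` (M x D) and `gallery` (N x D) are feature vectors;
    entry (i, j) is F1 . F2 / (|F1| |F2|).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T
```

Because each row is L2-normalized first, the dot product directly equals the cosine similarity, avoiding a per-pair norm computation.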
Because the ranking is output directly in descending order of similarity, incorrect retrieval results may appear near the front of the list while correct results fall toward the back. To address this problem, the similarity scores typically need to be re-ranked. In the preferred embodiment, the re-ranking algorithm may be the k-reciprocal algorithm from the field of pedestrian re-identification. The retrieved image list is then output in order according to the re-ranked similarity score matrix.
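The core idea behind k-reciprocal re-ranking, that a match is more trustworthy when the query and gallery images each appear in the other's top-k neighbors, can be sketched as a toy re-scoring pass. This is only a simplified illustration of the mutual-neighbor idea, not the full k-reciprocal algorithm (which also uses Jaccard distance over expanded neighbor sets); the function name and the additive `boost` are our assumptions:

```python
import numpy as np

def k_reciprocal_rerank(sim, k=5, boost=0.1):
    """Toy re-ranking pass over an M x N similarity matrix.

    Gallery item j is a k-reciprocal neighbor of query i when j is among
    i's top-k gallery neighbors AND i is among j's top-k query neighbors;
    such entries get their score boosted before the final sort.
    """
    sim = sim.copy()
    M, _ = sim.shape
    top_g = np.argsort(-sim, axis=1)[:, :k]      # per query: top-k gallery items
    top_q = np.argsort(-sim, axis=0)[:k, :]      # per gallery item: top-k queries
    for i in range(M):
        for j in top_g[i]:
            if i in top_q[:, j]:
                sim[i, j] += boost               # reward mutual neighbors
    return sim
```

Only mutually confirmed pairs are promoted, which pushes spurious one-sided matches further down the final ranking.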
In another embodiment, a bayonet vehicle retrieval system based on multi-target region feature aggregation is further provided, which is used for implementing the bayonet vehicle retrieval method in the foregoing embodiments, and includes:
the vehicle target detection database construction module is used for constructing a bayonet vehicle target detection database by utilizing monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by utilizing vehicle marking data in the database for detecting and positioning a vehicle;
the vehicle head target detection database construction module is used for segmenting a bayonet vehicle area in the image according to vehicle detection and positioning results, further constructing a vehicle head target detection database and training a vehicle head target area detection model to realize vehicle head positioning;
the vehicle type classification characteristic extraction module is used for respectively training three vehicle type recognition models based on the deep convolutional neural network according to the full image, the lower half area image and the head image of the vehicle at the checkpoint and extracting vehicle type classification characteristics of corresponding images;
the vehicle identity characteristic extraction module is used for respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint and extracting the vehicle identity characteristics of the corresponding images;
the system comprises a characteristic vector construction module of the bayonet vehicle, a convolution characteristic diagram of a corresponding input image is extracted according to models obtained in a vehicle type characteristic extraction module and a vehicle identity characteristic extraction module, a selective convolution characteristic descriptor aggregation algorithm is adopted, characteristic aggregation is respectively carried out to be used as final characteristic vector representation, finally normalization is respectively carried out, and then the characteristic vectors are connected into a unified characteristic vector to be used as the final characteristic representation of the bayonet vehicle;
and the vehicle retrieval ordering and grading result obtaining module is used for respectively extracting the features of the target vehicle image to be retrieved and the candidate query image in the database by using the vehicle target detection database constructing module, the vehicle head target detection database constructing module, the vehicle type feature extracting module, the vehicle identity feature extracting module and the final feature representation constructing module of the bayonet vehicle to obtain the feature vectors of the target vehicle image to be retrieved and the candidate query image, then calculating the feature similarity matrix of the target vehicle image to be retrieved and the candidate query image, and obtaining the final vehicle retrieval ordering and grading result by adopting a reordering algorithm.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may refer to the technical solution of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred example for implementing the method, and details are not described herein.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices provided by the present invention purely as computer readable program code, the method steps can equally be implemented by realizing the system and its various devices in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system and its various devices provided by the present invention can be regarded as a hardware component, and the devices included therein for realizing various functions can also be regarded as structures within the hardware component; means for performing the functions may equally be regarded both as software modules implementing the method and as structures within the hardware component.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and not to limit the invention. Any modifications and variations within the scope of the description, which may occur to those skilled in the art, are intended to be within the scope of the invention.

Claims (10)

1. A bayonet vehicle retrieval method based on multi-target region feature aggregation is characterized by comprising the following steps:
s11: constructing a bayonet vehicle target detection database by using monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by using vehicle marking data in the database for detecting and positioning a vehicle;
s12: according to the vehicle position detection and positioning result in the S11, a bayonet vehicle area in the image is divided, and a vehicle head target detection database is further constructed and used for training a vehicle head target area detection model to realize vehicle head positioning;
s13: respectively training three vehicle type recognition models based on a deep convolutional neural network according to a full image, a lower half area image and a head image of a vehicle at a gate, and extracting vehicle type category characteristics of corresponding images;
s14: respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint for extracting the vehicle identity characteristics of the corresponding images;
s15: extracting convolution characteristic diagrams of corresponding input images according to the models obtained in the S13 and the S14, respectively carrying out characteristic aggregation by adopting a selective convolution characteristic descriptor aggregation algorithm to be used as final characteristic vector representation, finally respectively carrying out normalization, and then connecting the characteristic vectors into a unified characteristic vector to be used as final characteristic representation of the bayonet vehicle;
s16: and respectively performing feature extraction on the target vehicle image to be retrieved and the candidate query image in the database by using S11-S15 to obtain feature vectors of the target vehicle image to be retrieved and the candidate query image, then calculating a feature similarity matrix of the target vehicle image to be retrieved and the candidate query image, and obtaining a final vehicle retrieval ordering and grading result by adopting a reordering algorithm.
2. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 1, wherein two loss functions are used for training in the step S14, wherein the vehicle identification model based on the deep convolutional neural network is trained according to a full image of a bayonet vehicle, and two vehicle identification models are obtained.
3. The bayonet vehicle retrieval method based on multi-target regional feature aggregation according to claim 1, wherein the bayonet vehicle target detection database in S11 includes: the method comprises the steps of obtaining video frame images from a monitoring camera, and carrying out manual labeling on all vehicle targets in each image to form a rectangular surrounding frame, wherein all the video frame images and labeling samples are divided into a training set and a testing set and are used for training a target detection model.
4. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 3, wherein the vehicle target detection model in S11 is specifically: extracting a feature map of the input image by using a deep convolutional neural network, and establishing a regression model for training and testing model parameters by using the feature map and the artificially labeled vehicle target rectangular surrounding frame;
the parameters of the regression model comprise network parameters of each layer of the deep convolutional neural network and loss function parameters for guiding the training process; in the model training process, the loss function is continuously minimized through a mathematical optimization algorithm based on random gradient descent until the loss function converges to the global optimum, so that network parameters of each layer in a convergence state are obtained, in the model testing process, the video frame image is input according to a regression model obtained through training, and the rectangular bounding box coordinates and confidence score of the vehicle target in the image are output.
5. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 4, wherein the vehicle position detection and positioning result in S12 comprises a rectangular surrounding frame of all vehicle targets appearing in all video frame images captured by a camera.
6. The bayonet vehicle retrieval method based on multi-target regional feature aggregation according to any one of claims 1 to 5, wherein the vehicle head target detection database in the S12 includes: and the full images of the vehicles at the bayonet, which are obtained from the vehicle detection and positioning results and the video frame images obtained in the step S11, and the rectangular surrounding frame for manually labeling all the vehicle head targets in each image, wherein all the full images of the vehicles at the bayonet and the labeling samples are divided into a training set and a testing set for training a target detection model.
7. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 6, wherein the vehicle head target detection model in the S12 is specifically: extracting a feature map of the input image by using a deep convolutional neural network, and establishing a regression model for training and testing model parameters by using the feature map and the artificially labeled vehicle head target rectangular surrounding frame;
the parameters of the regression model comprise network parameters of each layer of the deep convolutional neural network and loss function parameters for guiding the training process; in the model training process, the loss function is continuously minimized through a mathematical optimization algorithm based on random gradient descent until the loss function converges to the global optimum, so that network parameters of each layer in a convergence state are obtained, in the model testing process, according to a regression model obtained through training, a full-width image of the bayonet vehicle is input, and rectangular bounding box coordinates of a vehicle head target in the image are output.
8. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 1, wherein the similarity calculation in the feature similarity matrix in S16 adopts cosine similarity, calculated as:
cos(F1, F2) = (F1 · F2) / (‖F1‖ ‖F2‖)

where F1 and F2 are the feature vectors of the target vehicle image to be retrieved and the candidate query image, respectively.
9. The bayonet vehicle retrieval method based on multi-target region feature aggregation according to claim 1, wherein the reordering algorithm in S16 is specifically: the k-reciprocal algorithm from the field of pedestrian re-identification.
10. A bayonet vehicle retrieval system based on multi-target regional feature aggregation is characterized by comprising:
the vehicle target detection database construction module is used for constructing a bayonet vehicle target detection database by utilizing monitoring video frame data, designing a vehicle target detection network model based on a deep convolutional neural network, and training the network model by utilizing vehicle marking data in the database for detecting and positioning a vehicle;
the vehicle head target detection database construction module is used for segmenting a bayonet vehicle area in the image according to vehicle detection and positioning results, further constructing a vehicle head target detection database and training a vehicle head target area detection model to realize vehicle head positioning;
the vehicle type classification characteristic extraction module is used for respectively training three vehicle type recognition models based on the deep convolutional neural network according to the full image, the lower half area image and the head image of the vehicle at the checkpoint and extracting vehicle type classification characteristics of corresponding images;
the vehicle identity characteristic extraction module is used for respectively training a vehicle identification model based on a deep convolutional neural network according to the full image and the upper half area image of the vehicle at the checkpoint and extracting the vehicle identity characteristics of the corresponding images;
the characteristic vector construction module of the bayonet vehicle is used for extracting a convolution characteristic diagram of a corresponding input image according to the models obtained in the vehicle type characteristic extraction module and the vehicle identity characteristic extraction module, respectively carrying out characteristic aggregation by adopting a selective convolution characteristic descriptor aggregation algorithm to be used as final characteristic vector representation, finally respectively carrying out normalization, and then connecting the characteristic vectors into a unified characteristic vector to be used as the final characteristic representation of the bayonet vehicle;
and the vehicle retrieval ordering and grading result obtaining module is used for respectively extracting the features of the image of the target vehicle to be retrieved and the candidate query image in the database by using the vehicle target detection database constructing module, the vehicle head target detection database constructing module, the vehicle type feature extracting module, the vehicle identity feature extracting module and the final feature representation constructing module of the bayonet vehicle to obtain the feature vectors of the image of the target vehicle to be retrieved and the candidate query image, then calculating the feature similarity matrix of the image of the target vehicle to be retrieved and the candidate query image, and obtaining the final vehicle retrieval ordering and grading result by adopting a reordering algorithm.
CN201911228964.0A 2019-12-04 2019-12-04 Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation Pending CN111078946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911228964.0A CN111078946A (en) 2019-12-04 2019-12-04 Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation


Publications (1)

Publication Number Publication Date
CN111078946A true CN111078946A (en) 2020-04-28

Family

ID=70312858


Country Status (1)

Country Link
CN (1) CN111078946A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860165A (en) * 2020-06-18 2020-10-30 盛视科技股份有限公司 Dynamic face recognition method and device based on video stream
CN111950367A (en) * 2020-07-08 2020-11-17 中国科学院大学 Unsupervised vehicle re-identification method for aerial images
CN112579811A (en) * 2020-12-11 2021-03-30 公安部第三研究所 Target image retrieval and identification system, method, device, processor and computer-readable storage medium for video detection
CN112905824A (en) * 2021-02-08 2021-06-04 智慧眼科技股份有限公司 Target vehicle tracking method and device, computer equipment and storage medium
CN113298021A (en) * 2021-06-11 2021-08-24 宿州学院 Mining area transport vehicle head and tail identification method and system based on convolutional neural network
CN113805240A (en) * 2020-05-28 2021-12-17 同方威视技术股份有限公司 Vehicle inspection method and system
CN115731436A (en) * 2022-09-21 2023-03-03 东南大学 Highway vehicle image retrieval method based on deep learning fusion model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335702A (en) * 2015-10-15 2016-02-17 浙江捷尚视觉科技股份有限公司 Bayonet vehicle type recognition method based on statistical learning
CN108171136A (en) * 2017-12-21 2018-06-15 浙江银江研究院有限公司 A kind of multitask bayonet vehicle is to scheme to search the system and method for figure
CN109800321A (en) * 2018-12-24 2019-05-24 银江股份有限公司 A kind of bayonet image vehicle retrieval method and system
US10319007B1 (en) * 2018-03-08 2019-06-11 Capital One Services, Llc Database image matching using machine learning with output estimation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XUE LIXIA ET AL.: "The Vehicle Speed Detection Algorithm of Virtual Bayonet System", 《PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENT COMMUNICATION》 *
胡强: "基于特征信息车牌识别系统的研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *


Similar Documents

Publication Publication Date Title
CN111078946A (en) Bayonet vehicle retrieval method and system based on multi-target regional characteristic aggregation
Lin et al. An efficient license plate recognition system using convolution neural networks
CN109558823B (en) Vehicle identification method and system for searching images by images
CN107153817B (en) Pedestrian re-identification data labeling method and device
Balali et al. Evaluation of multiclass traffic sign detection and classification methods for US roadway asset inventory management
WO2020173022A1 (en) Vehicle violation identifying method, server and storage medium
CN109740420B (en) Vehicle law violation identification method and related product
CN106600977B (en) Multi-feature recognition-based illegal parking detection method and system
US20210192227A1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
CN112016605B (en) Target detection method based on corner alignment and boundary matching of bounding box
CN104303193A (en) Clustering-based object classification
Li et al. An anti-fraud system for car insurance claim based on visual evidence
CN114155284A (en) Pedestrian tracking method, device, equipment and medium based on multi-target pedestrian scene
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
KR102089298B1 (en) System and method for recognizing multinational license plate through generalized character sequence detection
Zhang et al. License plate localization in unconstrained scenes using a two-stage CNN-RNN
CN105989334A (en) Monocular vision-based road detection method
CN104361359A (en) Vehicle recognition method based on image detection
CN111652035B (en) Pedestrian re-identification method and system based on ST-SSCA-Net
Sikirić et al. Image representations on a budget: Traffic scene classification in a restricted bandwidth scenario
CN110889388A (en) Violation identification method, device, equipment and storage medium
Yao et al. Coupled multivehicle detection and classification with prior objectness measure
CN115880260A (en) Method, device and equipment for detecting base station construction and computer readable storage medium
Peng et al. Real-time illegal parking detection algorithm in urban environments
Thakur et al. Deep learning-based parking occupancy detection framework using ResNet and VGG-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240105