CN112071075B - Escaping vehicle re-identification method - Google Patents

Escaping vehicle re-identification method

Info

Publication number
CN112071075B
CN112071075B (application CN202010595381.8A / CN202010595381A)
Authority
CN
China
Prior art keywords
view
vehicle
global
path
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010595381.8A
Other languages
Chinese (zh)
Other versions
CN112071075A (en)
Inventor
孙伟
代广昭
戴亮
张旭
常鹏帅
张国策
陈旋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202010595381.8A priority Critical patent/CN112071075B/en
Publication of CN112071075A publication Critical patent/CN112071075A/en
Application granted granted Critical
Publication of CN112071075B publication Critical patent/CN112071075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/017 Detecting movement of traffic to be counted or controlled identifying vehicles
    • G08G1/0175 Detecting movement of traffic to be counted or controlled identifying vehicles by photographing vehicles, e.g. when violating traffic rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a method for re-identifying an escaping vehicle, which comprises the following steps: (1) constructing a target camera topological network and predicting the trajectory across associated cameras; (2) based on view-aware metric learning, learning two different depth metrics for S-view (same-view) and D-view (cross-view) sample pairs; (3) vehicle re-identification under adaptive attention based on dual paths. The dual paths comprise a global path and a local path; dual-path vehicle re-identification is carried out respectively in the S-view same-view and D-view cross-view feature spaces of step (2), the global path extracts the global features of the picture, and the local path supplements the global features. According to the method, a key monitoring area with the optimal time sequence is obtained by constructing a topological network of cameras around the suspicious vehicle; different loss functions are applied through depth metric learning, an adaptive attention model is added, the re-identification task is carried out, the travelling track of the vehicle is obtained, and the accuracy of re-identification of escaping vehicles is improved.

Description

Re-identification method for escaping vehicles
Technical Field
The invention relates to a vehicle re-identification method, in particular to an escaping vehicle re-identification method.
Background
With the development of science and technology and the improvement of people's living standards, the use frequency and ownership of automobiles have gradually increased, and people's awareness of traffic safety and the schemes for handling accidents have improved accordingly. Once a traffic accident occurs, how to apply artificial-intelligence recognition so that the accident can be handled quickly, accurately, and in a standardized and intelligent manner is very important.
In recent years, with the introduction of large data sets and the development of deep learning algorithms, as well as the widespread use of traffic cameras, vehicle re-identification based on deep learning has enjoyed significant success over the past decade. The vehicle re-identification technology has great application potential in the fields of urban safety monitoring and intelligent traffic monitoring, particularly in the task of accurately and quickly re-identifying vehicles causing traffic accidents and escaping.
In view of the non-obvious differences between different vehicles, vehicle re-identification remains a very challenging task, especially with large data volumes. This work faces significant challenges. First, appearance-based approaches tend to yield unsatisfactory results because differences between different vehicle classes taken from similar viewpoints are small, while differences within the same vehicle class taken from different viewpoints are large. Although depth metric learning has been successful in learning features under viewpoint changes, extreme viewpoint changes of vehicles (e.g., 180°) remain very challenging, and the impact of viewpoint changes on the accuracy of the re-identification task is significant. Second, subtle cues such as tire models, window stickers, window borders and custom decorations inside the car are difficult to capture in a global appearance representation. In the extreme, different vehicles may possess similar colors and shapes; in particular, vehicles from the same manufacturer in different years have similar appearances that differ only in small local decorations. Therefore, when the vehicle re-identification model makes a decision, feature acquisition under viewpoint changes and adaptive attention to the discriminative parts for re-identification are very important.
In vehicle re-identification, not all key points provide identification information, and the orientation of the vehicle in the query picture is a determining factor for selecting key points. The recognition process of the vehicle re-identification model therefore needs to learn view-aware metrics and focus attention on the discriminative parts.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a method for re-identifying escaping vehicles that combines vehicle re-identification with rapid localization of hit-and-run vehicles in traffic accidents.
Technical scheme: the invention discloses a vehicle re-identification method, which comprises the following steps: 1. constructing a target camera topological network and predicting the key monitoring area of a vehicle escaping after a hit-and-run accident; 2. based on view-aware metric learning, learning the depth metrics under two different view constraints in the S-view same-view samples and the D-view cross-view samples respectively; 3. vehicle re-identification under adaptive attention based on dual paths. The dual paths in step 3 comprise a global path and a local path; vehicle re-identification along the global path and the local path is carried out respectively in the S-view same-view and D-view cross-view feature spaces of step 2, the global path extracts global picture features, and the local path extracts local discriminative features through adaptive attention to supplement the global features.
In the step 1, the monitoring detection range of the target is narrowed through the time transition probability among the cameras, wherein the monitoring detection range is a key monitoring area, and the method specifically comprises the following steps:
step 1.1: establishing road section information of a vehicle monitoring scene to be inquired and a network topological structure of multiple cameras through a map and an actual camera view in data;
step 1.2: the suspicious vehicle in the monitoring circle is tracked by the monitoring system; the key point is that, after the hit-and-run vehicle is observed at its initial position, the positions of the camera or cameras where it will appear next need to be determined, and the cameras where it may appear are associated;
step 1.3: the probabilities of the hit-and-run vehicle to be queried appearing in the associated camera set are analyzed and sorted, and a small number of cameras with the optimal time-sequence relation are found as key monitoring areas.
After step 1.3 is completed, steps 2 and 3 are executed, and the key monitoring circle is updated after the vehicle is re-identified.
In step 2, a two-branch network is provided which maps the input vehicle image into two feature spaces; the step specifically comprises the following steps:
step 2.1: inputting a picture of the vehicle to be queried, first predicting the absolute view of each image with a view classifier and dividing the views into front, side and rear; if an image pair comes from the same or a similar view, it is classified as an S-view pair, otherwise as a D-view pair;
step 2.2: images classified as S-view pairs are sent into the S-view feature space for same-view constraint training, and images classified as D-view pairs are sent into the D-view feature space for cross-view constraint training;
step 2.3: attention feature fusion is performed in the two feature spaces S-view and D-view respectively, yielding a fused attention model for the S-view feature space and a fused attention model for the D-view feature space.
In step 3, a dual-path adaptive attention model is added to the S-view and D-view feature spaces respectively for vehicle re-identification; the global appearance path captures global features of the vehicle appearance, the orientation-constrained local appearance path learns to capture local discriminative features, and vehicles inconsistent with the appearance of the query vehicle are filtered out. The step specifically comprises the following steps:
step 3.1: the backbone network uses ResNet-50 and ResNet-101 simultaneously as baseline models, is pre-trained on the VehicleID data set, and then extracts the global feature f_g of the vehicle;
Step 3.2: using a two-stage model to estimate key points and orientations of the vehicle, and comprising the following two steps of:
step 3.2.1: the convolutional network based on VGG-19 is used to make a rough hot spot map estimate for 21 classes, 21 classes include 20 key points and 1 background, the convolutional network based on VCG-19 is trained using a pixel-by-pixel multi-class cross entropy loss function, the loss function is:
Figure GDA0003730056710000021
in the formula I i,j Is a vector of corresponding pixel locations (i, j) on all output channels,
Figure GDA0003730056710000022
is a ground truth label of each pixel position, H and W respectively represent the height and width of the hot spot diagram, x i,j (k) A predictor representing the corresponding pixel location (i, j) on all output channels;
step 3.2.2: down-sampling the input image through HRNet and refining the rough key points and orientation from step 3.2.1;
step 3.3: adaptively selecting key points and extracting subtle local features; the orientations of the vehicle are divided into 8 classes: front, rear, left, left-front, left-rear, right, right-front and right-rear; a key-point selector is designed which adaptively selects key points based on the predicted orientation;
step 3.4: the adaptive-attention appearance detection models trained in steps 3.1, 3.2 and 3.3 are added to the S-view and D-view feature spaces of step 2 respectively for joint optimization.
Beneficial effects: compared with the prior art, the invention has the following remarkable effects: 1. a topological network of cameras around the suspicious vehicle is constructed, and a key monitoring area with the optimal time sequence is obtained by calculating the transition frequency between associated cameras; 2. different loss functions are used for depth metric learning at different views, and an adaptive attention model is added to the process for accurate search, so that not only can the re-identification task be performed, but the travelling track of the vehicle can also be obtained; 3. the network performs well, generalizes strongly, and effectively improves the accuracy of re-identification of escaping vehicles; 4. an escaping vehicle re-identification method is provided.
Drawings
FIG. 1 is a schematic diagram of the identification process of the present invention;
FIG. 2 is a diagram of a partial camera node-arc model of the present invention;
FIG. 3 is a diagram of a model for dual-path appearance detection based on adaptive attention strategy according to the present invention;
FIG. 4 is a diagram of an adaptive attention model under the learning based on the perspective perception metric according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
1. Building network structures and models
Vehicle re-identification from the artificial intelligence field is combined with rapid localization of the vehicle that caused the traffic accident. Given a picture of the suspected vehicle, a pre-trained network and a vehicle re-identification algorithm accurately identify matching images in the suspect-vehicle database.
The camera network is combined with a geographic information system (GIS) to construct a camera-network topological structure; during identification, the camera group with the optimal time-sequence relation is obtained as the key monitoring area by calculating the transfer-time frequency between cameras, so that the fields of view in which the vehicle will appear can be actively predicted.
The method comprises a view-aware adaptive-attention metric learning model with two constraint modes, a same-view constraint (S-view) and a cross-view constraint (D-view), which are fused with an adaptive attention model and trained with triplet loss and cross-entropy loss.
During appearance detection, a dual-path adaptive attention model is adopted. First, the macroscopic features of the vehicle are extracted through the global appearance path; local discriminative features are then captured through the orientation-constrained local appearance path, and finally the global and local features are jointly optimized.
2. Implementation procedure
Step 1, constructing a target camera topological network
Step 1.1, a road network is used to express the reachable space of the city, and the road network is then expressed with a node-arc model. The center line of a road section represents a road (i.e., an arc segment), and a node represents an intersection of road sections. The mathematical model is expressed as Q(N, A), where N = (n_1, n_2, ..., n_m) represents the set of nodes and A = (a_1, a_2, ..., a_m) the set of arc segments. In practice, owing to the complexity of camera placement and the uncertainty of view boundaries, camera views may overlap, be adjacent or be non-adjacent, which increases the difficulty of building the camera topology. In the invention, camera fields of view are divided into 4 types according to the monitored scene: intersection area (A), one-way lane (B), single roadway (C) and two-way roadway (D). A camera together with its field of view is treated as a node, and each camera node is given 3 attributes: arc (the street where the camera is located), node (the nearest intersection the camera belongs to) and offset (the distance between the camera node and the nearest intersection). According to the above definition, the actual road camera views are added to the Google map to construct the camera topology for hit-and-run vehicles.
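For illustration only, the following minimal Python sketch shows one way the node-arc model Q(N, A) and the camera attributes (arc, node, offset, view type) described above could be represented; the class names, field names and example road data are assumptions, not part of the patent.

```python
# Illustrative sketch of the node-arc road model and camera nodes.
# All names and example values below are assumptions, not from the patent.
from dataclasses import dataclass, field

@dataclass
class CameraNode:
    cam_id: str        # camera identifier
    arc: str           # street (arc segment) where the camera is located
    node: str          # nearest intersection (node) the camera belongs to
    offset: float      # distance from the camera to that intersection, in metres
    view_type: str     # 'A' intersection, 'B' one-way lane, 'C' single roadway, 'D' two-way roadway

@dataclass
class RoadNetwork:
    nodes: set = field(default_factory=set)       # N = {n1, ..., nm}: intersections
    arcs: dict = field(default_factory=dict)      # A = {arc_id: (n_from, n_to, length_m)}
    cameras: dict = field(default_factory=dict)   # cam_id -> CameraNode

    def add_arc(self, arc_id, n_from, n_to, length_m):
        self.nodes.update({n_from, n_to})
        self.arcs[arc_id] = (n_from, n_to, length_m)

    def add_camera(self, cam):
        self.cameras[cam.cam_id] = cam

# Example: two intersections joined by one street, each end monitored by a camera.
q = RoadNetwork()
q.add_arc("a1", "n1", "n2", length_m=450.0)
q.add_camera(CameraNode("C1", arc="a1", node="n1", offset=30.0, view_type="A"))
q.add_camera(CameraNode("C2", arc="a1", node="n2", offset=55.0, view_type="B"))
```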
Step 1.2, defining a search surrounding ring of the camera as O:
O = (F_Ψ, G(g_s))   (1)

In formula (1), G(g_s) is the set of camera-view nodes associated with the starting camera node g_s; formula (2) defines this set in terms of g_e, an associated camera-view node reached by the search, and the road termination points. In formula (3), F_Ψ gives the number of paths between the starting camera and all cameras, taken over the set of all paths within the monitoring enclosure between g_s and g_e; a single path from g_s to g_e must start and end at camera-node positions and may contain no camera nodes other than its start and end nodes.
The definition shows that the searching is started from the initial camera where the hit-and-run vehicle is located, and the monitoring enclosure is formed along different searching paths until the node of the searching camera or the road termination point; if the hit-and-run vehicle reaches the end of the road along the path for installing the camera or stops in the monitoring enclosure, the monitoring enclosure is regarded as a run-away vehicle area.
Step 1.3, carrying out collaborative analysis between cameras, wherein the specific process can be divided into three steps:
step 1.3.1, determining the initial position of the vehicle where the hit-and-run vehicle is found, then sorting according to the shortest path between the start-stop camera and the associated camera, and obtaining the spatial relationship between the cameras according to the Google map.
Step 1.3.2, calculate probability function
P(g_e) = F_{g_e} / F_ψ   (4)

In formula (4), F_{g_e} is the number of paths between camera g_e and the starting camera, and F_ψ is the number of paths between the starting camera and all cameras.
P(g_e) indicates the probability that the hit-and-run vehicle, starting from the initial camera, appears at an associated camera at the next moment; the higher the probability, the more likely the hit-and-run vehicle is to appear in that camera's field of view. The first 6 camera nodes with the largest probabilities are found and sorted in order, and the area where these camera nodes are located is regarded as the key monitoring area.
Step 1.3.3, according to the path lengths and the vehicle speed, calculating the time from the initial camera to each key-monitoring-area camera obtained in step 1.3.2, and sorting the cameras from smallest to largest. Suppose the first sorting result is C_2 > C_5 > C_8 > C_9 > C_10 > C_15 ... C_n, and that travelling from the initial camera to camera C_2 requires time t_0. If, after t_0 has elapsed, the fleeing vehicle has not appeared at camera C_2, the possibility of it appearing at C_2 is excluded; the times of all cameras are simultaneously reduced by t_0 and the cameras are re-sorted, giving a second group of key monitoring cameras C_5 > C_8 > C_9 > C_10 > C_15 > C_20 ... C_n, which continue to be observed. If the hit-and-run vehicle appears in a camera's field of view, tracking is successful; that camera replaces the initial camera, the process returns to step 1.3.1 and the cycle continues. If no hit-and-run vehicle appears in any camera area, the hit-and-run vehicle has disappeared from the search range. Finally, after the escape-vehicle camera topological network has been constructed, steps 2 and 3 are carried out and the vehicle re-identification models are trained.
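For illustration, the following hedged Python sketch mimics steps 1.3.1 to 1.3.3: it ranks associated cameras by the transition probability P(g_e) = F_{g_e}/F_ψ, keeps the top 6 as the key monitoring area, estimates arrival times from path length and an assumed speed, and shifts the remaining times by t_0 when the first camera is ruled out. The function names, sample counts and speed value are illustrative assumptions.

```python
# Hedged sketch of steps 1.3.1-1.3.3; all inputs below are illustrative.

def rank_key_cameras(path_counts, top_k=6):
    """path_counts: {cam_id: number of enclosure paths from the start camera} (F_ge)."""
    total = sum(path_counts.values())                               # F_psi
    probs = {cam: n / total for cam, n in path_counts.items()}      # P(g_e), formula (4)
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:top_k], probs

def rerank_by_time(path_len_m, speed_mps, cameras):
    """Estimated arrival time = path length / assumed vehicle speed."""
    eta = {cam: path_len_m[cam] / speed_mps for cam in cameras}
    return sorted(cameras, key=eta.get), eta

# Toy example with assumed path counts and lengths
counts = {"C2": 5, "C5": 4, "C8": 3, "C9": 3, "C10": 2, "C15": 2, "C20": 1}
key_cams, probs = rank_key_cameras(counts)
lengths = {c: 300.0 * (i + 1) for i, c in enumerate(key_cams)}
order, eta = rerank_by_time(lengths, 12.0, key_cams)

# Step 1.3.3: if the vehicle does not appear at the first camera within t_0 seconds,
# drop it, shift the remaining ETAs by t_0 and re-rank the rest.
t_0 = eta[order[0]]
remaining = order[1:]
eta = {c: eta[c] - t_0 for c in remaining}
second_group = sorted(remaining, key=eta.get)
```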
Step 2, view-aware metric learning
The commonly used triplet loss is adopted to establish a baseline for metric learning, cross-entropy loss is added for vehicle classification, and the loss function is jointly optimized with the triplet loss and the cross-entropy loss. In the different feature spaces, i.e., same-view S-view and cross-view D-view, two different depth metrics are learned for the input images. Denote the dataset by X and an image pair by I = (x_i, x_j), where x_i, x_j ∈ X; the function f denotes the mapping from the original image to the feature space, namely f_s and f_d; D is the Euclidean distance between features, D(I) = D(x_i, x_j) = ||f(x_i) - f(x_j)||_2. The sample distances D_s(I) and D_d(I) in the respective feature spaces S-view and D-view are:

D_s(I) = ||f_s(x_i) - f_s(x_j)||_2   (5)

D_d(I) = ||f_d(x_i) - f_d(x_j)||_2   (6)

Given three samples x, x^+, x^-, a triplet is constructed, where x and x^+ are samples of the same category (the same vehicle): P_s^+ denotes an S-view positive sample pair and P_d^+ a D-view positive sample pair; x^- is a sample of a different category: P_s^- denotes an S-view negative sample pair and P_d^- a D-view negative sample pair. With the positive pair I^+ = (x, x^+) and the negative pair I^- = (x, x^-), the triplet loss is defined as:

max( D(I^+) + α - D(I^-), 0 )   (7)

In formula (7), α is the minimum margin between the two Euclidean distances D(I^+) and D(I^-).
Step 2.1, triplets are constructed on the VeRi-776 data set. To identify the view relation within each image pair (i.e., each triplet), the view of the vehicle is computed first: a view classifier roughly divides the vehicle views in all images into 3 classes: front, side and rear. The view classifier adopts RegNet as its base network and is trained with a cross-entropy loss (at this stage the view is only coarsely classified; finer classification is performed in the adaptive-attention vehicle re-identification stage). An image pair is classified as an S-view pair if it comes from the same or a similar view, otherwise as a D-view pair.
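As an illustrative sketch of step 2.1 (not the patent's actual implementation), the snippet below builds a 3-class RegNet view classifier with torchvision and uses it to label an image pair as S-view or D-view; the specific RegNet variant, input size and training details are assumptions.

```python
# Hedged sketch of the coarse view classifier and S-view / D-view pair labelling.
# The RegNet variant (regnet_y_400mf) and the 224x224 input size are assumptions.
import torch
import torchvision

view_classifier = torchvision.models.regnet_y_400mf(num_classes=3)  # front, side, rear
ce_loss = torch.nn.CrossEntropyLoss()                                # trained with cross-entropy

@torch.no_grad()
def pair_type(img_a, img_b):
    """Return 'S-view' if both images receive the same coarse view label, else 'D-view'."""
    view_classifier.eval()
    logits = view_classifier(torch.stack([img_a, img_b]))  # (2, 3)
    va, vb = logits.argmax(dim=1).tolist()
    return "S-view" if va == vb else "D-view"

# Example with random tensors standing in for two vehicle crops
a, b = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
print(pair_type(a, b))
```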
Step 2.2, the N view-aware images obtained in step 2.1 are mapped through a series of stacked convolutional layers, and two convolutional branches (different feature spaces) are then added so that all pictures are converted into 2N; the sample distances in the corresponding feature spaces are computed as D_s(I) = ||f_s(x_i) - f_s(x_j)||_2 and D_d(I) = ||f_d(x_i) - f_d(x_j)||_2. For view-aware metric learning, two constraints are employed: the same-view constraint and the cross-view constraint.
Same-view constraint: D(P^+) < D(P^-) must always hold in both feature spaces for sample pairs at the same view; the triplet loss functions L_s and L_d in the S-view and D-view feature spaces are respectively:

L_s = Σ max( D_s(P_s^+) + α - D_s(P_s^-), 0 )   (8)

L_d = Σ max( D_d(P_d^+) + α - D_d(P_d^-), 0 )   (9)

Cross-view constraint: when the image pairs come from different views, D(P^+) < D(P^-) must still hold for sample pairs taken in the respective feature spaces; the corresponding triplet loss function L_cross is:

L_cross = Σ max( D_d(P_d^+) + α - D_s(P_s^-), 0 )   (10)

Step 2.3, the triplet loss functions of steps 2.1 and 2.2 are combined:

L_triplet = L_s + L_d + L_cross   (11)
In the data set X there are N vehicle categories (IDs). For a given input picture x corresponding to a vehicle with label y, cross-entropy loss is used to penalize wrong identity predictions, improving the accuracy of vehicle identity prediction. The corresponding loss function is:

L_softmax = - Σ_{i=1}^{N} p_i log( p̂_i )   (12)

In formula (12), p_i is the ground-truth label corresponding to the input sample picture x, and p̂_i is the predicted value.

The loss function is jointly optimized using the triplet loss and the cross-entropy loss together:

L_view = ω·L_softmax + (1 - ω)·L_triplet   (13)

In formula (13), L_view is the view-based loss function and the weight ω = 0.25.
Step 3, vehicle re-identification based on adaptive attention
Adaptive attention models are added to the two different S-view and D-view feature spaces obtained in step 2 respectively to re-identify the vehicle.
Step 3.1, global feature extraction (f_g). ResNet-50 and ResNet-101 are used as the backbone networks and also as the baseline models; the backbones are pre-trained on the VehicleID data set, while the data set used for vehicle re-identification is VeRi-776. The 2048-dimensional feature vector from the last convolutional layer is fed into a shallow multi-layer perceptron trained with an L2 softmax loss.
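A minimal sketch of the step 3.1 global path, assuming a torchvision ResNet-50 backbone whose pooled 2048-dimensional output is taken as f_g and fed to a small MLP with an L2-softmax-style loss (features L2-normalised and scaled before cross-entropy); the scale factor, MLP width and identity count are assumptions.

```python
# Hedged sketch of the global appearance path; scale, MLP width and num_ids are assumptions.
import torch
import torch.nn as nn
import torchvision

class GlobalPath(nn.Module):
    def __init__(self, num_ids=576, scale=16.0):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # up to global pooling
        self.mlp = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, num_ids))
        self.scale = scale

    def forward(self, x):
        f_g = self.features(x).flatten(1)                 # (B, 2048) global feature f_g
        # L2-softmax style: normalise and scale the feature before classification
        logits = self.mlp(self.scale * nn.functional.normalize(f_g, dim=1))
        return f_g, logits

model = GlobalPath()
imgs = torch.rand(2, 3, 224, 224)
f_g, logits = model(imgs)
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1]))
```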
Step 3.2, in the step, key points and orientation estimation in the attention strategy required by step 3.1 are obtained, and the specific steps can be divided into two steps:
step 3.2.1, using a full convolution network based on VGG-19 to perform a rough H × W (64 × 64) hot spot map estimation on the picture, the result is 21 types (N) 1 =21, which includes 20 keypoints and 1 background), the network is trained using a pixel-by-pixel multi-class cross-entropy loss function, the loss function being:
L_kp = -(1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{N_1} l*_{i,j}(k) · log( exp(x_{i,j}(k)) / Σ_{k'=1}^{N_1} exp(x_{i,j}(k')) )   (14)

In formula (14), l_{i,j} is the vector of outputs over all channels at pixel location (i, j), l*_{i,j} is the one-hot ground-truth label of each pixel position, H and W respectively denote the height and width of the hotspot map, and x_{i,j}(k) is the predicted value for channel k at pixel location (i, j).
Step 3.2.2, the HRNet network is fine-tuned. Compared with stacked hourglass structures and other methods that recover high-resolution representations from low-resolution ones, HRNet retains the high-resolution representation and gradually adds high-to-low-resolution subnetworks through multi-scale fusion. The predicted key points and hotspot maps are therefore spatially more accurate, and at the same time the view predicted by the view classifier in the view-aware metric learning is refined.
The input image is down sampled by HRNet and the coarse keypoint and direction estimates are redefined in step 3.2.1. The refine rear vehicle perspective results can be refined into 8 categories: rear, left, left front, left rear, right, right front and right rear. To train the redefinement of the hotspot graph and the estimation of the directional branches in step 3.2.2, the mean square error and cross entropy loss functions were used respectively but excluding the training of the background picture.
The loss function in step 3.2.2 is:

L_refine = L_hm + μ·L_dir   (15)

where L_hm denotes the regression loss of the hotspot map:

L_hm = (1/(H·W·N_2)) Σ_{k=1}^{N_2} Σ_{i=1}^{H} Σ_{j=1}^{W} ( h_k(i, j) - h*_k(i, j) )²   (16)

and L_dir denotes the orientation classification loss function:

L_dir = -log p(p*)   (17)

In formula (15), μ is a hyper-parameter balancing the two losses, with value 11. In formula (16), H and W respectively denote the height and width of the hotspot map, N_2 = N_1 - 1 = 20, and h_k(i, j) and h*_k(i, j) are respectively the predicted hotspot map and the real hotspot map of the k-th key point at (i, j) in step 3.2.2. In formula (17), p, p* and N_p respectively denote the predicted orientation vector, the corresponding true orientation and the number of orientation classes; p(p*) is the probability that the predicted orientation is the true orientation, and p(i) is the probability of one orientation in the predicted orientation vector.
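A hedged sketch of the refinement losses (15)-(17): mean-square error over the 20 refined key-point heat maps plus cross-entropy over the 8 orientations, balanced by μ. Which of the two terms μ multiplies, and the tensor shapes, are assumptions.

```python
# Hedged sketch of formulas (15)-(17); shapes and the placement of mu are assumptions.
import torch
import torch.nn.functional as F

def refinement_loss(pred_hm, gt_hm, dir_logits, dir_label, mu=11.0):
    l_hm = F.mse_loss(pred_hm, gt_hm)                 # formula (16): heat-map regression
    l_dir = F.cross_entropy(dir_logits, dir_label)    # formula (17): orientation classification
    return l_hm + mu * l_dir                          # formula (15); mu's placement is assumed

pred_hm = torch.rand(2, 20, 64, 64)    # refined key-point heat maps (background excluded)
gt_hm = torch.rand(2, 20, 64, 64)
dir_logits = torch.randn(2, 8)         # front, rear, left, left-front, left-rear, right, ...
dir_label = torch.tensor([3, 7])
loss = refinement_loss(pred_hm, gt_hm, dir_logits, dir_label)
```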
Step 3.3, according to the statistics of view and orientation, the vehicle orientations are divided into 8 classes: front, rear, left, left-front, left-rear, right, right-front and right-rear. There is, however, no clear boundary between two adjacent orientations, although the key points they expose differ. To overcome this problem, a key-point selector is designed that adaptively selects key points based on the predicted orientation likelihood. It works as follows: for each orientation set, the 8 most informative key points are counted; given the most likely orientation (computed in step 3.2), the key-point selector picks the 8 key points for that orientation. The hotspot maps of these 8 key points are then input, and the deeper network blocks (Res3, Res4 and Res5) of another ResNet are used to extract the local feature f_l. Finally, the local feature f_l and the global feature f_g are concatenated and jointly optimized through the multi-layer perceptron with L2 softmax loss; the corresponding optimization function is given as formula (18).
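The sketch below illustrates the idea of step 3.3: a lookup from predicted orientation to 8 key points, attention pooling of a mid-level feature map with the selected heat maps to obtain f_l, and concatenation with f_g for joint optimization. The orientation-to-keypoint table, feature sizes and pooling scheme are illustrative assumptions rather than the patent's exact design.

```python
# Hedged sketch of the adaptive key-point selector and local feature f_l.
# KEYPOINTS_PER_ORIENTATION and all dimensions are hypothetical placeholders.
import torch
import torch.nn as nn

# Hypothetical lookup: for each of the 8 orientations, the 8 most informative key points.
KEYPOINTS_PER_ORIENTATION = {o: list(range(o, o + 8)) for o in range(8)}

class LocalPath(nn.Module):
    def __init__(self, feat_dim=2048, local_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, local_dim)

    def forward(self, feat_map, heatmaps, orientation):
        # feat_map: (B, C, H, W) mid-level features; heatmaps: (B, 20, H, W) key-point maps
        f_l = []
        for b in range(feat_map.size(0)):
            idx = KEYPOINTS_PER_ORIENTATION[int(orientation[b])]     # 8 selected key points
            attn = heatmaps[b, idx].sum(dim=0, keepdim=True)         # (1, H, W) attention map
            attn = attn / (attn.sum() + 1e-6)
            pooled = (feat_map[b] * attn).sum(dim=(1, 2))            # (C,) attention-pooled feature
            f_l.append(pooled)
        return self.proj(torch.stack(f_l))                           # (B, local_dim) local feature f_l

local_path = LocalPath()
feat_map = torch.rand(2, 2048, 16, 16)
heatmaps = torch.rand(2, 20, 16, 16)
orientation = torch.tensor([0, 5])
f_l = local_path(feat_map, heatmaps, orientation)
f_g = torch.rand(2, 2048)
fused = torch.cat([f_g, f_l], dim=1)   # concatenated feature for the L2-softmax MLP
```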
step 3.4, extracting global feature f g Then extracting the local feature f of the orientation constraint l (by adaptive attention strategies); the 2048-dimensional feature vectors from the last convolutional layer are input into a shallow multi-layered perceptron trained using L2 softmax loss.
Step 3.5, the adaptive-attention appearance detection models trained in steps 3.1, 3.2 and 3.3 are added to the S-view and D-view feature spaces of step 2 (view-aware metric learning) respectively, and joint optimization is finally performed; the overall optimization function L is given as formula (19).
and finally, predicting the most probable position of the target vehicle through the multi-camera topological structure in the step 1, and acquiring a key monitoring area. And (4) applying the vehicle re-identification model trained in the step (2) and the step (3) to a suspicious vehicle database to identify the vehicle again, and returning to the step (1) to update the camera topological network to obtain the key monitoring area for continuously tracking the vehicle after the hit vehicle and the position (camera position) are found.

Claims (4)

1. A method for re-identifying an escaping vehicle, which is characterized by comprising the following steps: (1) constructing a target camera topological network and predicting the key monitoring area of a vehicle escaping after an accident; (2) based on view-aware metric learning, learning the depth metrics under two different view constraints in S-view same-view samples and D-view cross-view samples respectively; (3) vehicle re-identification under adaptive attention based on dual paths; the dual paths in step (3) comprise a global path and a local path, vehicle re-identification along the global path and the local path being carried out respectively in the S-view same-view and D-view cross-view feature spaces based on step (2), the global path extracting global picture features, and the local path extracting local discriminative features through adaptive attention to supplement the global features;
in the step (1), the monitoring detection range of the target is narrowed through the time transition probability between the cameras, the monitoring detection range is a key monitoring area, and the method specifically comprises the following steps:
step 1.1: establishing road section information of a vehicle monitoring scene to be inquired and a network topological structure of multiple cameras through a map and an actual camera view in data;
step 1.2: the suspicious vehicle in the monitoring circle is tracked by the monitoring system; the key point is that, after the hit-and-run vehicle is observed at its initial position, the positions of the camera or cameras where it will appear next need to be determined, and the cameras where it may appear are associated;
step 1.3: the probabilities of the hit-and-run vehicle to be queried appearing in the associated camera set are analyzed and sorted, and a small number of cameras with the optimal time-sequence relation are found as the key monitoring areas.
2. The escaping vehicle re-identification method according to claim 1, characterized in that after step (1.3) is completed, steps (2) and (3) are executed, and the key monitoring circle is updated after the vehicle is re-identified.
3. The escaping vehicle re-identification method according to claim 1, characterized in that a two-branch network is provided in step (2) to map the input vehicle image into two feature spaces, specifically comprising the steps of:
step 2.1: inputting a picture of a vehicle to be inquired, firstly predicting an absolute visual angle of each image by using a visual angle classifier, and dividing the visual angle into front, side or rear; if the image pair is from the same/similar view angle, classifying as S-view pair, otherwise, D-view pair;
step 2.2: sending the image classified into the S-view pair into an S-view characteristic space for S-view same-view constraint training, and sending the image classified into the D-view pair into a D-view characteristic space for D-view cross-view constraint training;
step 2.3: and respectively carrying out attention feature fusion in the two feature spaces S-view and D-view to respectively obtain a fusion attention model of the feature space S-view and a fusion attention model of the feature space D-view.
4. The escaping vehicle re-identification method according to claim 1, characterized in that in step (3), a dual-path adaptive attention model is added to the S-view and D-view feature spaces respectively for vehicle re-identification, the global appearance path capturing global features of the vehicle appearance, the orientation-constrained local appearance path learning to capture local discriminative features, and vehicles which do not accord with the appearance of the query vehicle being filtered out; the method specifically comprises the following steps:
step 3.1: the backbone network uses ResNet-50 and ResNet-101 as baseline models, is pre-trained on the VehicleID data set, and is then used to extract the global feature f_g of the vehicle;
Step 3.2: the method comprises the following steps of (1) estimating key points and orientations of a vehicle by using a two-stage model:
step 3.2.1: the VGG-19-based convolutional network is used to make a rough hotspot-map estimate for 21 classes (20 key points and 1 background); the VGG-19-based convolutional network is trained with a pixel-wise multi-class cross-entropy loss function:

L_kp = -(1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{21} l*_{i,j}(k) · log( exp(x_{i,j}(k)) / Σ_{k'=1}^{21} exp(x_{i,j}(k')) )

where l_{i,j} is the vector of outputs over all channels at pixel location (i, j), l*_{i,j} is the one-hot ground-truth label of each pixel position, H and W respectively denote the height and width of the hotspot map, and x_{i,j}(k) is the predicted value for channel k at pixel location (i, j);
step 3.2.2: down-sampling the input image by HRNet and refining the coarse key points and orientation from step 3.2.1;
step 3.3: adaptively selecting key points and extracting subtle local features; the orientations of the vehicle are divided into 8 classes: front, rear, left, left-front, left-rear, right, right-front and right-rear; a key-point selector is designed which adaptively selects key points based on the predicted orientation;
step 3.4: the adaptive-attention appearance detection models trained in steps 3.1, 3.2 and 3.3 are added to the S-view and D-view feature spaces of step (2) respectively for joint optimization.
CN202010595381.8A 2020-06-28 2020-06-28 Escaping vehicle re-identification method Active CN112071075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010595381.8A CN112071075B (en) Escaping vehicle re-identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010595381.8A CN112071075B (en) Escaping vehicle re-identification method

Publications (2)

Publication Number Publication Date
CN112071075A CN112071075A (en) 2020-12-11
CN112071075B true CN112071075B (en) 2022-10-14

Family

ID=73656156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010595381.8A Active CN112071075B (en) Escaping vehicle re-identification method

Country Status (1)

Country Link
CN (1) CN112071075B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091548A (en) * 2021-09-23 2022-02-25 昆明理工大学 Vehicle cross-domain re-identification method based on key point and graph matching
CN114399537B (en) * 2022-03-23 2022-07-01 东莞先知大数据有限公司 Vehicle tracking method and system for target personnel

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171247B (en) * 2017-12-21 2020-10-27 北京大学 Vehicle re-identification method and system
CN109165589B (en) * 2018-08-14 2021-02-23 北京颂泽科技有限公司 Vehicle weight recognition method and device based on deep learning
CN109740653B (en) * 2018-12-25 2020-10-09 北京航空航天大学 Vehicle re-identification method integrating visual appearance and space-time constraint
CN110795580B (en) * 2019-10-23 2023-12-08 武汉理工大学 Vehicle weight identification method based on space-time constraint model optimization
CN110826484A (en) * 2019-11-05 2020-02-21 上海眼控科技股份有限公司 Vehicle weight recognition method and device, computer equipment and model training method

Also Published As

Publication number Publication date
CN112071075A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN110020651B (en) License plate detection and positioning method based on deep learning network
Laddha et al. Map-supervised road detection
Kühnl et al. Monocular road segmentation using slow feature analysis
CN105930833B (en) A kind of vehicle tracking and dividing method based on video monitoring
CN101701818B (en) Method for detecting long-distance barrier
CN110910378B (en) Bimodal image visibility detection method based on depth fusion network
CN110956651A (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN104239867B (en) License plate locating method and system
CN112071075B (en) Escaping vehicle re-identification method
CN110889398B (en) Multi-modal image visibility detection method based on similarity network
CN110795580B (en) Vehicle weight identification method based on space-time constraint model optimization
CN111310728B (en) Pedestrian re-identification system based on monitoring camera and wireless positioning
CN111968046B (en) Target association fusion method for radar photoelectric sensor based on topological structure
CN112863186B (en) Vehicle-mounted unmanned aerial vehicle-based escaping vehicle rapid identification and tracking method
CN112419317B (en) Visual loop detection method based on self-coding network
Xue et al. A novel multi-layer framework for tiny obstacle discovery
Liao et al. Lr-cnn: Local-aware region cnn for vehicle detection in aerial imagery
CN114998993A (en) Combined pedestrian target detection and tracking combined method in automatic driving scene
Zhang et al. Front vehicle detection based on multi-sensor fusion for autonomous vehicle
CN107705327B (en) Candidate target extraction method of multi-camera network space-time model
CN116485894A (en) Video scene mapping and positioning method and device, electronic equipment and storage medium
CN116206297A (en) Video stream real-time license plate recognition system and method based on cascade neural network
Li et al. Enhancing feature fusion using attention for small object detection
CN115565157A (en) Multi-camera multi-target vehicle tracking method and system
CN115294560A (en) Vehicle tracking method and system based on attribute matching and motion trail prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant