CN117670938A - Multi-target spatio-temporal tracking method based on an overload-control robot - Google Patents
Multi-target spatio-temporal tracking method based on an overload-control robot
- Publication number
- CN117670938A (application CN202410125476.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a multi-target spatio-temporal tracking method based on an overload-control robot. An integrated multi-level description network is constructed that gives a good static-feature description of vehicle targets at different scales in large traffic scenes. Combined with a traffic-scene target feature set and a key-frame screening network, the static attributes of each traffic target, such as position, category, and size, are obtained efficiently, and a feature model belonging to the traffic target is constructed that uniquely describes that target. For traffic monitoring cameras, the CARLA simulation platform is used to generate a large-scale traffic-monitoring scene dataset covering multiple scenes and traffic targets, providing complete target and scene feature information for subsequent modeling. For video segments, a multi-target tracking network is designed, a trajectory model is established, and the spatio-temporal graph under a single camera is generated and updated. Finally, considering that cameras cannot fully cover all road sections, the traffic spatio-temporal graphs under multiple cameras are fused using the feature models of individual vehicles and the vehicle topology on the covered road sections.
Description
Technical Field
The invention relates to the technical field of highways, in particular to a multi-target spatio-temporal tracking method based on an overload-control robot.
Background
Overload control targets the illegal overloading of vehicles. Illegal overloaded transport induces a large number of road traffic safety accidents; it damages roads and transport vehicles, endangers drivers, and undermines fair competition in the transport market. Traditional detection methods rely mainly on hand-designed features and on simple algorithms with small computational cost; faced with camera shake, target occlusion, illumination change, weather change, and similar factors, they suffer from incomplete feature extraction, high false-detection rates, and poor robustness.
A deep learning method therefore needs to be introduced. Deep learning has strong autonomous learning capability and can model complex tasks; compared with traditional methods, its large number of model parameters can approximate complex nonlinear relations, giving the model stronger expressive power and higher algorithmic accuracy, while distributed storage and parallel computing make the algorithms run faster. However, deep learning methods still face the following problems.
(1) Fully automatic camera calibration based on deep learning. Existing methods usually focus on calibration in generic scenes and do not consider the specifics of traffic scenes, which contain lane lines, vehicles, and the like; moreover, there is as yet no dataset for automatic camera calibration in traffic scenes.
(2) Convolutional neural networks suffer from convolutional features being sensitive to scale change, region-of-interest pooling damaging the feature structure of small objects, and back-propagated errors accumulating during network training; these problems leave the final feature extraction incomplete. Continuous target detection, feature extraction, and modeling in traffic scenes still need further study, especially improving the detection precision of small targets.
(3) Existing multi-target tracking methods are usually divided into three stages: detection, feature modeling, and association matching. The three stages are heavily interdependent, and few methods achieve integrated multi-target tracking by predicting the motion parameters of targets.
(4) Target detection networks usually focus on localization and recognition and mostly cannot complete several tasks at once, yet the static description of traffic includes not only the positions and classes of vehicle targets but also axles, vehicle colors, and so on; static-information extraction based on a multi-task network therefore urgently needs study.
For these reasons, the invention discloses a multi-target spatio-temporal tracking method based on an overload-control robot.
Disclosure of Invention
The invention aims to provide a multi-target spatio-temporal tracking method based on an overload-control robot.
The problems the invention aims to solve are as follows. First, obtain more accurate camera calibration parameters. Then redesign the deep neural network structure to improve its scale sensitivity, and construct a two- and three-dimensional detection-integrated multi-level description network, so that vehicle targets at different scales are better described statically in large traffic scenes. Combined with a traffic-scene target feature set and a key-frame screening network, the static attributes of each traffic target, including position, category, size, axle count, license plate, and vehicle color, are obtained efficiently; a feature model belonging to the traffic target is constructed and used to describe that target. Meanwhile, for traffic monitoring cameras, a large-scale traffic-monitoring scene dataset covering various climatic conditions, traffic scenes, and traffic targets is generated with the CARLA simulation platform, providing complete target and scene feature information for subsequent research. Next, for video segments, a graph-network-based multi-target tracking algorithm is designed, a trajectory feature model is established, and the spatio-temporal graph under a single camera is generated and updated. Finally, considering that cameras cannot fully cover all road sections, the traffic spatio-temporal graphs under multiple cameras are fused using the feature models of individual vehicles and the vehicle topology on the covered road sections.
A multi-target spatio-temporal tracking method based on an overload-control robot comprises the following steps:
s1, establishing a traffic scene target feature set;
s2, screening key frames by a key frame screening network;
s3, constructing an integrated detection network, and integrally describing two-dimensional and three-dimensional attributes of traffic targets in multiple layers;
s4, modeling traffic target characteristics;
s5, constructing a multi-target tracking network to track the multi-target motion parameters;
s6, modeling the dynamic information characteristics of the vehicle.
Further, the traffic-scene target feature set established in step S1 includes an actual-scene image feature set and a virtual traffic-scene feature set. The actual-scene images are collected as high-definition images from real road, bridge, and tunnel monitoring cameras at home and abroad. The virtual traffic-scene feature set is simulated with the unmanned and autonomous-driving platform CARLA: a virtual camera is placed in a virtual traffic scene, and virtual traffic-scene videos are generated by recording from that camera.
Further, the key-frame screening network in S2 builds a large-scale video-frame screening image library. The frame to be screened and its previous frame are fed into a convolutional network, which produces a one-dimensional vector containing the features of both images; a fully connected layer then combines the image features, softmax classifies them, and a key-frame result is output. The video-frame screening image library covers two classes of images by manual labeling: useless frames (label 0) and key frames (label 1). Useless frames include frames with few or no traffic targets and frames whose road-surface area occupies a small proportion of the image; key frames include frames with many traffic targets and frames whose road-surface area occupies a large proportion of the image.
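The fully connected layer plus softmax classification stage described above can be sketched as follows; the feature dimensions, the weight matrix, and the use of NumPy are illustrative assumptions, since the patent does not specify a concrete architecture:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_frame_pair(feat_prev, feat_cur, W, b):
    """Combine the convolutional features of the previous and current
    frame into one one-dimensional vector and classify it:
    label 0 = useless frame, label 1 = key frame."""
    x = np.concatenate([feat_prev, feat_cur])   # joint one-dimensional vector
    probs = softmax(W @ x + b)                  # fully connected layer + softmax
    return int(np.argmax(probs)), probs

# toy example with random weights (illustration only, not trained values)
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 8)) * 0.1
b = np.zeros(2)
label, probs = classify_frame_pair(rng.standard_normal(4),
                                   rng.standard_normal(4), W, b)
```

In a real system the two feature vectors would come from the shared convolutional backbone rather than a random generator.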
Further, the integrated detection network in S3 includes:
s31, inputting a traffic scene video frame, and generating a convolution feature map through a convolution layer;
s32, generating a group of suggestions based on the convolution feature map by using the regional proposal network RPN;
s33, the region proposal network RPN predicts bounding boxes that may contain objects; pooling with deconvolution and bilinear kernels enlarges the features of small proposal regions, avoiding the insensitivity to small targets that arises when a small traffic target is represented by repeated values; the pooling operation is applied across several layers of the convolutional neural network, serially fusing pooled elements from different convolutional layers so that low-level detail and high-level semantic information are combined;
s34, dividing the network into a plurality of branches according to the size of the proposal area, reducing the training burden of traffic targets with different scales and dimensions, and improving the detection precision of large objects and small objects;
s35, after combining the characteristics of the ROI area, predicting targets with different scales by adopting three prediction branches, wherein the three prediction branches respectively correspond to a small target, a medium target and a large target;
s36, a fully connected layer is added to each of the three prediction branches; a three-dimensional prediction branch responsible for detecting vehicle key points is added to the multi-level prediction branches, with a pyramid mechanism added to adapt to multi-scale vehicle targets; multi-level prediction branches for the target's license plate, vehicle color, and axle count are also introduced to obtain the vehicle's various static characteristics;
s37, fusing all prediction results of the branches to obtain a final detection result.
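The size-based branch split of steps S34 and S35 can be sketched as a simple routing rule; the 32x32 and 96x96 area thresholds follow common COCO practice and are assumptions, not values given in the text:

```python
def select_branch(w, h, small_max=32 * 32, large_min=96 * 96):
    """Route an ROI to one of three prediction branches by its area,
    mirroring the small / medium / large split of S34-S35.
    Thresholds are illustrative placeholders."""
    area = w * h
    if area < small_max:
        return "small"
    if area > large_min:
        return "large"
    return "medium"
```

Each branch then carries its own prediction head, so small targets are no longer trained against the same regression statistics as large ones.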
Further, the loss function of the integrated detection network is as follows:

$$L = L_{cls}(p, p^{*}) + \lambda_{2d}\,p^{*}\,L_{2d}(t, t^{*}) + \lambda_{3d}\,p^{*}\,L_{3d}(u, u^{*}) + \lambda_{tpl}\,L_{tpl}(v, v^{*}) + \lambda_{part}\,L_{part}(w, w^{*})$$

where the overall loss function $L$ consists of four parts: two-dimensional detection (the standard softmax loss $L_{cls}$ plus box regression), three-dimensional detection, template similarity, and part detection. $p^{*}$ is the label of a proposal in the batch, with $p^{*} = 1$ when the proposal is positive, and the regression terms are smooth-L1 losses. In the 2D term, $\lambda_{2d}$ is the regularization constant for category and regression, and $t$ and $t^{*}$ are the two-dimensional rectangular-box regression vector predicted by the integrated network and the ground-truth two-dimensional box regression vector. In the 3D term, $\lambda_{3d}$ is the regularization constant of the three-dimensional detection branch, and $u$ and $u^{*}$ are the predicted and ground-truth three-dimensional regression vectors. In the template term, $\lambda_{tpl}$ is the regularization parameter for template similarity, and $v$ and $v^{*}$ are the predicted and ground-truth template vectors. In the part term, $\lambda_{part}$ is the regularization parameter for part detection, and $w$ and $w^{*}$ are the predicted target-part vector and the actual target-part vector positions.
Further, the traffic-target feature modeling in S4 builds, from the target static attributes acquired in the preceding steps, a feature model belonging to the corresponding target, and the acquired static attributes are stored in a unified information-coding format, namely a binary format.
Further, the multi-target tracking network is constructed on the deep motion modeling network DMM-Net, with the following structure: the frames between time stamps $t_{0}$ and $t_{1}$ are input, and the video-frame sequence is convolved by a feature extractor; six layers of features are then selected and fed respectively into a motion-information network, a classification network, and a visualization network, whose loss functions are $L_{motion}$, $L_{cls}$, and $L_{vis}$. The loss function of the entire network is given by

$$L = \frac{1}{N}\left(L_{motion} + \alpha L_{cls} + \beta L_{vis}\right)$$

where $N$ is the number of positive anchor tubes, and $\alpha$ and $\beta$ are hyperparameters for manually adjusting the weights of the terms.
Further, in step S6 the vehicle dynamic-information feature modeling constructs an information feature model from the three-dimensional motion trajectories and timestamp information of the video vehicles acquired by the multi-target tracking network; the information features include timestamp, image position, real position, average speed, category, and color.
The beneficial effects of the invention are as follows: during overload enforcement, the invention uses three-dimensional machine vision to extract information from the truck underside; based on the constructed two- and three-dimensional detection-integrated multi-level description network for traffic targets, static attribute information such as target position, category, size, axle count, license plate, and vehicle color is obtained, a feature model belonging to the vehicle target is constructed, and the acquired static attributes are uniformly encoded; an information feature model is constructed from the three-dimensional motion trajectory and timestamp information of the vehicle target in the video;
the application innovation of multi-source data fusion is realized, and traffic volume information such as vehicle speed, vehicle type, flow, vehicle head distance, vehicle head time interval, vehicle following percentage and the like is analyzed and obtained on the basis of vehicle basic information acquisition. When the vehicles are queued to pass through the detection area, the car queue is inserted, no license plate or license plate identification error exists, the vehicles are backed up and the like, the acquired positions and the traffic flow states of the vehicles ensure that the license plate, weight, wheel axle, outline, photo, video and other data of the same vehicle are matched and summarized into a driving record, and the problems of multi-source data matching and vehicle queue information uploading are solved;
the whole process monitoring of truck weighing is realized, the information including the speed of the truck at the place of loading, the weight balance, the sliding edge, the abnormal acceleration, the dynamic vehicle separation and the like can be monitored and provided, the single truck weighing data strip is realized, the data and the weighing interference are avoided, and more complete evidence is provided for the super business.
Drawings
FIG. 1 is a schematic diagram of a key frame screening network architecture according to the present invention;
FIG. 2 is a schematic diagram of an integrated detection network architecture according to the present invention;
FIG. 3 is a schematic diagram of a modeling and encoding flow of a traffic target feature model according to the present invention;
FIG. 4 is a schematic diagram of a DMM-Net network architecture according to the present invention.
Detailed Description
The present invention will be further described more fully hereinafter, but the scope of the invention is not limited thereto.
A multi-target spatio-temporal tracking method based on an overload-control robot comprises the following steps:
s1, establishing a traffic scene target feature set;
s2, screening key frames by a key frame screening network;
s3, constructing an integrated detection network, and integrally describing two-dimensional and three-dimensional attributes of traffic targets in multiple layers;
s4, modeling traffic target characteristics;
s5, constructing a multi-target tracking network to track the multi-target motion parameters;
s6, modeling the dynamic information characteristics of the vehicle.
Further, the large-scale target datasets published to date, such as COCO, Pascal VOC, and the BIT-Vehicle dataset, contain many common-object features but do not satisfy the traffic scenario of the present research problem. The research problem involves large traffic scenes: the camera's field of view is wide, and a target on the road deforms severely as it drives toward and away from the camera. To cover the diversity of traffic scenes and the conditions of traffic targets in complex traffic environments, several traffic-oriented datasets must be constructed. The traffic-scene target feature set established in S1 therefore comprises an actual-scene image feature set and a virtual traffic-scene feature set. The actual-scene images, collected as high-definition images from real road, bridge, and tunnel monitoring cameras at home and abroad, cover traffic-target features with varied scale and shape changes and account for insufficient light under adverse weather such as overcast and rainy days. The virtual traffic-scene feature set is generated with the unmanned and autonomous-driving platform CARLA by recording virtual cameras placed in simulated traffic scenes. From these videos, extensive scene description information can be obtained: camera position and angle, camera intrinsic and extrinsic parameter matrices, the weather in the scene, the degree of vehicle congestion, and so on; extensive traffic-target description information can also be obtained: target position, speed, type, number, and so on. The simulated traffic-scene target feature set collects more than 300 traffic scenes and more than 4000 traffic videos comprising over 20 million video frames. The actual-scene and simulated-scene feature sets complement each other's strengths: the real operating state of traffic targets is taken into account, the diversity of traffic scenes is enriched, and complete target and scene feature information is provided for subsequent research.
Further, traffic video sequences often contain a large number of redundant video frames, and analyzing every frame would seriously slow computation. The original video sequence is therefore first sent to the key-frame screening network, which extracts the valuable video frames for subsequent analysis and improves processing efficiency. As shown in fig. 1, the key-frame screening network in step S2 builds a large-scale video-frame screening image library; the frame to be screened and its previous frame are fed into a convolutional network, which produces a one-dimensional vector containing the features of both images; a fully connected layer then combines the image features, softmax classifies them, and a key-frame result is output. The video-frame screening image library covers two classes of images by manual labeling: useless frames (label 0) and key frames (label 1). Useless frames include frames with few or no traffic targets and frames whose road-surface area occupies a small proportion of the image; key frames include frames with many traffic targets and frames whose road-surface area occupies a large proportion of the image.
Further, as shown in fig. 2, the integrated detection network in S3 includes:
s31, inputting a traffic scene video frame, and generating a convolution feature map through a convolution layer;
s32, generating a group of suggestions based on the convolution feature map by using the regional proposal network RPN;
s33, the region proposal network RPN, that is, the region generation network, predicts bounding boxes that may contain objects; pooling with deconvolution and bilinear kernels enlarges the features of small proposal regions, avoiding the insensitivity to small targets that arises when a small traffic target is represented by repeated values; the pooling operation is applied across several layers of the convolutional neural network, serially fusing pooled elements from different convolutional layers so that low-level detail and high-level semantic information are combined;
s34, dividing the network into a plurality of branches according to the size of the proposal area, reducing the training burden of traffic targets with different scales and dimensions, and improving the detection precision of large objects and small objects;
s35, after the features of the ROI, that is, the region of interest, are combined, targets of different scales are predicted with three prediction branches, corresponding respectively to small, medium, and large targets;
s36, a fully connected layer is added to each of the three prediction branches; a three-dimensional prediction branch responsible for detecting vehicle key points is added to the multi-level prediction branches, with a pyramid mechanism added to adapt to multi-scale vehicle targets; multi-level prediction branches for the target's license plate, vehicle color, and axle count are also introduced to obtain the vehicle's various static characteristics;
s37, fusing all prediction results of the branches to obtain a final detection result.
Further, the loss function of the integrated detection network is as follows:

$$L = L_{cls}(p, p^{*}) + \lambda_{2d}\,p^{*}\,L_{2d}(t, t^{*}) + \lambda_{3d}\,p^{*}\,L_{3d}(u, u^{*}) + \lambda_{tpl}\,L_{tpl}(v, v^{*}) + \lambda_{part}\,L_{part}(w, w^{*})$$

where the overall loss function $L$ consists of four parts: two-dimensional detection (the standard softmax loss $L_{cls}$ plus box regression), three-dimensional detection, template similarity, and part detection. $p^{*}$ is the label of a proposal in the batch, with $p^{*} = 1$ when the proposal is positive, and the regression terms are smooth-L1 losses. In the 2D term, $\lambda_{2d}$ is the regularization constant for category and regression, and $t$ and $t^{*}$ are the two-dimensional rectangular-box regression vector predicted by the integrated network and the ground-truth two-dimensional box regression vector. In the 3D term, $\lambda_{3d}$ is the regularization constant of the three-dimensional detection branch, and $u$ and $u^{*}$ are the predicted and ground-truth three-dimensional regression vectors. In the template term, $\lambda_{tpl}$ is the regularization parameter for template similarity, and $v$ and $v^{*}$ are the predicted and ground-truth template vectors. In the part term, $\lambda_{part}$ is the regularization parameter for part detection, and $w$ and $w^{*}$ are the predicted target-part vector and the actual target-part vector positions.
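A minimal numeric sketch of the integrated loss described above, assuming the component losses are smooth-L1 regression terms gated by the positive-proposal label; the lambda weights are placeholders, not values from the patent:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth-L1: 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x * x, x - 0.5)

def integrated_loss(p_star, cls_loss, t, t_gt, u, u_gt, v, v_gt, w, w_gt,
                    lam_2d=1.0, lam_3d=1.0, lam_tpl=1.0, lam_part=1.0):
    """Softmax classification loss plus smooth-L1 terms for the 2D box,
    3D box, template vector, and part vector. The box terms are gated by
    p_star (1 for positive proposals, 0 otherwise)."""
    l = cls_loss
    l += lam_2d * p_star * smooth_l1(t - t_gt).sum()
    l += lam_3d * p_star * smooth_l1(u - u_gt).sum()
    l += lam_tpl * smooth_l1(v - v_gt).sum()
    l += lam_part * smooth_l1(w - w_gt).sum()
    return float(l)
```

With perfect predictions all regression terms vanish and only the classification loss remains, which is a quick sanity check on any implementation.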
Further, the traffic-target feature modeling in S4 builds, from the target static attributes acquired in the preceding steps, a feature model belonging to the corresponding target; the acquired static attributes are stored in a unified information-coding format, namely a binary format, whose specific layout is shown in fig. 3. As can be seen from fig. 3, the traffic target to be modeled is a red vehicle; its image position, category, size, axle count, license plate, and color are obtained with the two- and three-dimensional detection-integrated multi-level description network, and this information is converted by an encoder and stored, completing the modeling of the traffic-target feature model. When the target's specific traffic information is needed, it can be recovered with a decoder. The traffic-target feature model uniquely describes the target's static attributes, so it can be used to identify a target uniquely in a multi-camera linked traffic scene, and through encoding and decoding the recorded target information can be used in a variety of environments.
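The encode/decode round trip described above can be sketched with a fixed binary layout; the field order, field widths, and the 8-byte plate field are illustrative assumptions, since fig. 3 is not reproduced here:

```python
import struct

# Assumed layout: bbox(x, y, w, h) as signed shorts, class id, vehicle
# size (length, width) as unsigned shorts, axle count, 8-byte ASCII
# plate, color id. "<" = little-endian, no padding.
FMT = "<4h B 2H B 8s B"

def encode_target(bbox, cls_id, size, axles, plate, color_id):
    """Pack one target's static attributes into the unified binary format."""
    plate_bytes = plate.encode("ascii")[:8].ljust(8, b"\0")
    return struct.pack(FMT, *bbox, cls_id, *size, axles, plate_bytes, color_id)

def decode_target(blob):
    """Recover the static attributes from the binary record."""
    x, y, w, h, cls_id, ln, wd, axles, plate, color_id = struct.unpack(FMT, blob)
    return dict(bbox=(x, y, w, h), cls_id=cls_id, size=(ln, wd), axles=axles,
                plate=plate.rstrip(b"\0").decode("ascii"), color_id=color_id)
```

A fixed little-endian layout keeps the record portable across machines, which matters when the same feature model is matched across linked cameras.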
Further, as shown in fig. 4, the multi-target tracking network is constructed on the deep motion modeling network DMM-Net, with the following structure: the frames between time stamps $t_{0}$ and $t_{1}$ are input, and the video-frame sequence is convolved by a feature extractor; six layers of features are then selected and fed respectively into a motion-information network, a classification network, and a visualization network, whose loss functions are $L_{motion}$, $L_{cls}$, and $L_{vis}$. The loss function of the entire network is given by

$$L = \frac{1}{N}\left(L_{motion} + \alpha L_{cls} + \beta L_{vis}\right)$$

where $N$ is the number of positive anchor tubes, and $\alpha$ and $\beta$ are hyperparameters for manually adjusting the weights of the terms. Constructing this multi-target tracking network realizes the multi-target tracking task in video.
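Assuming the three component losses have already been computed as scalars, the combined DMM-Net loss can be sketched as follows; the default weights are placeholders:

```python
def dmm_net_loss(l_motion, l_cls, l_vis, n_pos, alpha=1.0, beta=1.0):
    """Combine the motion, classification, and visualization losses and
    normalize by the number of positive anchor tubes N, as in the
    total-loss formula. alpha and beta weight the last two terms."""
    if n_pos == 0:
        return 0.0   # no positive anchor tubes, no loss signal this batch
    return (l_motion + alpha * l_cls + beta * l_vis) / n_pos
```

Guarding against N = 0 avoids a division-by-zero on batches that happen to contain no positive anchor tubes.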
In step S6, the vehicle dynamic-information feature modeling constructs an information feature model from the three-dimensional motion trajectories and timestamp information of the video vehicles acquired by the multi-target tracking network; the information features include timestamp, image position, real position, average speed, category, and color, and through these feature vectors the dynamic information of vehicles in the video scene can be expressed comprehensively and in detail.
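A minimal sketch of such a dynamic-information feature record follows; the field names, units, and the flattening into a numeric vector are illustrative assumptions, since the patent lists the feature content but not a concrete layout:

```python
from dataclasses import dataclass

@dataclass
class DynamicFeature:
    """One sample of the S6 vehicle dynamic-information feature model."""
    timestamp: float        # seconds since the start of the video
    image_pos: tuple        # (u, v) pixel position in the frame
    real_pos: tuple         # (x, y, z) world position, metres (assumed)
    avg_speed: float        # average speed, km/h (assumed unit)
    category: str           # e.g. "truck"
    color: str              # e.g. "red"

def as_vector(f, cat_ids, color_ids):
    """Flatten one record into a numeric feature vector, mapping the
    categorical fields through caller-supplied id tables."""
    return [f.timestamp, *f.image_pos, *f.real_pos, f.avg_speed,
            cat_ids[f.category], color_ids[f.color]]
```

A per-frame list of such records, keyed by track id, is one straightforward way to store the trajectory model that feeds the single-camera spatio-temporal graph.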
The embodiments disclosed above are preferred embodiments, but the invention is not limited thereto; those skilled in the art will readily appreciate from the foregoing that various extensions and modifications can be made without departing from the spirit of the invention.
Claims (8)
1. A multi-target spatio-temporal tracking method based on an overload-control robot, characterized by comprising the following steps:
s1, establishing a traffic scene target feature set;
s2, screening key frames by a key frame screening network;
s3, constructing an integrated detection network, and integrally describing two-dimensional and three-dimensional attributes of traffic targets in multiple layers;
s4, modeling traffic target characteristics;
s5, constructing a multi-target tracking network to track the multi-target motion parameters;
s6, modeling the dynamic information characteristics of the vehicle.
2. The multi-target spatio-temporal tracking method based on an overload-control robot according to claim 1, wherein the traffic-scene target feature set established in step S1 comprises an actual-scene image feature set and a virtual traffic-scene feature set; the actual-scene images are collected as high-definition images from real road, bridge, and tunnel monitoring cameras at home and abroad, while the virtual traffic-scene feature set is simulated with the unmanned and autonomous-driving platform CARLA, a virtual camera being placed in a virtual traffic scene so that virtual traffic-scene videos are generated by recording from the virtual camera.
3. The multi-target spatio-temporal tracking method based on an overload-control robot according to claim 1, wherein: the key-frame screening network in S2 builds a large-scale video-frame screening image library; the frame to be screened and its previous frame are fed into a convolutional network to obtain a one-dimensional vector containing the features of the two images; a fully connected layer combines the image features and softmax classifies them, and a key-frame result is output; the video-frame screening image library covers two classes of images by manual labeling: useless frames (label 0) and key frames (label 1), wherein useless frames include frames with few or no traffic targets and frames whose road-surface area occupies a small proportion of the image, and key frames include frames with many traffic targets and frames whose road-surface area occupies a large proportion of the image.
4. The multi-target space-time tracking method based on the super-treatment robot as claimed in claim 1, wherein the integrated detection network in S3 operates as follows:
s31, inputting a traffic scene video frame, and generating a convolution feature map through a convolution layer;
s32, generating a group of suggestions based on the convolution feature map by using the regional proposal network RPN;
s33, predicting a boundary box possibly containing an object by using an area proposal network RPN, pooling by utilizing deconvolution and bilinear kernels, expanding small proposal area characteristics, avoiding the problem of insensitivity of a small target caused by representing the small traffic target by repeated values, and applying pooling operation to serially fusing pooling elements positioned at different convolution layers in a plurality of layers of a convolution neural network to fuse low-layer detailed information and high-layer semantic information;
s34, dividing the network into a plurality of branches according to the size of the proposal area, reducing the training burden of traffic targets with different scales and dimensions, and improving the detection precision of large objects and small objects;
s35, after combining the characteristics of the ROI area, predicting targets with different scales by adopting three prediction branches, wherein the three prediction branches respectively correspond to a small target, a medium target and a large target;
s36, adding a full-connection layer into the three prediction branches, adding a three-dimensional prediction branch into the multi-level prediction branch, which is responsible for detecting key points of a vehicle, adding a pyramid mechanism to adapt to a multi-scale vehicle target, and simultaneously introducing the multi-level prediction branches of the license plate, the vehicle color and the vehicle axle number of the detection target to obtain different vehicle static characteristics;
s37, fusing all prediction results of the branches to obtain a final detection result.
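The size-based branching of steps S34 and S35 can be illustrated with a simple routing function; the area thresholds below are hypothetical, since the claim does not give concrete values:

```python
def route_proposal(w, h, small_max=32 * 32, large_min=96 * 96):
    # Route a proposal to the small/medium/large branch by its pixel area.
    # Thresholds are illustrative assumptions, not values from the patent.
    area = w * h
    if area < small_max:
        return "small"
    if area >= large_min:
        return "large"
    return "medium"

branches = {"small": [], "medium": [], "large": []}
for (w, h) in [(20, 20), (50, 60), (120, 100)]:
    branches[route_proposal(w, h)].append((w, h))
```

Each branch then only learns regression targets in its own scale range, which is the stated motivation for the split.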
5. The multi-target space-time tracking method based on the super-treatment robot as claimed in claim 4, wherein the loss function of the integrated detection network is:
L_det = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ_2D (1/N_reg) Σ_i p_i* · L_s1(t_i, t_i*) + λ_3D Σ_i p_i* · L_s1(d_i, d_i*) + λ_tpl Σ_i p_i* · L_s1(m_i, m_i*) + λ_part Σ_i p_i* · L_s1(c_i, c_i*);
wherein the overall loss function L_det consists of four parts: the two-dimensional detection part, in which L_cls is the standard softmax classification loss, p_i is the predicted probability for proposal i in the batch, p_i* = 1 when the proposal is positive (and 0 otherwise), L_s1 is the smooth L1 loss, λ_2D is the regularization constant balancing the category and regression losses, and t_i and t_i* are the two-dimensional target rectangular-frame regression vector predicted by the integrated network and the true two-dimensional frame regression vector; the three-dimensional part, in which λ_3D is the regularization constant of the three-dimensional detection branch, and d_i and d_i* are the three-dimensional regression vector predicted by the integrated network and the true three-dimensional regression vector; the template part, in which λ_tpl is the regularization parameter for template similarity, and m_i and m_i* are the template vector predicted by the integrated network and the true template vector; and the component part, in which λ_part is the regularization parameter for component detection, and c_i and c_i* are the target component vector predicted by the integrated network and the true target component position.
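A minimal NumPy sketch of a composite loss of this shape: a softmax classification term plus several smooth-L1 regression terms, each weighted by its own regularization constant. The branch names and the assumption that regression vectors are already restricted to positive proposals are illustrative:

```python
import numpy as np

def smooth_l1(pred, true):
    # Standard smooth L1: quadratic below 1, linear above.
    d = np.abs(pred - true)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def detection_loss(cls_probs, labels, terms, lambdas):
    # cls_probs: (n_proposals, n_classes) softmax outputs
    # labels:    ground-truth class index per proposal
    # terms:     dict name -> (pred_vec, true_vec), one per regression branch
    # lambdas:   dict name -> regularization constant for that branch
    n = len(labels)
    l_cls = -np.log(cls_probs[np.arange(n), labels]).mean()
    total = l_cls
    for name, (pred, true) in terms.items():
        total += lambdas[name] * smooth_l1(pred, true)
    return total

cls_probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
t = np.array([1.0, 2.0])
terms = {"2d": (t, t)}          # zero regression error for this example
lambdas = {"2d": 1.0}
loss = detection_loss(cls_probs, labels, terms, lambdas)
```

With zero regression error the total reduces to the classification term alone, which makes the role of the λ weights easy to check.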
6. The multi-target space-time tracking method based on the super-treatment robot according to claim 1, wherein the modeling of the traffic target feature model in step S4 constructs, based on the target static attributes acquired in step S3, a feature model belonging to the corresponding target, and the acquired target static attributes are stored in a unified binary information-coding format.
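As an illustration of storing static attributes in a unified binary coding, here is a sketch using Python's `struct` module; the field layout (plate string, color code, axle count, three box dimensions) and field widths are assumptions, not the patent's actual format:

```python
import struct

# Hypothetical fixed-width record for one target's static attributes:
# 8-byte plate string, unsigned-byte color code, unsigned-byte axle count,
# three 32-bit floats for the 3-D bounding-box dimensions (little-endian).
RECORD = struct.Struct("<8sBB3f")

def encode_target(plate, color_code, axles, dims):
    return RECORD.pack(plate.encode("ascii").ljust(8, b"\x00"),
                       color_code, axles, *dims)

def decode_target(blob):
    plate, color_code, axles, l, w, h = RECORD.unpack(blob)
    return plate.rstrip(b"\x00").decode("ascii"), color_code, axles, (l, w, h)

blob = encode_target("ABC123", 2, 2, (4.5, 1.8, 1.5))
decoded = decode_target(blob)
```

A fixed-width layout like this makes every target record the same size, so a feature library can be indexed by simple offset arithmetic.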
7. The multi-target space-time tracking method based on the super-treatment robot as claimed in claim 1, wherein the multi-target tracking network is constructed based on the deep motion modeling network DMM-Net, with the following specific structure:
inputting a video frame sequence from timestamp t to t+δ, which is convolved by a feature extractor; six layers of features are then selected from the feature extractor and fed respectively into a motion information network, a classification network and a visibility network, whose loss functions are L_motion, L_cls and L_vis respectively. The loss function of the entire network is given by:
L = (1/N) (L_motion + α · L_cls + β · L_vis);
wherein N is the number of positive anchor tubes, and α and β are hyperparameters that are manually adjusted to weight the loss terms.
8. The multi-target space-time tracking method based on the super-treatment robot according to claim 1, wherein the modeling of the dynamic information features of the vehicle in S6 constructs an information feature model based on the three-dimensional motion trajectories and timestamp information of video vehicles acquired by the multi-target tracking network, the information features including timestamp, image position, real-world position, average speed, category and color.
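A sketch of such a dynamic-information record and its average-speed computation; the field names follow the claim, while the concrete types and units (seconds, metres) are assumptions:

```python
from dataclasses import dataclass
import math

@dataclass
class VehicleDynamics:
    timestamps: list        # seconds, one per tracked frame
    image_positions: list   # (u, v) pixel coordinates
    real_positions: list    # (x, y) metres in the road plane
    category: str
    color: str

    def average_speed(self):
        # Path length along the real-world track divided by elapsed time.
        dist = sum(math.dist(a, b) for a, b in
                   zip(self.real_positions, self.real_positions[1:]))
        dt = self.timestamps[-1] - self.timestamps[0]
        return dist / dt if dt > 0 else 0.0

v = VehicleDynamics([0.0, 1.0, 2.0],
                    [(10, 10), (12, 10), (14, 10)],
                    [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],
                    "car", "white")
speed = v.average_speed()
```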
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410125476.1A CN117670938B (en) | 2024-01-30 | Multi-target space-time tracking method based on super-treatment robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117670938A true CN117670938A (en) | 2024-03-08 |
CN117670938B CN117670938B (en) | 2024-05-10 |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111968123A (en) * | 2020-08-28 | 2020-11-20 | 北京交通大学 | Semi-supervised video target segmentation method |
CN112307921A (en) * | 2020-10-22 | 2021-02-02 | 桂林电子科技大学 | Vehicle-mounted end multi-target identification tracking prediction method |
CN113139620A (en) * | 2021-05-14 | 2021-07-20 | 重庆理工大学 | End-to-end multi-target detection and tracking joint method based on target association learning |
CN114387265A (en) * | 2022-01-19 | 2022-04-22 | 中国民航大学 | Anchor-frame-free detection and tracking unified method based on attention module addition |
CN114550023A (en) * | 2021-12-31 | 2022-05-27 | 武汉中交交通工程有限责任公司 | Traffic target static information extraction device |
CN114627447A (en) * | 2022-03-10 | 2022-06-14 | 山东大学 | Road vehicle tracking method and system based on attention mechanism and multi-target tracking |
KR20220080631A (en) * | 2020-12-07 | 2022-06-14 | 부경대학교 산학협력단 | Apparatus and method for tracking multi-object in real time |
CN115861884A (en) * | 2022-12-06 | 2023-03-28 | 中南大学 | Video multi-target tracking method, system, device and medium in complex scene |
CN116189116A (en) * | 2023-04-24 | 2023-05-30 | 江西方兴科技股份有限公司 | Traffic state sensing method and system |
CN116229112A (en) * | 2022-12-06 | 2023-06-06 | 重庆邮电大学 | Twin network target tracking method based on multiple attentives |
CN116912804A (en) * | 2023-07-31 | 2023-10-20 | 江苏大学 | Efficient anchor-frame-free 3-D target detection and tracking method and model |
Non-Patent Citations (3)
Title |
---|
ZHOU, YAN; CHEN, JUNYU; WANG, DONGLI; ZHU, XIAOLIN: "Multi-object tracking using context-sensitive enhancement via feature fusion", Multimedia Tools and Applications, vol. 83, no. 7, 10 August 2023 (2023-08-10), pages 19465-19484 * |
LIU, WENQIANG et al.: "A survey of deep online multi-object tracking algorithms", Journal of Frontiers of Computer Science and Technology, vol. 16, no. 12, 31 December 2022 (2022-12-31), pages 2718-2733 * |
XU, XIAOWEI; CHEN, QIANKUN; QIAN, FENG; LI, HAODONG; TANG, ZHIPENG: "Real-time vehicle detection and tracking algorithm based on miniaturized YOLOv3", Journal of Highway and Transportation Research and Development, no. 08, 15 August 2020 (2020-08-15) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108694386B (en) | Lane line detection method based on parallel convolution neural network | |
Geiger et al. | Vision meets robotics: The kitti dataset | |
CN111429484B (en) | Multi-target vehicle track real-time construction method based on traffic monitoring video | |
Yang et al. | Video scene understanding using multi-scale analysis | |
CN111598030A (en) | Method and system for detecting and segmenting vehicle in aerial image | |
CN114170580A (en) | Highway-oriented abnormal event detection method | |
Sellat et al. | Intelligent semantic segmentation for self-driving vehicles using deep learning | |
Masihullah et al. | Attention based coupled framework for road and pothole segmentation | |
CN109543520B (en) | Lane line parameterization method for semantic segmentation result | |
Švorc et al. | An infrared video detection and categorization system based on machine learning | |
CN117670938B (en) | Multi-target space-time tracking method based on super-treatment robot | |
CN117670938A (en) | Multi-target space-time tracking method based on super-treatment robot | |
CN111160282A (en) | Traffic light detection method based on binary Yolov3 network | |
Yi et al. | End-to-end neural network for autonomous steering using lidar point cloud data | |
CN114550023A (en) | Traffic target static information extraction device | |
Khosravian et al. | Multi‐domain autonomous driving dataset: Towards enhancing the generalization of the convolutional neural networks in new environments | |
CN116310970A (en) | Automatic driving scene classification algorithm based on deep learning | |
Kheder et al. | Transfer Learning Based Traffic Light Detection and Recognition Using CNN Inception-V3 Model | |
CN113298781B (en) | Mars surface three-dimensional terrain detection method based on image and point cloud fusion | |
US20230084761A1 (en) | Automated identification of training data candidates for perception systems | |
Krajewski et al. | VeGAN: Using GANs for augmentation in latent space to improve the semantic segmentation of vehicles in images from an aerial perspective | |
Zou et al. | HFT: Lifting Perspective Representations via Hybrid Feature Transformation for BEV Perception | |
CN114511740A (en) | Vehicle image classification method, vehicle track restoration method, device and equipment | |
CN114255450A (en) | Near-field vehicle jamming behavior prediction method based on forward panoramic image | |
Hadzic et al. | Rasternet: Modeling free-flow speed using lidar and overhead imagery |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |