CN114677555B - Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment - Google Patents


Info

Publication number: CN114677555B (application number CN202210200266.5A)
Authority: CN (China)
Prior art keywords: tracking, network, result, target, detection
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN114677555A
Inventors: 郑四发, 吴浩然, 张创, 许庆, 王建强, 李克强
Original and current assignee: Tsinghua University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Tsinghua University; priority to CN202210200266.5A
Published as application CN114677555A, granted as CN114677555B

Classifications

    • G (Physics) > G06 (Computing; Calculating or Counting) > G06F (Electric Digital Data Processing) > G06F18/00 (Pattern recognition) > G06F18/20 (Analysing) > G06F18/21 (Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation) > G06F18/214 (Generating training patterns; bootstrap methods, e.g. bagging or boosting)
    • G > G06 > G06F > G06F18/00 > G06F18/20 > G06F18/25 (Fusion techniques)
    • G (Physics) > G06 > G06N (Computing Arrangements Based on Specific Computational Models) > G06N3/00 (Computing arrangements based on biological models) > G06N3/02 (Neural networks) > G06N3/04 (Architecture, e.g. interconnection topology) > G06N3/045 (Combinations of networks)


Abstract

The application relates to the technical field of vehicles, and in particular to an iterative optimization type end-to-end intelligent vehicle sensing method and device and an electronic device. The method comprises the following steps: acquiring perception information of an intelligent vehicle; inputting the perception information into an iteratively optimized end-to-end network, executing a detection task, a tracking task and a prediction task, and simultaneously obtaining a detection result, a tracking result and a prediction result, wherein the end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network; and obtaining a perception result of the end-to-end intelligent vehicle based on the detection result, the tracking result and the prediction result. This resolves the problems that the three perception tasks of tracking, detection and prediction are interdependent yet implemented by independent, weakly cooperating algorithms, and that occlusion causes target loss; through end-to-end detection and tracking, integration of the three perception modules, and implementation of the iterative optimization scheme, the tracking rate of occluded objects, the robustness of the tracking results, and the real-time performance and accuracy of the perception scheme are improved.

Description

Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment
Technical Field
The application relates to the technical field of vehicles, in particular to an iterative optimization type end-to-end intelligent vehicle sensing method, an iterative optimization type end-to-end intelligent vehicle sensing device and electronic equipment.
Background
Existing intelligent vehicle perception schemes divide perception into three sequential steps, detection, tracking and prediction, implemented mainly with deep learning algorithms. Tracking-by-detection methods rely on an accurate recognition model to detect objects, and then associate the detections of different times through a separate network to complete the tracking task. This approach effectively exploits the capability of deep-learning-based target detectors and is the currently dominant detection-tracking paradigm. However, it treats detection and tracking as two tasks requiring two networks to be constructed and trained, so the computational cost is high and real-time performance is relatively poor. Joint detection-and-tracking methods such as CenterTrack instead apply the detector to two consecutive frames together with a heatmap of the previous trajectories rendered as points. The detector outputs an offset vector from each object's current center to its center in the previous frame; this offset is cheap to compute and is sufficient for target association, completing tracking. Such methods greatly reduce the computation of the detection-tracking step, but because the offset is computed from only two consecutive frames, they track occluded objects poorly. Therefore, how to resolve occlusion-induced tracking errors while still meeting real-time requirements is an open problem for current detection-tracking algorithms.
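The offset-based association used by joint detection-and-tracking methods such as CenterTrack can be illustrated with a minimal numpy sketch; the function name, the greedy nearest-neighbour matching, and the distance threshold are illustrative assumptions rather than details taken from CenterTrack or from this patent:

```python
import numpy as np

def associate_by_offset(curr_centers, offsets, prev_centers, max_dist=20.0):
    """Greedy association in the joint detection-and-tracking style: each
    current detection carries an offset pointing at its own center in the
    previous frame; we match the back-projected center to the nearest
    previous track within max_dist pixels."""
    matches, used = {}, set()
    for i, (center, offset) in enumerate(zip(curr_centers, offsets)):
        back_projected = center - offset        # estimated previous-frame position
        best, best_dist = None, max_dist
        for j, prev in enumerate(prev_centers):
            dist = np.linalg.norm(back_projected - prev)
            if dist < best_dist and j not in used:
                best, best_dist = j, dist
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches  # current detection index -> previous track index
```

Because the offset only points one frame back, a target that was occluded in the previous frame has no center to match against, which is exactly the failure mode discussed above.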
For prediction algorithms, mainstream methods extract features from factors such as the target's historical trajectory, environment images and road topology information through a feature extraction network, model the feature interaction between targets and their environment through a graph neural network, and output an interaction-aware prediction of the target's future trajectory. Among these factors, the target's historical trajectory is the most important: once the historical trajectory is missing or shifted (especially its starting point), the input to the neural network changes and the prediction quality can degrade drastically. The prediction algorithm therefore depends very strongly on the detection-tracking results. At the same time, current detection-tracking algorithms only supply results to the prediction algorithm and cannot be optimized jointly with it, and the prediction algorithm is not robust to erroneous data.
Therefore, given that detection-tracking results cannot be guaranteed to be completely correct, how to improve the robustness of the prediction algorithm to erroneous detection-tracking data, and how to enable iterative joint optimization with the detection-tracking algorithm, are problems that currently need to be solved.
Disclosure of Invention
The application provides an iterative optimization type end-to-end intelligent vehicle sensing method and device and an electronic device, aiming to solve the problems that the three perception tasks of tracking, detection and prediction are interdependent yet implemented by independent, weakly cooperating algorithms, and that occlusion causes target loss.
An embodiment of the first aspect of the present application provides an iterative optimization type end-to-end intelligent vehicle sensing method, comprising the following steps:
acquiring perception information of an intelligent vehicle;
inputting the perception information into an iteratively optimized end-to-end network, executing a detection task, a tracking task and a prediction task, and simultaneously obtaining a detection result, a tracking result and a prediction result, wherein the end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network; and
acquiring a perception result of the end-to-end intelligent vehicle based on the detection result, the tracking result and the prediction result.
According to one embodiment of the application, before inputting the perception information into the iteratively optimized end-to-end network, the method further comprises:
performing end-to-end training on the end-to-end network, wherein during training a preset iterative-optimization training mode is adopted: forward propagation of the network first yields initial detection, tracking and prediction results; then, in the iterative optimization, the prediction network is used when solving the intersection-over-union matrix of the tracking part to obtain a new tracking result, the detection result and the tracking result are fed to the prediction network to obtain a new prediction result, which is in turn used by the tracking part to obtain a new tracking result; the iteration repeats until a network convergence condition is met, giving the iteratively optimized end-to-end network.
According to one embodiment of the present application, the detection formulas of the multi-target detection network are:
F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1})
C_i = FC(F_i)
A_i = Attention(C_i)
O_i = Decode(concat(C_i, A_i))
where F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i-1} is the image information at the previous time, O_{i-1} is the detection-network output information at the previous time, C_i is the low-dimensional coding vector, A_i contains the important features related to the detected targets, and O_i is the decoding vector at the current time.
According to one embodiment of the application, the tracking formulas of the multi-target tracking network are:
if max_j M^i_{kj} > σ, target tracking succeeds, and the k-th target tracking result is the matched detection D^{j*}_i, with j* = argmax_j M^i_{kj};
if a previously occluded target k again satisfies max_j M^i_{kj} > σ, the k-th target tracking result changes from the prediction P^k_i back to the matched detection D^{j*}_i;
if max_j M^i_{kj} ≤ σ and the number of lost frames does not exceed β, target tracking fails for this frame, but target k is kept, with P^k_i as its result;
if max_k M^i_{kj} ≤ σ, the observation D^j_i matches no existing target and is taken as a new target;
where D^{k_i}_i is the decoding vector of the k_i-th object of the i-th frame, P^{k_{i-1}}_i is the prediction output at frame i for the k_{i-1}-th target of frame i-1, M^i_{kj} = IOU(P^k_i, D^j_i) is the intersection-over-union of the k-th prediction and the j-th detection of frame i, i denotes the i-th frame, and σ and β are preset thresholds.
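The tracking rules above hinge on the intersection-over-union between a predicted box and a detected box. A minimal sketch for axis-aligned boxes follows; the (x1, y1, x2, y2) corner format is an assumption, since the patent does not fix a box representation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corners; a sketch of the entries of the matrix M."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With a threshold such as σ = 0.6 (the example value given in the description), two boxes must overlap substantially before a prediction-detection pair counts as a match.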
According to one embodiment of the present application, the prediction formula of the multi-target trajectory prediction network is:
Y^k_i = Dec(H^k_i), or Y^k_i = Dec(concat(H^k_i, G^k_i))
where H^k_i is the prediction vector based on the historical observation information, k denotes the target, concat(H^k_i, G^k_i) is the prediction vector combining the history information with the interaction result, G_i is the interaction result of all targets of the current frame, and Y^k_i (over all k) are the prediction vectors of all targets of the current frame.
According to the iterative optimization type end-to-end intelligent vehicle sensing method of the embodiment of the application, perception information of the intelligent vehicle is acquired and input into the iteratively optimized end-to-end network, the detection, tracking and prediction tasks are executed, and the perception result of the end-to-end intelligent vehicle is obtained based on the resulting detection, tracking and prediction results. The end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. This resolves the problems that the three perception tasks of tracking, detection and prediction are interdependent yet implemented by independent, weakly cooperating algorithms, and that occlusion causes target loss; through end-to-end detection and tracking, integration of the three perception modules, and implementation of the iterative optimization scheme, the tracking rate of occluded objects, the robustness of the tracking results, and the real-time performance and accuracy of the perception scheme are improved.
An embodiment of the second aspect of the present application provides an iterative optimization type end-to-end intelligent vehicle sensing device, comprising:
an acquisition module for acquiring perception information of an intelligent vehicle;
an execution module for inputting the perception information into the iteratively optimized end-to-end network, executing a detection task, a tracking task and a prediction task, and simultaneously obtaining a detection result, a tracking result and a prediction result, wherein the end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network; and
a perception module for acquiring a perception result of the end-to-end intelligent vehicle based on the detection result, the tracking result and the prediction result.
According to one embodiment of the present application, the sensing module is specifically configured to:
performing end-to-end training on the end-to-end network, wherein during training a preset iterative-optimization training mode is adopted: forward propagation of the network first yields initial detection, tracking and prediction results; then, in the iterative optimization, the prediction network is used when solving the intersection-over-union matrix of the tracking part to obtain a new tracking result, the detection result and the tracking result are fed to the prediction network to obtain a new prediction result, which is in turn used by the tracking part to obtain a new tracking result; the iteration repeats until a network convergence condition is met, giving the iteratively optimized end-to-end network.
According to one embodiment of the present application, the detection formulas of the multi-target detection network are:
F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1})
C_i = FC(F_i)
A_i = Attention(C_i)
O_i = Decode(concat(C_i, A_i))
where F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i-1} is the image information at the previous time, O_{i-1} is the detection-network output information at the previous time, C_i is the low-dimensional coding vector, A_i contains the important features related to the detected targets, and O_i is the decoding vector at the current time.
According to one embodiment of the application, the tracking formulas of the multi-target tracking network are:
if max_j M^i_{kj} > σ, target tracking succeeds, and the k-th target tracking result is the matched detection D^{j*}_i, with j* = argmax_j M^i_{kj};
if a previously occluded target k again satisfies max_j M^i_{kj} > σ, the k-th target tracking result changes from the prediction P^k_i back to the matched detection D^{j*}_i;
if max_j M^i_{kj} ≤ σ and the number of lost frames does not exceed β, target tracking fails for this frame, but target k is kept, with P^k_i as its result;
if max_k M^i_{kj} ≤ σ, the observation D^j_i matches no existing target and is taken as a new target;
where D^{k_i}_i is the decoding vector of the k_i-th object of the i-th frame, P^{k_{i-1}}_i is the prediction output at frame i for the k_{i-1}-th target of frame i-1, M^i_{kj} = IOU(P^k_i, D^j_i) is the intersection-over-union of the k-th prediction and the j-th detection of frame i, i denotes the i-th frame, and σ and β are preset thresholds.
According to one embodiment of the present application, the prediction formula of the multi-target trajectory prediction network is:
Y^k_i = Dec(H^k_i), or Y^k_i = Dec(concat(H^k_i, G^k_i))
where H^k_i is the prediction vector based on the historical observation information, k denotes the target, concat(H^k_i, G^k_i) is the prediction vector combining the history information with the interaction result, G_i is the interaction result of all targets of the current frame, and Y^k_i (over all k) are the prediction vectors of all targets of the current frame.
According to the iterative optimization type end-to-end intelligent vehicle sensing device of the embodiment of the application, perception information of the intelligent vehicle is acquired and input into the iteratively optimized end-to-end network, the detection, tracking and prediction tasks are executed, and the perception result of the end-to-end intelligent vehicle is obtained based on the resulting detection, tracking and prediction results. The end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. This resolves the problems that the three perception tasks of tracking, detection and prediction are interdependent yet implemented by independent, weakly cooperating algorithms, and that occlusion causes target loss; through end-to-end detection and tracking, integration of the three perception modules, and implementation of the iterative optimization scheme, the tracking rate of occluded objects, the robustness of the tracking results, and the real-time performance and accuracy of the perception scheme are improved.
An embodiment of a third aspect of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the program to realize the iterative optimized end-to-end intelligent vehicle sensing method according to the embodiment.
An embodiment of a fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program for execution by a processor for implementing an iterative optimized end-to-end intelligent vehicle sensing method as described in the above embodiments.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of an iterative optimization type end-to-end intelligent vehicle sensing method according to one embodiment of the present application;
FIG. 2 is a schematic diagram of an iterative optimized end-to-end intelligent vehicle sensing scheme provided in accordance with one embodiment of the present application;
FIG. 3 is a schematic diagram of an end-to-end network detection portion according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an end-to-end network tracking portion provided in accordance with one embodiment of the present application;
FIG. 5 is a schematic diagram of a portion of an end-to-end network prediction according to one embodiment of the present application;
FIG. 6 is a block diagram illustration of an iterative optimized end-to-end intelligent vehicle awareness apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The iterative optimization type end-to-end intelligent vehicle sensing method, device and electronic equipment of the embodiments of the present application are described below with reference to the accompanying drawings. Aiming at the problems mentioned in the Background section, namely the interdependence of the three perception tasks, the low robustness and poor real-time performance with respect to the data, and the poor tracking of occluded objects, the application provides an iterative optimization type end-to-end intelligent vehicle sensing method. In this method,
perception information of the intelligent vehicle is acquired and input into the iteratively optimized end-to-end network, the detection, tracking and prediction tasks are executed, and the perception result of the end-to-end intelligent vehicle is obtained based on the resulting detection, tracking and prediction results. The end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. This resolves the problems that the three perception tasks of tracking, detection and prediction are interdependent yet implemented by independent, weakly cooperating algorithms, and that occlusion causes target loss; through end-to-end detection and tracking, integration of the three perception modules, and implementation of the iterative optimization scheme, the tracking rate of occluded objects, the robustness of the tracking results, and the real-time performance and accuracy of the perception scheme are improved.
Specifically, fig. 1 is a schematic flow chart of an iterative optimization type end-to-end intelligent vehicle sensing method according to an embodiment of the present application.
As shown in fig. 1, the iterative optimization type end-to-end intelligent vehicle sensing method comprises the following steps:
In step S101, sensing information of the intelligent vehicle is acquired.
Specifically, as shown in fig. 2, the iterative optimization type end-to-end intelligent vehicle sensing method provided by the embodiment of the application mainly comprises two parts: an end-to-end detection-tracking-prediction network and an iterative optimization training method. According to the output results the end-to-end network needs to provide, it can be divided into three parts: a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. Detailed descriptions are given through the following specific examples.
In step S102, the perception information is input to the end-to-end network after iterative optimization, and a detection task, a tracking task and a prediction task are executed, and a detection result, a tracking result and a prediction result are obtained at the same time, where the end-to-end network includes a multi-target detection network, a multi-target tracking network and a multi-target track prediction network.
Further, in some embodiments, before the perception information is input to the iteratively optimized end-to-end network, the method further comprises: performing end-to-end training on the end-to-end network, wherein during training a preset iterative-optimization training mode is adopted: forward propagation of the network first yields initial detection, tracking and prediction results; then, in the iterative optimization, the prediction network is used when solving the intersection-over-union matrix of the tracking part to obtain a new tracking result, the detection result and the tracking result are fed to the prediction network to obtain a new prediction result, which is in turn used by the tracking part to obtain a new tracking result; the iteration repeats until a network convergence condition is met, giving the iteratively optimized end-to-end network.
Further, in some embodiments, the detection formulas of the multi-target detection network are:
F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1})
C_i = FC(F_i)
A_i = Attention(C_i)
O_i = Decode(concat(C_i, A_i))
where F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i-1} is the image information at the previous time, O_{i-1} is the detection-network output information at the previous time, C_i is the low-dimensional coding vector, A_i contains the important features related to the detected targets, and O_i is the decoding vector at the current time.
Further, in some embodiments, the tracking formulas of the multi-target tracking network are:
if max_j M^i_{kj} > σ, target tracking succeeds, and the k-th target tracking result is the matched detection D^{j*}_i, with j* = argmax_j M^i_{kj};
if a previously occluded target k again satisfies max_j M^i_{kj} > σ, the k-th target tracking result changes from the prediction P^k_i back to the matched detection D^{j*}_i;
if max_j M^i_{kj} ≤ σ and the number of lost frames does not exceed β, target tracking fails for this frame, but target k is kept, with P^k_i as its result;
if max_k M^i_{kj} ≤ σ, the observation D^j_i matches no existing target and is taken as a new target;
where D^{k_i}_i is the decoding vector of the k_i-th object of the i-th frame, P^{k_{i-1}}_i is the prediction output at frame i for the k_{i-1}-th target of frame i-1, M^i_{kj} = IOU(P^k_i, D^j_i) is the intersection-over-union of the k-th prediction and the j-th detection of frame i, i denotes the i-th frame, and σ and β are preset thresholds.
Further, in some embodiments, the prediction formula of the multi-target trajectory prediction network is:
Y^k_i = Dec(H^k_i), or Y^k_i = Dec(concat(H^k_i, G^k_i))
where H^k_i is the prediction vector based on the historical observation information, k denotes the target, concat(H^k_i, G^k_i) is the prediction vector combining the history information with the interaction result, G_i is the interaction result of all targets of the current frame, and Y^k_i (over all k) are the prediction vectors of all targets of the current frame.
In step S103, a sensing result of the end-to-end intelligent vehicle is obtained based on the detection result, the tracking result, and the prediction result.
Specifically, first, the multi-target detection network part of the end-to-end network of the embodiment of the present application is described; fig. 3 is a schematic diagram of the end-to-end network detection portion. In the multi-target detection network, when multi-target detection is performed on the i-th frame image, the network input consists of the image information IMG_i at the current time acquired through multiple cameras, the image information IMG_{i-1} at the previous time, and the detection-network output information O_{i-1} at the previous time, so that the subsequent tracking requirement is considered at the same time. Next, a CNN (Convolutional Neural Network) extracts features from the image information to obtain the feature vector F_i, and an FC (Fully Connected) layer reduces the dimension of F_i, converting it into a lower-dimensional coding vector C_i. Finally, an attention mechanism extracts from C_i the important features A_i related to the detected targets, which are fused with the coding vector; a fully connected layer or another decoding network (such as a Transformer) then decodes the fused vector to obtain the decoding vector O_i at the current time. The dimensions of O_i are defined according to the multi-target detection results required by the application, and may include the target center position, target position offset, target bounding-box coordinates, target bounding-box size, an environment detection heatmap, and so on. The specific formulas of the detection section are as follows:
F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1}); (1)
C_i = FC(F_i); (2)
A_i = Attention(C_i); (3)
O_i = Decode(concat(C_i, A_i)); (4)
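The encode, attend, fuse and decode pipeline of the detection part can be sketched with toy numpy operations; the dimensions, the single-head self-attention form, and the random weights are illustrative stand-ins for the trained CNN, FC and attention layers (the CNN backbone itself is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """A plain fully connected layer."""
    return x @ w + b

def self_attention(c):
    """Single-head scaled dot-product attention over the coding vectors,
    standing in for the 'important feature' extraction step (a sketch)."""
    d = c.shape[-1]
    scores = c @ c.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ c

# Toy setup: 16 spatial tokens of dim 64, as if produced by the CNN backbone.
f_i = rng.standard_normal((16, 64))                 # F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1})
w1, b1 = rng.standard_normal((64, 32)), np.zeros(32)
c_i = fc(f_i, w1, b1)                               # C_i = FC(F_i): lower-dim coding vector
a_i = self_attention(c_i)                           # attended features related to targets
fused = np.concatenate([c_i, a_i], axis=-1)         # fuse attention output with C_i
w2, b2 = rng.standard_normal((64, 6)), np.zeros(6)
o_i = fc(fused, w2, b2)                             # O_i: e.g. center, offset, box size
```

The output dimension (6 here) would in practice be set by the required detection outputs, e.g. center position, position offset and bounding-box size.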
Next, the multi-target tracking network part of the end-to-end network of the embodiment of the present application is described; fig. 4 is a schematic diagram of the end-to-end network tracking portion. In the multi-target tracking network, the input of the tracking network at the i-th frame is the output of the current-frame detection network and the image positions of the targets predicted by the previous-frame prediction network. Defining the number of targets of the i-th frame as K_i, the tracking network performs an IOU (Intersection over Union) test between all predicted targets of the previous frame and the detection results of the current frame, obtaining the intersection-over-union matrix M. Then, for a target of frame i-1, if the IOU between any detection result of frame i and that target's predicted result exceeds a threshold σ, the target is considered successfully tracked, and the displacement vector of the target in the current frame is output as the tracking result (when several detection results exceed the threshold, the one with the highest IOU is taken). If the predicted result of some target of frame i-1 matches no detection result of the current frame, the target is kept for up to β frames to account for the effect of occlusion, and prediction continues from the existing observations until the prediction matches a detection of a future frame, at which point the matching procedure above resumes. Finally, if the number of lost frames of a target exceeds the threshold β, the target is removed. Observations that match no target are defined as new targets. In practical experiments, σ may be set to 0.6 and β to 10. The specific formulas of the tracking part are as follows:
M^i_{kj} = IOU(P^k_i, D^j_i); (5)
if max_j M^i_{kj} > σ: the tracking result of target k is D^{j*}_i, j* = argmax_j M^i_{kj}; (6)
if occluded target k again satisfies max_j M^i_{kj} > σ: its result changes from P^k_i to D^{j*}_i; (7)
if max_j M^i_{kj} ≤ σ and the lost-frame count of k does not exceed β: target k is kept with result P^k_i; (8)
if max_k M^i_{kj} ≤ σ: detection D^j_i is registered as a new target. (9)
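The per-frame tracking update described above, IOU matching against predictions, keeping occluded targets alive for up to β frames, and registering unmatched detections as new targets, can be sketched as follows; the (id, box, lost_count) track tuple, the dict of predictions keyed by track id, and the greedy matching order are illustrative assumptions:

```python
def update_tracks(tracks, detections, predictions, iou_fn, sigma=0.6, beta=10):
    """One tracking step: match each track's predicted box against current
    detections by IOU (threshold sigma); unmatched tracks coast on their
    prediction for up to beta frames (occlusion handling); unmatched
    detections start new tracks. tracks: list of (id, box, lost_count);
    predictions: dict id -> predicted box; iou_fn: box-pair IOU function."""
    new_tracks, used = [], set()
    next_id = max([t[0] for t in tracks], default=-1) + 1
    for tid, _, lost in tracks:
        pred = predictions[tid]
        best_j, best_iou = None, sigma
        for j, det in enumerate(detections):
            if j in used:
                continue
            v = iou_fn(pred, det)
            if v > best_iou:                          # strictly above threshold
                best_j, best_iou = j, v
        if best_j is not None:
            used.add(best_j)
            new_tracks.append((tid, detections[best_j], 0))  # matched: reset lost count
        elif lost + 1 <= beta:
            new_tracks.append((tid, pred, lost + 1))  # occluded: coast on prediction
        # else: lost for more than beta frames, target removed
    for j, det in enumerate(detections):
        if j not in used:
            new_tracks.append((next_id, det, 0))      # unmatched detection: new target
            next_id += 1
    return new_tracks
```

Setting sigma = 0.6 and beta = 10 matches the example thresholds given above.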
Again, the multi-target trajectory prediction network part of the end-to-end network of the embodiment of the present application is described; fig. 5 is a schematic diagram of the end-to-end network prediction portion. In the end-to-end network prediction, the trajectory prediction problem observes the image positions of a target over the past m consecutive frames and outputs a prediction of its image positions over the future n frames. Therefore, when predicting the trajectory of target k at the i-th frame, a three-dimensional convolutional neural network, or a convolutional neural network combined with an RNN (Recurrent Neural Network), extracts features from the image position information of the previous m consecutive frames, giving the prediction vector H^k_i based on historical observation information. From the prediction vectors of all K targets of the current frame, an interaction-prediction network such as a GNN (Graph Neural Network) solves the interaction results G_i of all targets of the current frame. The interaction result is used to update target k's history-based prediction vector, giving a prediction vector Q^k_i that accounts for both interaction and history information. Finally, the trajectory prediction result is decoded, for example with a recurrent neural network, and the prediction Y^k_i is output. The specific formulas of the prediction part are as follows:
H^k_i = Enc(X^k_{i-m+1}, ..., X^k_i); (10)
G_i = GNN(H^1_i, ..., H^K_i); (11)
Q^k_i = concat(H^k_i, G^k_i); (12)
Y^k_i = Dec(H^k_i); (13)
or Y^k_i = Dec(Q^k_i); (14)
where concat denotes the splicing (concatenation) of the two vectors.
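The history-encoding, interaction and decoding steps of the prediction part can be sketched in numpy; the random projection weights, the uniform averaging that stands in for a graph-neural-network interaction layer, and all dimensions are illustrative assumptions:

```python
import numpy as np

def predict_trajectories(histories, m=8, n=12, d=16, seed=0):
    """Sketch of the prediction part: encode each target's last m observed
    (x, y) positions into a feature H^k, mix features across targets as a
    toy stand-in for the GNN interaction step, concatenate history and
    interaction features, then decode n future positions per target."""
    rng = np.random.default_rng(seed)
    K = len(histories)
    w_enc = rng.standard_normal((m * 2, d))
    h = np.stack([hist.reshape(-1) @ w_enc for hist in histories])   # (K, d): H^k
    # "interaction": every target attends uniformly to all targets (toy layer)
    g = h.mean(axis=0, keepdims=True).repeat(K, axis=0)              # (K, d): G^k
    fused = np.concatenate([h, g], axis=-1)                          # concat(H^k, G^k)
    w_dec = rng.standard_normal((2 * d, n * 2))
    y = fused @ w_dec                                                # decode future path
    return y.reshape(K, n, 2)                                        # n future (x, y)
```

A trained model would replace the random projections with the 3D-CNN/RNN encoder, a real GNN layer, and a recurrent decoder; only the data flow is kept here.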
Finally, the iterative optimization training method of the embodiment of the application is introduced. As indicated by the arrow in fig. 2, an iterative optimization training mode is adopted during the training process. The output interfaces of the definition detection, tracking and prediction modules are respectively (a, b and c). In the single training process, firstly, forward transmission of the network, namely a- & gt b- & gt c is carried out to obtain initial results a 0,b0 and c 0 of detection, tracking and prediction; and secondly, performing iterative optimization. The prediction network is firstly used for solving the cross-correlation matrix M of the tracking part to obtain a new tracking result b 1, then the detection result a 0 and the tracking result b 1 are used for the prediction network to obtain a new prediction result c 1, and the new prediction result can be used for the tracking module to obtain a new tracking result b 2. The iterative process is repeated until the network convergence condition is satisfied, namely:
|c_n − c_{n−1}| < ε_1; (15)
|b_n − b_{n−1}| < ε_2; (16)
At this point, the iterative optimization process ends, and a_0, b_n and c_n from the final iteration are output as the results of the detection, tracking and prediction networks at the current moment.
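The forward pass and the iterative refinement loop described above can be sketched as follows. The detect/track/predict functions here are trivial placeholders chosen only so that the iteration demonstrably converges; the convergence test mirrors conditions (15) and (16), with the thresholds ε_1 and ε_2 chosen arbitrarily.

```python
import numpy as np

def detect(frame):                 # placeholder for the detection network
    return np.array([1.0, 2.0])

def track(a, c_prev):              # placeholder: tracking uses detections + predictions
    return 0.5 * a + 0.5 * c_prev

def predict(a, b):                 # placeholder: prediction uses detections + tracks
    return 0.9 * b + 0.1 * a

eps1, eps2 = 1e-6, 1e-6            # thresholds for |c_n - c_{n-1}| and |b_n - b_{n-1}|
frame = None

# Forward propagation a -> b -> c gives the initial results a_0, b_0, c_0.
a0 = detect(frame)
b = track(a0, np.zeros_like(a0))
c = predict(a0, b)

# Iterative optimization: prediction refines tracking, tracking refines prediction.
for _ in range(100):
    b_new = track(a0, c)           # prediction result fed back into the tracker
    c_new = predict(a0, b_new)     # updated tracks fed into the predictor
    done = (np.abs(c_new - c).max() < eps1) and (np.abs(b_new - b).max() < eps2)
    b, c = b_new, c_new
    if done:
        break
print(b, c)
```

With these placeholder maps the loop is a contraction, so b and c settle at a fixed point; in the actual networks convergence to a local optimum is what ends the iteration.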
It should be noted that, in the internal iterative process of the method described above, because of the difficulty of internal iterative training, the iterative optimization training method is also suitable for external iteration, i.e. after the end-to-end detection-tracking-prediction network training is completed under the non-iterative condition, the same test data is input again for iterative training. The iterative training process is exactly the same as the above description and will not be repeated here. It should be noted that since the network has undergone non-iterative preliminary training, the iterative training at this time has less influence on network parameters, and the network may converge on a local optimum point, but since the training is easier, the expected training period is shorter, and therefore, in practical applications, a trade-off may be performed according to the required accuracy.
Thus, the embodiments of the present application specifically address the problems existing in the related art as follows:
(1) Aiming at the problem that existing detection-tracking-prediction algorithms are mutually independent and poorly coordinated, the embodiments of the present application provide an end-to-end detection-tracking framework in which the three perception tasks are completed within the same network and the corresponding results are output simultaneously. This perception scheme can greatly improve the real-time performance and accuracy of the perception tasks, and enables end-to-end optimization, so that the three interdependent perception tasks are closely coupled.
(2) Aiming at the problem that existing detection and tracking schemes have difficulty tracking occluded objects, the embodiments of the present application innovatively use the prediction network in the tracking part: when an object is occluded and cannot be detected, similarity matching is performed between the prediction result and the tracking result. If their similarity is high, the detection position of the occluded object is taken to be the weighted result of the predicted position and the tracked position, which improves the reliability of the tracking algorithm.
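A minimal sketch of this occlusion-handling idea, assuming intersection-over-union as the similarity measure and an equal-weight fusion of the predicted and tracked boxes (both the similarity threshold and the weight are illustrative assumptions, not values from the disclosure):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def occluded_position(pred_box, track_box, sim_thresh=0.5, w=0.5):
    """If prediction and track agree, fill the missed detection with a weighted box."""
    if iou(pred_box, track_box) < sim_thresh:
        return None                               # too dissimilar: target treated as lost
    return tuple(w * p + (1 - w) * t for p, t in zip(pred_box, track_box))

print(occluded_position((0, 0, 10, 10), (1, 1, 11, 11)))  # prints (0.5, 0.5, 10.5, 10.5)
```

When the occluded object reappears, the fused box can serve as its detection position so the track identity is not lost.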
(3) Aiming at the problem that detection, tracking and prediction are interdependent, the embodiments of the present application provide an iterative optimization type perception scheme: the prediction network result is used in the tracking part to improve the tracking rate for occluded objects, and the tracking result is input into the prediction network to improve the robustness of the prediction network to the tracking result. These steps are cycled continuously until the detection, tracking and prediction networks finally converge to an acceptable local optimum, i.e. the iterative update is completed.
According to the iterative optimization type end-to-end intelligent vehicle sensing method of the embodiments of the present application, perception information of an intelligent vehicle is acquired and input into an end-to-end network after iterative optimization; a detection task, a tracking task and a prediction task are executed; and a perception result of the end-to-end intelligent vehicle is obtained based on the resulting detection, tracking and prediction results. The end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. This resolves problems such as the interdependence of the three perception tasks, mutually independent algorithms with low coordination, and target loss caused by occlusion in tracking, detection and prediction; through end-to-end detection and tracking, the integration of the three perception modules, and the implementation of the iterative optimization scheme, the tracking rate for occluded objects, the robustness of the tracking result, and the real-time performance and precision of the perception scheme are improved.
Next, an iterative optimized end-to-end intelligent vehicle sensing device according to an embodiment of the present application is described with reference to the accompanying drawings.
FIG. 6 is a block schematic diagram of an iterative optimized end-to-end intelligent vehicle awareness apparatus according to an embodiment of the present application.
As shown in fig. 6, the iterative optimization type end-to-end intelligent vehicle sensing device 10 includes: the device comprises an acquisition module 100, an execution module 200 and a perception module 300.
The acquisition module 100 is configured to acquire perception information of an intelligent vehicle;
the execution module 200 is configured to input perception information to an end-to-end network after iterative optimization, execute a detection task, a tracking task and a prediction task, and obtain a detection result, a tracking result and a prediction result at the same time, where the end-to-end network includes a multi-target detection network, a multi-target tracking network and a multi-target track prediction network; and
The sensing module 300 is configured to obtain a sensing result of the end-to-end intelligent vehicle based on the detection result, the tracking result, and the prediction result.
Further, in some embodiments, the sensing module 300 is specifically configured to:
Perform end-to-end training on the end-to-end network, wherein during training a preset iterative optimization type training mode is adopted: forward propagation of the network is performed to obtain the initial detection, tracking and prediction results; iterative optimization is then performed, in which the prediction network is used to solve the intersection-over-union (IoU) matrix of the tracking part to obtain a new tracking result, the detection result and the tracking result are fed to the prediction network to obtain a new prediction result, and tracking in turn yields a further new tracking result; the iteration is repeated until the network convergence conditions are met, giving the end-to-end network after iterative optimization.
Further, in some embodiments, the detection formula of the multi-target detection network is:
F_i = CNN(IMG_i, IMG_{i−1}, O_{i−1})
C_i = FC(F_i)
where F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i−1} is the image information at the previous time, O_{i−1} is the detection network output information at the previous time, C_i is the low-dimensional encoding vector of important features related to the detected target, and O_i is the decoding vector at the current time.
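A toy sketch of the encode/decode structure above, with a flatten-plus-linear stand-in for the CNN and assumed dimensions (none of the weights, sizes, or function names come from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(1)
hw, d_feat, d_code, d_out = 8 * 8, 32, 8, 16   # assumed image and vector dimensions

W_cnn = rng.normal(size=(2 * hw + d_out, d_feat))  # stand-in for the CNN
W_fc = rng.normal(size=(d_feat, d_code))           # encoder FC: F_i -> C_i
W_dec = rng.normal(size=(d_code, d_out))           # decoder: C_i -> O_i

def detect_step(img_i, img_prev, o_prev):
    x = np.concatenate([img_i.ravel(), img_prev.ravel(), o_prev])
    F = np.tanh(x @ W_cnn)     # F_i = CNN(IMG_i, IMG_{i-1}, O_{i-1})
    C = np.tanh(F @ W_fc)      # C_i = FC(F_i), the low-dimensional encoding
    O = C @ W_dec              # decoding vector O_i for the current frame
    return C, O

img0 = rng.normal(size=(8, 8))
img1 = rng.normal(size=(8, 8))
C1, O1 = detect_step(img1, img0, np.zeros(d_out))
print(C1.shape, O1.shape)  # prints (8,) (16,)
```

The point illustrated is the recurrence: the previous frame's output O_{i−1} is fed back as an input alongside the two image frames.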
Further, in some embodiments, the tracking formula for the multi-target tracking network is:
If the first threshold condition is satisfied, target tracking succeeds, and the k-th target tracking result is the matched result;
If the second threshold condition is satisfied, the k-th target tracking result changes from the original tracking result to the updated result;
If neither condition is satisfied, target tracking fails, but target k is retained;
If a detection matches no existing target, the observation result is a new target;
Wherein the quantities denote, respectively: the decoding vector of the k_i-th target of the i-th frame; the predicted output at frame i for the k_{i−1}-th target of frame i−1; and the result of computing the intersection-over-union (IoU) of the k-th prediction and the j-th detection. Here i denotes the i-th frame in detection, and σ and β are preset thresholds.
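Since the exact threshold inequalities are not reproduced in this text, the following sketch shows one plausible reading of the σ/β matching rules applied to an IoU matrix; the threshold values and the tie-breaking by row-wise argmax are assumptions for illustration:

```python
import numpy as np

sigma, beta = 0.6, 0.3   # assumed preset thresholds

def match_tracks(iou_matrix):
    """Classify each track (row) by its best-matching detection (column)."""
    status, used = [], set()
    for k in range(iou_matrix.shape[0]):
        j = int(np.argmax(iou_matrix[k]))
        best = iou_matrix[k, j]
        if best >= sigma:
            status.append(("matched", j)); used.add(j)     # tracking succeeds
        elif best >= beta:
            status.append(("updated", j)); used.add(j)     # result switches to detection
        else:
            status.append(("kept", None))                  # tracking fails, target kept
    # Detections claimed by no track become new targets.
    new = [j for j in range(iou_matrix.shape[1]) if j not in used]
    return status, new

M = np.array([[0.8, 0.1, 0.0],
              [0.1, 0.4, 0.0],
              [0.0, 0.0, 0.1]])
status, new = match_tracks(M)
print(status, new)
```

Here the first track is confidently matched, the second is updated from a weaker match, the third is kept alive despite no match, and the unclaimed third detection spawns a new target.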
Further, in some embodiments, the prediction formula for the multi-target trajectory prediction network is:
or
Wherein the quantities denote, respectively: the prediction vector based on the historical observation information of target k; the prediction vector of the historical information; the interaction results of all targets of the current frame; and the prediction vectors of all targets of the current frame.
According to the iterative optimization type end-to-end intelligent vehicle sensing device of the embodiments of the present application, perception information of an intelligent vehicle is acquired and input into an end-to-end network after iterative optimization; a detection task, a tracking task and a prediction task are executed; and a perception result of the end-to-end intelligent vehicle is obtained based on the resulting detection, tracking and prediction results. The end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target trajectory prediction network. This resolves problems such as the interdependence of the three perception tasks, mutually independent algorithms with low coordination, and target loss caused by occlusion in tracking, detection and prediction; through end-to-end detection and tracking, the integration of the three perception modules, and the implementation of the iterative optimization scheme, the tracking rate for occluded objects, the robustness of the tracking result, and the real-time performance and precision of the perception scheme are improved.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
Memory 701, processor 702, and computer programs stored on memory 701 and executable on processor 702.
The processor 702, when executing the program, implements the iterative optimized end-to-end intelligent vehicle awareness method provided in the above embodiments.
Further, the electronic device further includes:
A communication interface 703 for communication between the memory 701 and the processor 702.
Memory 701 for storing a computer program executable on processor 702.
The memory 701 may include high-speed RAM, and may further include non-volatile memory, such as at least one magnetic disk memory.
If the memory 701, the processor 702 and the communication interface 703 are implemented independently, they may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the bus is represented by only one thick line in Fig. 7, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may communicate with each other through internal interfaces.
The processor 702 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an iterative optimized end-to-end intelligent vehicle awareness method as described above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or N executable instructions for implementing specific logical functions or steps of the process, and alternative implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program may be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one, or a combination, of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and are not to be construed as limiting the application; variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (6)

1. An iterative optimization type end-to-end intelligent vehicle sensing method is characterized by comprising the following steps of:
acquiring perception information of an intelligent vehicle;
Inputting the perception information into an end-to-end network after iterative optimization, executing a detection task, a tracking task and a prediction task, and simultaneously obtaining a detection result, a tracking result and a prediction result, wherein the end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target track prediction network; and
Acquiring a perception result of the end-to-end intelligent vehicle based on the detection result, the tracking result and the prediction result;
the detection formula of the multi-target detection network is as follows:
F_i = CNN(IMG_i, IMG_{i−1}, O_{i−1})
C_i = FC(F_i)
wherein F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i−1} is the image information at the previous time, O_{i−1} is the detection network output information at the previous time, C_i is the low-dimensional encoding vector of important features related to the detected target, and O_i is the decoding vector at the current time;
The tracking formula of the multi-target tracking network is as follows:
If the first threshold condition is satisfied, target tracking succeeds, and the kth target tracking result is the matched result;
If the second threshold condition is satisfied, the kth target tracking result changes from the original tracking result to the updated result;
If neither condition is satisfied, target tracking fails, but target k is retained;
If a detection matches no existing target, the observation result is a new target;
wherein the quantities denote, respectively: the decoding vector of the k_i-th target of the i-th frame; and the predicted output at frame i for the k_{i−1}-th target of frame i−1; i denotes the i-th frame in detection, and σ and β are preset thresholds;
the prediction formula of the multi-target track prediction network is as follows:
or
wherein the quantities denote, respectively: the prediction vector based on the historical observation information of target k; the prediction vector of the historical information; the interaction results of all targets of the current frame; and the prediction vectors of all targets of the current frame.
2. The method of claim 1, further comprising, prior to inputting the awareness information into the iteratively optimized end-to-end network:
and performing end-to-end training on the end-to-end network, wherein in the training process, a preset iterative optimization type training mode is adopted to perform forward propagation of the network to respectively obtain initial results of detection, tracking and prediction, iterative optimization is performed, the prediction network is used for solving an intersection ratio matrix of a tracking part to obtain a new tracking result, the detection result and the tracking result are used for the prediction network to obtain the new prediction result, the new tracking result is obtained by tracking, and iteration is repeated until a network convergence condition is met to obtain the end-to-end network after iterative optimization.
3. An iterative optimization type end-to-end intelligent vehicle sensing device, which is characterized by comprising:
The acquisition module is used for acquiring the perception information of the intelligent vehicle;
The execution module is used for inputting the perception information into the end-to-end network after iterative optimization, executing a detection task, a tracking task and a prediction task, and simultaneously obtaining a detection result, a tracking result and a prediction result, wherein the end-to-end network comprises a multi-target detection network, a multi-target tracking network and a multi-target track prediction network; and
The sensing module is used for acquiring a perception result of the end-to-end intelligent vehicle based on the detection result, the tracking result and the prediction result;
The detection formula of the multi-target detection network is as follows:
F_i = CNN(IMG_i, IMG_{i−1}, O_{i−1})
C_i = FC(F_i)
wherein F_i is the feature vector, IMG_i is the image information at the current time, IMG_{i−1} is the image information at the previous time, O_{i−1} is the detection network output information at the previous time, C_i is the low-dimensional encoding vector of important features related to the detected target, and O_i is the decoding vector at the current time;
The tracking formula of the multi-target tracking network is as follows:
If the first threshold condition is satisfied, target tracking succeeds, and the kth target tracking result is the matched result;
If the second threshold condition is satisfied, the kth target tracking result changes from the original tracking result to the updated result;
If neither condition is satisfied, target tracking fails, but target k is retained;
If a detection matches no existing target, the observation result is a new target;
wherein the quantities denote, respectively: the decoding vector of the k_i-th target of the i-th frame; the predicted output at frame i for the k_{i−1}-th target of frame i−1; and the result of computing the intersection-over-union of the k-th prediction and the j-th detection; i denotes the i-th frame in detection, and σ and β are preset thresholds;
the prediction formula of the multi-target track prediction network is as follows:
or
wherein the quantities denote, respectively: the prediction vector based on the historical observation information of target k; the prediction vector of the historical information; the interaction results of all targets of the current frame; and the prediction vectors of all targets of the current frame.
4. A device according to claim 3, wherein the sensing module is specifically configured to:
and performing end-to-end training on the end-to-end network, wherein in the training process, a preset iterative optimization type training mode is adopted to perform forward propagation of the network to respectively obtain initial results of detection, tracking and prediction, iterative optimization is performed, the prediction network is used for solving an intersection ratio matrix of a tracking part to obtain a new tracking result, the detection result and the tracking result are used for the prediction network to obtain the new prediction result, the new tracking result is obtained by tracking, and iteration is repeated until a network convergence condition is met to obtain the end-to-end network after iterative optimization.
5. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the iterative optimized end-to-end intelligent vehicle awareness method of claim 1 or 2.
6. A computer readable storage medium having stored thereon a computer program, the program being executable by a processor for implementing the iterative optimized end-to-end intelligent vehicle awareness method according to claim 1 or 2.
CN202210200266.5A 2022-03-02 2022-03-02 Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment Active CN114677555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210200266.5A CN114677555B (en) 2022-03-02 2022-03-02 Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210200266.5A CN114677555B (en) 2022-03-02 2022-03-02 Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114677555A CN114677555A (en) 2022-06-28
CN114677555B true CN114677555B (en) 2024-06-28

Family

ID=82072711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210200266.5A Active CN114677555B (en) 2022-03-02 2022-03-02 Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114677555B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660083A (en) * 2019-09-27 2020-01-07 国网江苏省电力工程咨询有限公司 Multi-target tracking method combined with video scene feature perception

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11017550B2 (en) * 2017-11-15 2021-05-25 Uatc, Llc End-to-end tracking of objects
CN112884742B (en) * 2021-02-22 2023-08-11 山西讯龙科技有限公司 Multi-target real-time detection, identification and tracking method based on multi-algorithm fusion
CN113129336A (en) * 2021-03-31 2021-07-16 同济大学 End-to-end multi-vehicle tracking method, system and computer readable medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660083A (en) * 2019-09-27 2020-01-07 国网江苏省电力工程咨询有限公司 Multi-target tracking method combined with video scene feature perception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research and Implementation of a Vehicle Tracking Algorithm Based on Multi-Feature Fusion" (一种多特征融合的车辆追踪算法的研究与实现); Liu Huimin et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 20200630; full text *

Also Published As

Publication number Publication date
CN114677555A (en) 2022-06-28

Similar Documents

Publication Publication Date Title
US10946864B2 (en) Apparatus and method for fault diagnosis and back-up of advanced driver assistance system sensors based on deep learning
Islam et al. Revisiting salient object detection: Simultaneous detection, ranking, and subitizing of multiple salient objects
CN111582021B (en) Text detection method and device in scene image and computer equipment
CN111709975B (en) Multi-target tracking method, device, electronic equipment and storage medium
Wickstrøm et al. Uncertainty modeling and interpretability in convolutional neural networks for polyp segmentation
CN110751096B (en) Multi-target tracking method based on KCF track confidence
CN113313763A (en) Monocular camera pose optimization method and device based on neural network
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
CN113205138B (en) Face and human body matching method, equipment and storage medium
CN115345905A (en) Target object tracking method, device, terminal and storage medium
CN113811894B (en) Monitoring of a KI module for driving functions of a vehicle
CN113743228A (en) Obstacle existence detection method and device based on multi-data fusion result
KR101207225B1 (en) Method for detecting and tracking pointlike targets, in an optronic surveillance system
CN115249266A (en) Method, system, device and storage medium for predicting position of waypoint
CN116740145A (en) Multi-target tracking method, device, vehicle and storage medium
CN114677555B (en) Iterative optimization type end-to-end intelligent vehicle sensing method and device and electronic equipment
CN111383245B (en) Video detection method, video detection device and electronic equipment
CN114067371B (en) Cross-modal pedestrian trajectory generation type prediction framework, method and device
CN113192110A (en) Multi-target tracking method, device, equipment and storage medium
CN115690732A (en) Multi-target pedestrian tracking method based on fine-grained feature extraction
WO2019228654A1 (en) Method for training a prediction system and system for sequence prediction
CN111144383A (en) Method for detecting vehicle deflection angle
CN111612813A (en) Face tracking method and device
US20240153278A1 (en) Apparatus for predicting a driving path of a vehicle and a method therefor
CN113902776B (en) Target pedestrian trajectory prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant