CN114943750A - Target tracking method and apparatus, and electronic device - Google Patents


Info

Publication number
CN114943750A
Authority
CN
China
Prior art keywords
frame
target
sensing
perception
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210631663.8A
Other languages
Chinese (zh)
Inventor
刘媛媛 (Liu Yuanyuan)
陈博 (Chen Bo)
尹荣彬 (Yin Rongbin)
张伟伟 (Zhang Weiwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202210631663.8A
Publication of CN114943750A
Legal status: Pending

Classifications

    • G06T 7/246 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; Learning methods
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G06V 10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30244 Camera pose
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

The invention discloses a target tracking method, a target tracking apparatus, and an electronic device. The method comprises the following steps: acquiring a current frame image of the vehicle driving environment and the previous frame image; predicting a predicted perception box in the current frame image from at least one perception box identified in the previous frame image; performing a first matching of each perception box identified in the current frame image against the predicted perception box at the corresponding position; obtaining the target perception boxes successfully matched in the current frame image; invoking a neural network model to extract perception-box features from the successfully matched target perception boxes; performing a second matching of the target perception boxes with the Hungarian algorithm based on the extracted features; and, if the matching succeeds, determining the position of the tracked target object. The invention at least solves the problem of inaccurate target tracking caused by occlusion between targets and rapid changes in target motion.

Description

Target tracking method and apparatus, and electronic device
Technical Field
The invention relates to the field of autonomous driving, and in particular to a target tracking method, a target tracking apparatus, and an electronic device.
Background
Target tracking is a key technology in autonomous driving: road traffic contains many kinds of obstacles, and an autonomous vehicle must identify and track them to support subsequent path planning. Target tracking recovers the trajectory of each target and thereby provides the input for the vehicle's path planning.
However, existing target tracking methods often depend on large neural network models that involve a large amount of complex computation. This slows tracking down, fails the real-time requirement of autonomous driving, and hinders large-scale mass production. Moreover, in practice multiple targets may occlude one another and some targets change motion rapidly, so tracking accuracy suffers.
No effective solution to these problems has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide a target tracking method, a target tracking apparatus, and an electronic device, which at least solve the technical problem of inaccurate target tracking caused by occlusion between targets and rapid changes in target motion.
According to one aspect of an embodiment of the invention, a target tracking method is provided, comprising: acquiring, while the vehicle is driving, a current frame image of the vehicle driving environment and the previous frame image; predicting a predicted perception box in the current frame image from at least one perception box identified in the previous frame image, the perception box marking a target object to be tracked in the previous frame image; performing a first matching of each perception box identified in the current frame image against the predicted perception box at the corresponding position; obtaining the target perception boxes successfully matched in the current frame image; invoking a neural network model to extract perception-box features from the successfully matched target perception boxes; performing a second matching of the target perception boxes with the Hungarian algorithm based on the extracted perception-box features; and, if the matching succeeds, determining the position of the tracked target object.
Further, the target tracking method includes: identifying the target object to be tracked in the previous frame image and marking at least one perception box matching the target object in that image; and predicting the position of the predicted perception box of the target object in the current frame image from the coordinates of the at least one perception box in the previous frame image.
Further, the target tracking method includes: obtaining, from the previous frame image, the coordinates of the target object in the camera coordinate system, i.e. the coordinate system whose positioning reference is the camera photographing the target object; converting these coordinates into the object coordinate system, i.e. the coordinate system whose positioning reference is the target object itself; predicting the position of the target object in the object coordinate system of the current frame image from the relative velocity of the target object and the time difference between the two adjacent frame images; converting the predicted position back into the camera coordinate system; computing the position of the target object in the image coordinate system of the current frame image from the coordinate change of the camera coordinate system between the two adjacent frame images; and obtaining the position of the predicted perception box of the target object in the current frame image from that image-coordinate position.
Further, the target tracking method includes: computing the intersection over union (IoU) of each perception box identified in the current frame image with the predicted perception box of the corresponding type, obtaining an IoU for each target object in the current frame image; determining the target object with the largest IoU in the current frame image as the tracked object; if the IoU of the tracked object meets the minimum threshold, checking whether a coordinate jump exists between the perception box of the tracked object and the corresponding predicted perception box; and, if no coordinate jump exists, taking the perception box of the tracked object as a successfully matched target perception box in the current frame image.
Further, the target tracking method includes: performing multi-task scheduling on the successfully matched target perception boxes in the current frame image; expanding the range of each target perception box outward to obtain an expanded target perception box; and invoking the neural network model to classify the expanded target perception box, determine its type, and extract the perception-box features of the successfully matched target perception boxes in the current frame image.
Further, the target tracking method includes: computing the cosine similarity between each perception-box feature extracted from the current frame image and the corresponding historical feature of the target object; determining the target perception box with the maximum similarity from the computation result; if the IoU of that box meets the minimum threshold, obtaining its weight matrix; and performing the second matching of that box with the Hungarian algorithm based on the weight matrix.
Further, the target tracking method includes: if the matching succeeds, checking whether a coordinate jump exists between the target perception box with the maximum similarity and the corresponding historical perception box; and, if no coordinate jump exists, determining that the target object of that box is the tracked target object and obtaining its position.
Further, the target tracking method includes: obtaining the target perception boxes for which matching failed and assigning them new identification information.
According to another aspect of the embodiments of the invention, a target tracking apparatus is also provided, comprising: an acquisition module for acquiring a current frame image of the vehicle driving environment and the previous frame image; a prediction module for predicting a predicted perception box in the current frame image from at least one perception box identified in the previous frame image, the perception box marking a target object to be tracked in the previous frame image; a first matching module for performing the first matching of each perception box identified in the current frame image against the predicted perception box at the corresponding position; an obtaining module for obtaining the target perception boxes successfully matched in the current frame image; an invoking module for invoking the neural network model to extract perception-box features from the successfully matched target perception boxes; a second matching module for performing the second matching of the target perception boxes with the Hungarian algorithm based on the extracted features; and a determining module for determining the position of the tracked target object if the matching succeeds.
According to another aspect of the embodiments of the invention, an electronic device is also provided, comprising one or more processors and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the target tracking method described above.
In the embodiments of the invention, two matchings are performed between the previous frame image and the current frame image. While the vehicle is driving, the current frame image of the driving environment and the previous frame image are collected; at least one perception box identified in the previous frame image is used to predict a predicted perception box in the current frame image; each perception box identified in the current frame image is first matched against the predicted perception box at the corresponding position; the successfully matched target perception boxes are obtained; the neural network model is invoked to extract perception-box features from them; the second matching is performed with the Hungarian algorithm based on the extracted features; and, if the matching succeeds, the position of the tracked target object is determined. The perception box marks the target object to be tracked in the previous frame image.
Accordingly, after the successfully matched target perception boxes in the current frame image are obtained, the neural network model extracts perception-box features only from those boxes. Unlike the prior art, features need not be extracted from every perception box: the neural network model processes fewer boxes, the overall computation drops, tracking speeds up, and the mass-production requirements of autonomous vehicles are met. In addition, after the first matching against the predicted perception boxes, the target perception boxes undergo a second, Hungarian-algorithm matching based on the extracted features, so target matching uses multiple kinds of features. The two-stage matching effectively avoids the inaccuracy caused by occlusion between multiple target objects and by rapid motion changes of some targets, improving tracking precision.
The technical scheme of the application therefore improves tracking speed and precision, saves the computing power of the autonomous-driving chip, satisfies the mass-production requirements of autonomous vehicles, and solves the technical problem of inaccurate target tracking caused by occlusion between targets and rapid changes in target motion.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative target tracking method according to an embodiment of the invention;
FIG. 2 is a flow diagram of an alternative target tracking method according to an embodiment of the invention;
FIG. 3 is a schematic block diagram of an alternative target tracking apparatus according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, a method embodiment of a target tracking method is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
It should also be noted that the electronic device may serve as the execution subject of the target tracking method of the embodiments of the present invention, and may be mounted on an autonomous vehicle.
Fig. 1 is a flowchart of an alternative target tracking method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S101, in the vehicle driving process, a current frame image of the vehicle driving environment and a previous frame image of the current frame image are collected.
In step S101, an image capture device is connected to the electronic device mounted on the autonomous vehicle. The image acquisition equipment can be equipment such as a vehicle event data recorder and a vehicle-mounted camera, and the image acquisition equipment can continuously take pictures of the vehicle running environment in the running process of the vehicle. In addition, the image acquisition equipment can also synchronously transmit the shot images to the electronic equipment, so that the electronic equipment performs framing processing on the images to obtain the current frame image and the previous frame image of the current frame image.
Step S102: predict a predicted perception box in the current frame image from at least one perception box identified in the previous frame image.
In step S102, the perception box marks a target object to be tracked in the previous frame image, for example a vehicle driving on the road. After obtaining the two images, the electronic device predicts the predicted perception box in the current frame image from the perception box of the previous frame image. Specifically, the electronic device first identifies the vehicle to be tracked in the previous frame image and marks at least one matching perception box there, then computes the coordinates of that box in the previous frame image and predicts from them the position of the predicted perception box of the vehicle in the current frame image.
Step S103: first match each perception box identified in the current frame image against the predicted perception box at the corresponding position.
In step S103, the electronic device identifies at least one perception box in the current frame image. Since it has also predicted a perception box for the current frame image from the previous one, it performs the first matching between the predicted boxes and all perception boxes in the current frame image using the intersection over union (IoU) of each target object, where the IoU of a perception box is the ratio of the area of its intersection with the predicted perception box at the corresponding position to the area of their union.
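For concreteness, a minimal Python sketch of this IoU computation is given below; the (x1, y1, x2, y2) corner format is an assumption, since the patent does not fix a box convention.

```python
def compute_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```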
Step S104: obtain the successfully matched target perception boxes in the current frame image.
In step S104, from the result of the first matching, the electronic device determines the successfully matched target perception boxes among the perception boxes of the current frame image. Specifically, it computes the IoU of each identified perception box with the predicted perception box of the corresponding type, obtaining the IoU of each target object in the current frame image; it then takes the target object with the largest IoU as the tracked object; if that IoU meets the minimum threshold, it further checks whether a coordinate jump exists between the perception box of the tracked object and the corresponding predicted perception box; if no jump exists, that perception box is a successfully matched target perception box in the current frame image.
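Building on the IoU sketch above, the following fragment illustrates one way the first matching could proceed; the threshold values MIN_IOU and MAX_JUMP are assumptions, as the patent only speaks of a "minimum threshold" and a "coordinate jump" without giving numbers.

```python
MIN_IOU = 0.3    # minimum IoU threshold; the value is an assumption
MAX_JUMP = 50.0  # pixel tolerance for the coordinate-jump check; an assumption

def first_match(detections, predictions):
    """First matching: for each predicted box, keep the detection with the
    largest IoU if it clears the IoU floor and shows no coordinate jump.
    Returns (detection index, prediction index) pairs."""
    if not detections:
        return []
    matches = []
    for p_idx, pred in enumerate(predictions):
        ious = [compute_iou(det, pred) for det in detections]
        d_idx = max(range(len(ious)), key=ious.__getitem__)
        if ious[d_idx] < MIN_IOU:
            continue
        det = detections[d_idx]
        # coordinate-jump check: centre displacement between detection and prediction
        dx = (det[0] + det[2] - pred[0] - pred[2]) / 2.0
        dy = (det[1] + det[3] - pred[1] - pred[3]) / 2.0
        if (dx * dx + dy * dy) ** 0.5 <= MAX_JUMP:
            matches.append((d_idx, p_idx))
    return matches
```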
Step S105: invoke a neural network model to extract perception-box features from the successfully matched target perception boxes.
In step S105, the neural network model invoked by the electronic device is a fast neural network model. After the first matching is complete, the electronic device performs multi-task scheduling on its result, invokes the fast model to extract perception-box features from the matched target perception boxes, and classifies those boxes.
It should be noted that because the application extracts features only from the successfully matched target perception boxes, it does not, unlike the prior art, extract features from every perception box; the neural network model processes only a small number of boxes, which reduces the overall computation and increases the tracking speed of the targets (i.e. the target objects), meeting the mass-production requirements of autonomous vehicles.
Step S106: perform the second matching of the target perception boxes with the Hungarian algorithm, based on the extracted perception-box features.
In step S106, the electronic device combines the perception-box features extracted by the fast neural network model with the IoU to compute new weights, obtaining a weight matrix for the target perception boxes; based on this weight matrix it performs the second matching with the Hungarian algorithm and produces the final matching result.
It should be noted that performing a feature-based Hungarian matching after the first matching means that target matching actually uses multiple kinds of features; this two-stage matching effectively avoids the tracking inaccuracy caused by occlusion among multiple target objects and by rapid motion changes of some targets, improving tracking accuracy.
Step S107: if the matching succeeds, determine the position of the tracked target object.
In step S107, if the matching succeeds, the electronic device checks whether a coordinate jump exists between the target perception box of the target object and the corresponding historical perception box; if no jump exists, it determines that the target object is the tracked target object and obtains its position. The electronic device also collects the target perception boxes whose matching failed and assigns them new identification information.
Based on steps S101 to S107, the embodiment performs two matchings between the previous frame image and the current frame image: while the vehicle is driving, the current frame image of the driving environment and the previous frame image are collected; at least one perception box identified in the previous frame image is used to predict a predicted perception box in the current frame image; each perception box identified in the current frame image is first matched against the predicted perception box at the corresponding position; the successfully matched target perception boxes are obtained; the neural network model extracts their perception-box features; the second matching is performed with the Hungarian algorithm based on those features; and, if the matching succeeds, the position of the tracked target object is determined. The perception box marks the target object to be tracked in the previous frame image.
As set out above, features are extracted only from the successfully matched target perception boxes, so the neural network model processes fewer boxes than in the prior art, the overall computation drops, tracking speeds up, and mass-production requirements are met. Because the second, Hungarian matching uses the extracted features on top of the first IoU matching, multiple kinds of features take part in target matching, which effectively avoids the inaccuracy caused by occlusion between targets and by rapid motion changes of some targets, improving tracking precision.
The technical scheme of the application therefore improves tracking speed and precision, saves the computing power of the autonomous-driving chip, meets the mass-production requirements of autonomous vehicles, and solves the technical problem of inaccurate tracking caused by occlusion between targets and rapid changes in target motion.
In an alternative embodiment, the electronic device identifies the target object to be tracked in the previous frame image and marks at least one perception box matching it there, then predicts the position of the predicted perception box of the target object in the current frame image from the coordinates of that box.
Optionally, after obtaining the two images, the electronic device identifies the target object to be tracked in the previous frame image; the target object may be a moving object such as a vehicle or a pedestrian on the road. It also marks at least one perception box matching each target object in the previous frame image; when several target objects are present, each of them may receive one or more perception boxes.
In addition, the electronic device obtains from the previous frame image the coordinates of the target object in the camera coordinate system, converts them into the object coordinate system, predicts the position of the target object in the object coordinate system of the current frame image from the relative velocity of the target object and the time difference between the two adjacent frame images, converts the predicted position back into the camera coordinate system, computes the position of the target object in the image coordinate system of the current frame image from the coordinate change of the camera coordinate system between the two frames, and finally obtains the position of the predicted perception box in the current frame image from that image-coordinate position. The camera coordinate system takes the camera photographing the target object as its positioning reference; the object coordinate system takes the target object itself as its positioning reference.
Optionally, taking a vehicle as the target object, the electronic device computes the coordinates of the vehicle in the camera coordinate system from the height of the vehicle in the previous frame image using the pinhole imaging principle, then converts them into the vehicle coordinate system (i.e. the object coordinate system with the vehicle as positioning reference). It then predicts the vehicle position (i.e. the target object position) in the vehicle coordinate system of the current frame image from the relative velocity of the vehicle and the time difference between the two adjacent frame images (i.e. the previous frame image and the current frame image), and converts the predicted position back into the camera coordinate system.
Further, after converting the predicted vehicle position into the camera coordinate system, the electronic device computes the position of the vehicle in the image coordinate system of the current frame image from the coordinate change of the camera coordinate system between the two frames. Specifically, it computes the change ratio of the x coordinate (abscissa) and derives from it the width and height of the vehicle in the image coordinate system, which represent its position. Finally, it obtains the position of the predicted perception box of the vehicle in the current frame image from that position using the pinhole imaging principle.
In this process, the position of the target object in the image coordinate system of the current frame image is computed from the coordinate change of the camera coordinate system between the two adjacent frame images, and the predicted perception box is obtained from that position, so the predicted perception box of the current frame image is indeed derived from the perception box of the previous frame image.
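A sketch of this prediction chain under simplifying assumptions (a calibrated pinhole camera with the principal point at the image origin, a known real-world object height, and constant relative velocity over one frame interval); the patent names only the pinhole imaging principle, so the details below are illustrative.

```python
def predict_box(prev_box, real_height_m, focal_px, rel_velocity_mps, dt):
    """Lift the previous-frame box to camera coordinates via the pinhole model,
    shift it by relative velocity * dt, and project back to the image."""
    x1, y1, x2, y2 = prev_box
    h_px = y2 - y1
    depth = focal_px * real_height_m / h_px            # pinhole: Z = f * H / h
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # camera-frame position of the box centre (principal point assumed at origin)
    X = cx * depth / focal_px
    Y = cy * depth / focal_px
    # displacement over the time difference between the two adjacent frames
    X += rel_velocity_mps[0] * dt
    Y += rel_velocity_mps[1] * dt
    depth += rel_velocity_mps[2] * dt
    # re-project the centre and rescale the box by the depth change ratio
    scale = (focal_px * real_height_m / depth) / h_px
    new_cx, new_cy = X * focal_px / depth, Y * focal_px / depth
    new_w, new_h = (x2 - x1) * scale, h_px * scale
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)
```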
In an optional embodiment, the electronic device computes the IoU between each perception box identified in the current frame image and the predicted perception box of the corresponding type, obtaining the IoU of each target object in the current frame image, and then determines the target object with the largest IoU as the tracked object. If the IoU of the tracked object meets the minimum threshold, it checks whether a coordinate jump exists between the perception box of the tracked object and the corresponding predicted perception box; if no jump exists, that perception box is a successfully matched target perception box in the current frame image.
Optionally, still taking a vehicle as the target object: when at least one vehicle is present in the current frame image, the electronic device computes the IoU between each vehicle's perception boxes and the predicted perception box of the corresponding type and, from the results, selects for each vehicle the perception box with the largest IoU, thereby determining the vehicle with the largest IoU in the current frame image as the tracked object.
Further, the electronic device judges whether the IoU of the tracked object meets the minimum threshold; if it does, it checks for a coordinate jump between the perception box of the tracked object and the corresponding predicted perception box; if no jump exists, that perception box is a successfully matched target perception box in the current frame image, and the electronic device stores it.
It should be noted that by computing the IoU, the electronic device can first match the predicted perception box of each object against the perception boxes of the current frame image to determine the successfully matched target perception boxes; this first matching removes the perception boxes that do not fit the matching criteria, reducing the computation of the subsequent neural network model.
In an optional embodiment, the electronic device performs multi-task scheduling on the result of the first matching so as to invoke the neural network model on the successfully matched target perception boxes. Specifically, it schedules the matched target perception boxes of the current frame image as multiple tasks, expands the range of each target perception box outward to obtain an expanded target perception box, invokes the neural network model to classify the expanded box and determine its type, and extracts the perception-box features of the successfully matched boxes.
Optionally, the electronic device selects the successfully matched target perception boxes in the current frame image for multi-task scheduling and extends the range of each box (i.e. its width and height) by a certain coefficient to obtain the expanded target perception box; the coefficient can be customized by the operator, as in the sketch below.
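A minimal sketch of the outward expansion; the default coefficient and the image bounds are assumptions, since the patent leaves the coefficient to the operator.

```python
def expand_box(box, coeff=0.1, img_w=1920, img_h=1080):
    """Expand a matched box outward by a configurable coefficient before it is
    fed to the classification network, clipping to the image bounds."""
    x1, y1, x2, y2 = box
    dw, dh = (x2 - x1) * coeff, (y2 - y1) * coeff
    return (max(0.0, x1 - dw), max(0.0, y1 - dh),
            min(float(img_w), x2 + dw), min(float(img_h), y2 + dh))
```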
Optionally, the electronic device further invokes the fast neural network model to classify the expanded target perception box, determine its type, and extract the perception-box features of the box for the subsequent second matching.
It should be noted that scheduling the target perception boxes according to the result of the first matching and extracting features with the fast neural network, rather than extracting features from every perception box, reduces the computational workload of the neural network model and increases the target tracking speed.
In an optional embodiment, the electronic device performs the second matching of the target perception boxes with the Hungarian algorithm based on the extracted features. Specifically, it computes the cosine similarity between each perception-box feature extracted from the current frame image and the corresponding historical feature of the target object, determines the target perception box with the maximum similarity from the results, and, if the IoU of that box meets the minimum threshold, obtains its weight matrix and performs the second matching of that box with the Hungarian algorithm based on the matrix.
Optionally, the electronic device inspects each perception box extracted from the current frame image and checks whether it carries perception-box features; if so, it computes the cosine similarity between each such feature and the corresponding historical feature of the target object and determines from the results the target perception box with the maximum similarity.
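The cosine-similarity step could look like the following sketch; the feature vectors are whatever the fast neural network model emits, which the patent does not specify.

```python
import numpy as np

def cosine_similarity(feat, hist_feat):
    """Cosine similarity between a current perception-box feature and the
    target's stored historical feature."""
    denom = np.linalg.norm(feat) * np.linalg.norm(hist_feat)
    return float(np.dot(feat, hist_feat) / denom) if denom > 0 else 0.0

def best_by_similarity(features, history_feature):
    """Return (index, similarity) of the box feature most similar to history."""
    sims = [cosine_similarity(f, history_feature) for f in features]
    i = int(np.argmax(sims))
    return i, sims[i]
```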
Further, after determining the target perception box with the maximum similarity, the electronic device judges whether the IoU of that box meets the minimum threshold. If so, the IoU value w_IOU and the cosine similarity w_f are combined with fixed weight coefficients to compute the weight matrix of the target perception box, and the electronic device performs the secondary matching on the target perception box through the Hungarian algorithm based on this weight matrix. The weight matrix of the target perception box is calculated as:
w_new = α * w_f + β * w_IOU
where w_new is the weight matrix, and α and β are the coefficients of the cosine similarity w_f and the IoU value w_IOU, respectively.
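A sketch of the weight combination and the Hungarian assignment, using SciPy's linear_sum_assignment as the Hungarian-algorithm solver; the coefficient values alpha and beta and the acceptance floor min_weight are assumptions, since the patent leaves the coefficients unspecified.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def second_match(sim_matrix, iou_matrix, alpha=0.7, beta=0.3, min_weight=0.2):
    """Secondary matching: combine cosine similarity and IoU into the weight
    matrix w_new = alpha * w_f + beta * w_IOU, then solve the assignment with
    the Hungarian algorithm."""
    w_new = alpha * np.asarray(sim_matrix) + beta * np.asarray(iou_matrix)
    # linear_sum_assignment minimises cost, so negate the weights
    rows, cols = linear_sum_assignment(-w_new)
    # keep only assignments whose combined weight is high enough
    return [(r, c) for r, c in zip(rows, cols) if w_new[r, c] >= min_weight]
```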
It should be noted that the two matchings based on different strategies, combined with a fast neural network model, allow features extracted by the fast network to drive target matching; this reduces computation, saves the computing power of the autonomous-driving chip, and improves tracking speed and precision. The two matchings also bring multiple kinds of features into target matching, further improving tracking precision.
In an optional embodiment, if the matching succeeds, the electronic device checks whether a coordinate jump exists between the target perception box with the maximum similarity and the corresponding historical perception box; if no jump exists, it determines that the target object of that box is the tracked target object and obtains its position.
Optionally, after the result of the second matching is obtained: if the target perception box with the maximum similarity matched successfully, the electronic device checks for a coordinate jump between it and the corresponding historical perception box; if no jump occurred, the target object of that box is the tracked target object, and the electronic device obtains its position, completing target tracking.
Further, during the second matching several target perception boxes may have the same similarity; if the IoU of some boxes does not meet the minimum threshold, or a coordinate jump exists, those boxes fail to match. The electronic device then collects the boxes that failed to match and assigns them new identification information.
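A minimal sketch of assigning new identification information to unmatched boxes; the track-table layout is a hypothetical illustration.

```python
import itertools

_id_counter = itertools.count(1)

def register_unmatched(track_table, box):
    """Assign new identification information (a fresh track ID) to a target
    perception box whose matching failed, starting a new track for it."""
    new_id = next(_id_counter)
    track_table[new_id] = {"box": box, "feature": None,
                           "velocity": (0.0, 0.0, 0.0)}
    return new_id
```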
In an optional embodiment, FIG. 2 shows the flow of an optional target tracking method according to an embodiment of the present invention. As shown in FIG. 2, the electronic device first predicts the perception box in the current frame image from the perception box of the previous frame image, obtaining the predicted perception box; it then first matches the predicted perception box against the perception boxes of the current frame image (understood as the actual perception boxes of that image) by the IoU of each target object and stores the matching result. Next, the electronic device invokes the fast neural network model on the target perception boxes matched in the first round, classifying them and extracting their features. Finally, it computes the weight matrix of the target perception box with the maximum similarity from the extracted perception-box features and the corresponding IoU values, performs the second matching with the Hungarian algorithm on that matrix, and generates the final matching result, achieving target tracking.
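Tying the sketches above together, one plausible per-frame driver for the FIG. 2 flow might look as follows; extract_features and the numeric constants (an assumed object height of 1.5 m and focal length of 1000 px) stand in for components the patent does not specify.

```python
def track_frame(prev_tracks, curr_detections, dt, extract_features):
    """One pass of the FIG. 2 flow using the sketches above. prev_tracks maps
    track id -> {"box", "feature", "velocity"}; extract_features stands in
    for the fast neural network model."""
    ids = list(prev_tracks)
    predicted = [predict_box(prev_tracks[i]["box"], 1.5, 1000.0,
                             prev_tracks[i]["velocity"], dt) for i in ids]
    matched = first_match(curr_detections, predicted)   # first matching (IoU)
    kept = [curr_detections[d] for d, _ in matched]     # boxes that survived
    if not kept:
        return []
    feats = [extract_features(expand_box(b)) for b in kept]
    sim = [[cosine_similarity(f, prev_tracks[i]["feature"]) for i in ids]
           for f in feats]                              # similarity to history
    iou = [[compute_iou(b, p) for p in predicted] for b in kept]
    pairs = second_match(sim, iou)                      # Hungarian on w_new
    return [(ids[c], kept[r]) for r, c in pairs]
```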
As set out after steps S101 to S107 above, the same advantages apply to this flow: the two-stage matching reduces the computation of the neural network model, improves target tracking speed and precision, saves the computing power of the autonomous-driving chip, meets the mass-production requirements of autonomous vehicles, and solves the technical problem of inaccurate tracking caused by occlusion between targets and rapid changes in target motion.
Example 2
According to another aspect of the embodiments of the present invention, an embodiment of a target tracking apparatus is also provided. FIG. 3 is a schematic block diagram of an alternative target tracking apparatus according to an embodiment of the present invention; as shown in FIG. 3, the apparatus includes: an acquisition module 301, a prediction module 302, a first matching module 303, an obtaining module 304, an invoking module 305, a second matching module 306, and a determining module 307.
The acquisition module 301 collects a current frame image of the vehicle driving environment and the previous frame image; the prediction module 302 predicts a predicted perception box in the current frame image from at least one perception box identified in the previous frame image, the perception box marking a target object to be tracked there; the first matching module 303 performs the first matching of each perception box identified in the current frame image against the predicted perception box at the corresponding position; the obtaining module 304 obtains the successfully matched target perception boxes in the current frame image; the invoking module 305 invokes the neural network model to extract their perception-box features; the second matching module 306 performs the second matching with the Hungarian algorithm based on those features; and the determining module 307 determines the position of the tracked target object if the matching succeeds.
It should be noted that the seven modules 301 to 307 correspond to steps S101 to S107 of the above embodiment; the examples and application scenarios they implement are the same as those of the corresponding steps, but are not limited to the disclosure of Embodiment 1.
Optionally, the prediction module further includes: an identification module and a first prediction module. The system comprises a recognition module, a tracking module and a tracking module, wherein the recognition module is used for recognizing a target object to be tracked from a previous frame of image and identifying at least one perception frame matched with the target object in the previous frame of image; and the first prediction module is used for predicting the position of the target object in the prediction perception frame in the current frame image based on the coordinates of at least one perception frame of the target object in the previous frame image.
Optionally, the first prediction module further includes: the device comprises a first acquisition module, a first conversion module, a second prediction module, a second conversion module, a calculation module and a second acquisition module. The first acquisition module is used for acquiring coordinate values of the target object under a camera coordinate system from the previous frame of image, wherein the camera coordinate system is a coordinate system under a positioning reference of a camera for shooting the target object; the first conversion module is used for converting coordinate values of the target object under a camera coordinate system into an object coordinate system, wherein the object coordinate system is a coordinate system under the positioning reference of the target object; the second prediction module is used for predicting the target object position of the target object in the object coordinate system of the current frame image based on the relative speed of the target object and the time difference before and after displacement in the two adjacent frame images; the second conversion module is used for converting the predicted target object position into a camera coordinate system; the calculation module is used for calculating the position of the target object under the image coordinate system of the current frame image according to the coordinate change of the camera coordinate system in the two adjacent frame images; and the second acquisition module is used for acquiring the position of the target object in the prediction perception frame of the current frame image based on the position of the target object in the image coordinate system of the current frame image.
Optionally, the first matching module further includes: the device comprises a first calculation module, a first determination module, a verification module and a second determination module. The first calculation module is used for performing cross comparison calculation on at least one sensing frame identified in the current frame image and the prediction sensing frame of the corresponding type to acquire a cross comparison IOU (input output unit) of each target object in the current frame image; the first determining module is used for determining a target object with the largest intersection ratio than the IOU from the current frame image to obtain a tracking object; the detection module is used for verifying whether coordinate jumping exists between the perception frame of the tracked object and the corresponding prediction perception frame or not if the intersection ratio of the tracked object and the IOU meets the lowest limit; and the second determining module is used for determining the perception frame of the tracked object as a target perception frame which is successfully matched in the current frame image if the coordinate jump does not exist.
Optionally, the calling module further includes: the device comprises a scheduling module, an external expansion module and a first calling module. The scheduling module is used for performing multi-task scheduling on the successfully matched target perception frame in the current frame image; the external expansion module is used for externally expanding the range of the target sensing frame to obtain an externally expanded target sensing frame; and the first calling module is used for calling the neural network model to classify the externally expanded target perception frame, determining the type of the target perception frame and extracting the perception frame characteristics of the target perception frame which is successfully matched in the current frame image.
Optionally, the second matching module further includes: the device comprises a similarity meter module, a third determining module, a third acquiring module and a secondary matching module. The similarity calculation module is used for performing cosine similarity calculation on at least one sensing frame feature extracted from the current frame image and the corresponding historical feature of the target object; the third determining module is used for determining the target perception frame with the maximum similarity from the calculation result; the third obtaining module is used for obtaining a weight matrix of the target sensing frame with the maximum similarity if the intersection ratio of the target sensing frame with the maximum similarity meets the lowest threshold value than the IOU; and the secondary matching module is used for performing secondary matching on the target perception box with the maximum similarity by adopting a Hungarian algorithm based on the weight matrix of the target perception box with the maximum similarity.
Optionally, the determining module further includes: a first verification module and a fourth determination module. The first checking module is used for checking whether coordinate jumping exists between the target sensing frame with the maximum similarity and the corresponding history sensing frame or not under the condition that matching is successful; and the fourth determining module is used for determining the target object corresponding to the target perception frame with the maximum similarity as the tracked target object and acquiring the position of the tracked target object if the coordinate jump does not exist.
Optionally, the target tracking apparatus further includes: a fourth acquisition module, which is used for acquiring the target sensing frame that failed to match and assigning new identification information to it.
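A minimal sketch of assigning fresh identification information, assuming integer track identifiers drawn from a monotonically increasing counter (the identifier scheme itself is not specified by this disclosure):

```python
import itertools

_next_track_id = itertools.count(1)  # monotonically increasing identifiers


def assign_new_ids(unmatched_frames):
    """Start a fresh trajectory for every sensing frame that failed both
    matching rounds by giving it a previously unused identifier."""
    return {next(_next_track_id): frame for frame in unmatched_frames}
```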
Example 3
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including one or more processors and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the target tracking method of embodiment 1 described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units may be only a division of logical functions, and an actual implementation may adopt another division; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also fall within the protection scope of the present invention.

Claims (10)

1. A target tracking method, comprising:
acquiring a current frame image of a vehicle running environment and a previous frame image of the current frame image in the vehicle running process;
predicting a prediction sensing frame in the current frame image by adopting at least one sensing frame identified in the previous frame image, wherein the sensing frame is used for identifying a target object to be tracked in the previous frame image;
performing first matching between at least one sensing frame identified in the current frame image and the prediction sensing frame at the corresponding position, respectively;
acquiring a target sensing frame successfully matched in the current frame image;
calling a neural network model to extract sensing frame features from the successfully matched target sensing frame;
performing secondary matching on the target sensing frame by using the Hungarian algorithm based on the extracted sensing frame features of the target sensing frame;
and in case of successful matching, determining the tracked position of the target object.
2. The method of claim 1, wherein predicting the prediction sensing frame in the current frame image by using the at least one sensing frame identified in the previous frame image comprises:
identifying the target object to be tracked from the previous frame of image, and identifying at least one sensing frame matched with the target object in the previous frame of image;
and predicting the position of the target object in the predicted sensing frame in the current frame image based on the coordinates of at least one sensing frame of the target object in the previous frame image.
3. The method according to claim 2, wherein predicting the position of the target object in the current frame image based on the coordinates of at least one sensing frame of the target object in the previous frame image comprises:
acquiring coordinate values of the target object in a camera coordinate system from the previous frame image, wherein the camera coordinate system is a coordinate system with the camera that captures the target object as the positioning reference;
converting the coordinate values of the target object from the camera coordinate system into an object coordinate system, wherein the object coordinate system is a coordinate system with the target object as the positioning reference;
predicting the target object position in the object coordinate system of the current frame image based on the relative velocity of the target object and the time difference between the two adjacent frame images over which the target object is displaced;
converting the predicted target object position into the camera coordinate system;
calculating the position of the target object in the image coordinate system of the current frame image according to the coordinate change of the camera coordinate system between the two adjacent frame images;
and acquiring the position of the target object in the prediction sensing frame of the current frame image based on the position of the target object in the image coordinate system of the current frame image.
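A compact numeric sketch of this prediction chain, assuming the camera-to-object transform is given as a rotation matrix R and translation t (so p_obj = R · p_cam + t), the relative velocity v_rel is expressed in the object frame, and K is a pinhole intrinsic matrix; all of these symbols are assumptions for the example, since the claim leaves the transforms abstract.

```python
import numpy as np


def predict_pixel_position(p_cam, R, t, v_rel, dt, K):
    """Predict where the target reappears in the current image: move its
    previous-frame point into the object frame, displace it by relative
    velocity over dt, map it back to the camera frame, and project it."""
    p_obj = R @ p_cam + t                  # camera frame -> object frame
    p_obj_next = p_obj + v_rel * dt        # constant-velocity displacement
    p_cam_next = R.T @ (p_obj_next - t)    # object frame -> camera frame
    u, v, w = K @ p_cam_next               # pinhole projection
    return np.array([u / w, v / w])        # position in image coordinates
```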
4. The method according to any one of claims 1 to 3, wherein the first matching of the at least one sensing frame identified in the current frame image with the prediction sensing frame at the corresponding position respectively comprises:
performing intersection-over-union (IoU) calculation on at least one sensing frame identified in the current frame image and the prediction sensing frame of the corresponding type to obtain the IoU of each target object in the current frame image;
determining the target object with the largest IoU from the current frame image to obtain a tracking object;
if the IoU of the tracking object meets the minimum threshold, checking whether a coordinate jump exists between the sensing frame of the tracking object and the corresponding prediction sensing frame;
and if no coordinate jump exists, taking the sensing frame of the tracking object as the target sensing frame successfully matched in the current frame image.
5. The method of claim 4, wherein calling the neural network model to extract the sensing frame features from the successfully matched target sensing frame comprises:
performing multi-task scheduling on the successfully matched target sensing frame in the current frame image;
expanding the range of the target sensing frame outward to obtain an expanded target sensing frame;
and calling the neural network model to classify the expanded target sensing frame, determine the type of the target sensing frame, and extract the sensing frame features of the target sensing frame successfully matched in the current frame image.
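The feature-extraction half of this step might look as follows; the claim does not name a network, so a ResNet-18 with its classification head removed stands in as an assumed backbone, and the 224 x 224 input size is likewise an assumption (the classification branch is omitted for brevity).

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.models import resnet18

# An assumed backbone: ResNet-18 with the classification head removed,
# leaving the 512-dimensional pooled feature as the embedding.
backbone = resnet18(weights=None)
backbone.fc = torch.nn.Identity()
backbone.eval()


@torch.no_grad()
def extract_feature(image_chw, box):
    """Crop the expanded sensing frame from a CHW float image tensor,
    resize it to the assumed input size, and return its embedding."""
    x1, y1, x2, y2 = [int(v) for v in box]
    crop = image_chw[:, y1:y2, x1:x2]
    crop = TF.resize(crop, [224, 224])
    return backbone(crop.unsqueeze(0)).squeeze(0)  # 512-d feature vector
```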
6. The method of claim 5, wherein performing secondary matching on the target sensing frame by using the Hungarian algorithm based on the extracted sensing frame features comprises:
performing cosine similarity calculation on at least one sensing frame feature extracted from the current frame image and the corresponding historical feature of the target object;
determining the target sensing frame with the maximum similarity from the calculation result;
if the IoU of the target sensing frame with the maximum similarity meets the minimum threshold, acquiring a weight matrix of the target sensing frame with the maximum similarity;
and performing secondary matching on the target sensing frame with the maximum similarity by using the Hungarian algorithm based on the weight matrix of the target sensing frame with the maximum similarity.
7. The method of claim 6, wherein determining the tracked location of the target object if the matching is successful comprises:
in the case that the matching is successful, checking whether a coordinate jump exists between the target sensing frame with the maximum similarity and the corresponding historical sensing frame;
and if no coordinate jump exists, determining the target object corresponding to the target sensing frame with the maximum similarity as the tracked target object, and acquiring the position of the tracked target object.
8. The method of claim 7, further comprising:
and acquiring the target sensing frame that failed to match, and assigning new identification information to the target sensing frame that failed to match.
9. An object tracking device, comprising:
the acquisition module is used for acquiring, during vehicle driving, a current frame image of the vehicle driving environment and a previous frame image of the current frame image;
the prediction module is used for predicting a prediction sensing frame in the current frame image by adopting at least one sensing frame identified in the previous frame image, wherein the sensing frame is used for identifying a target object to be tracked in the previous frame image;
the first matching module is used for respectively carrying out first matching on at least one sensing frame identified in the current frame image and the prediction sensing frame at the corresponding position;
the obtaining module is used for acquiring the target sensing frame successfully matched in the current frame image;
the calling module is used for calling a neural network model to extract the characteristics of the sensing frame from the target sensing frame which is successfully matched;
the second matching module is used for performing secondary matching on the target sensing frame by using the Hungarian algorithm based on the extracted sensing frame features of the target sensing frame;
and the determining module is used for determining the position of the tracked target object under the condition that the matching is successful.
10. An electronic device, characterized in that the electronic device comprises one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the target tracking method of any one of claims 1 to 8.
CN202210631663.8A 2022-06-06 2022-06-06 Target tracking method and device and electronic equipment Pending CN114943750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210631663.8A CN114943750A (en) 2022-06-06 2022-06-06 Target tracking method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210631663.8A CN114943750A (en) 2022-06-06 2022-06-06 Target tracking method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114943750A (en) 2022-08-26

Family

ID=82908639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210631663.8A Pending CN114943750A (en) 2022-06-06 2022-06-06 Target tracking method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114943750A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908498A (en) * 2022-12-27 2023-04-04 清华大学 Multi-target tracking method and device based on category optimal matching
CN115908498B (en) * 2022-12-27 2024-01-02 清华大学 Multi-target tracking method and device based on category optimal matching
CN117409332A (en) * 2023-12-15 2024-01-16 江苏保龙机电制造有限公司 Long wood shaving appearance data detection system and method based on big data processing
CN117409332B (en) * 2023-12-15 2024-03-19 江苏保龙机电制造有限公司 Long wood shaving appearance data detection system and method based on big data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination