CN113793364B - Target tracking method and device, computer equipment and storage medium

Info

Publication number: CN113793364B (application CN202111352452.2A)
Authority: CN (China)
Prior art keywords: target, image block, frame, sample, search image
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN113793364A
Inventors: 程飞洋, 刘国清, 杨广, 王启程, 高宏远, 薛婉莹, 张松
Current assignee: Shenzhen Youjia Innovation Technology Co., Ltd.
Original assignee: Shenzhen Minieye Innovation Technology Co., Ltd.
Application CN202111352452.2A was filed by Shenzhen Minieye Innovation Technology Co., Ltd., published as CN113793364A, and granted as CN113793364B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/248: Analysis of motion using feature-based methods involving reference images or patches

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a target tracking method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: determining a reference frame and a current frame in a tracking image sequence, and performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target; when the target exists in the search image block, performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block; selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and determining the position and size of the target in the search image block according to the positions and sizes of the target bounding boxes corresponding to the candidate pixel points. In this way, the current vehicle can accurately identify vehicle tail frames at different angles on the road, which improves the safety of the advanced driver assistance system.

Description

Target tracking method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a target tracking method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
In the process of developing an advanced driver assistance system (ADAS) based on monocular vision, the Forward Collision Warning (FCW) and Automatic Emergency Braking (AEB) functions require that the camera be capable of stably and accurately measuring the distance and relative motion of vehicles, so that corresponding assistance measures can be taken in time when surrounding vehicles accelerate, decelerate or change lanes, ensuring safe driving and avoiding traffic accidents.
However, with the conventional technology, under driving environments with different times of day, weather and road types, and with vehicles viewed from different angles, the tail frames of vehicles at different angles on the road cannot be accurately identified; consequently, the distance between the preceding vehicle and the current vehicle cannot be accurately calculated, which reduces the safety of the advanced driver assistance system.
Disclosure of Invention
In view of the above, it is necessary to provide a target tracking method, an apparatus, a computer device, a computer readable storage medium and a computer program product for solving the above technical problems.
In a first aspect, the present application provides a target tracking method. The method comprises the following steps: determining a reference frame and a current frame in a tracking image sequence, determining the image block where the target is located from the reference frame, and determining the search image block from the current frame; performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block; selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and, according to the matching probability of each candidate pixel point, performing a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
In one embodiment, the determining a reference frame and a current frame in a tracking image sequence, determining an image block where an object is located from the reference frame, and determining a search image block from the current frame includes: randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame; determining an image block where a target is located from the reference frame; and finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a search image block.
In one embodiment, the determining, from the reference frame, an image block where the target is located includes: if the reference frame is a first frame, detecting the initial position and the size of a target in the reference frame, and determining an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In one embodiment, the convolution calculation is implemented by a trained vehicle tracking neural network model; the training of the vehicle tracking neural network model comprises the following steps: obtaining a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, the sample target image block of each sample reference frame is obtained on the basis of a pre-annotated region containing the target, and the sample search image block is obtained by expanding a pre-annotated region containing the target by a preset multiple; encoding the sample target image blocks in the group of sample reference frames to obtain a group of encoded features, and averaging the group of encoded features to obtain the reference average feature of the target; encoding the sample search image block in the sample current frame to obtain the features of the sample search image block; subtracting the reference average feature from the features of the sample search image block to obtain the difference features of the target matched against each point of the sample search image block, and performing convolution calculation on the difference features to obtain the predicted matching probability of the target at each pixel point of the sample search image block; calculating a first loss function according to the predicted matching probability and the true matching probability; performing convolution calculation on the difference features to obtain the score of the target tracking state; calculating a second loss function according to the score of the target tracking state; performing convolution calculation on the features of the sample search image block to obtain the predicted distance from each pixel point in the sample search image block to the upper, lower, left and right boundaries of the target; calculating a third loss function according to the predicted and true distances from each pixel point in the sample search image block to the upper, lower, left and right boundaries of the target; and training the vehicle tracking neural network model according to the first, second and third loss functions to obtain the trained vehicle tracking neural network model.
In one embodiment, the determining method of the labeled region includes: acquiring a vehicle tail frame marked on a vehicle sample image; and smoothly fitting the width or height of the tail frame by adopting a cubic curve equation to obtain a labeled area.
In a second aspect, the present application further provides a target tracking apparatus. The apparatus comprises: an image determining module, used for determining a reference frame and a current frame in a tracking image sequence, determining the image block where the target is located from the reference frame, and determining the search image block from the current frame; a state determining module, used for performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; a probability obtaining module, used for performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block when the target exists in the search image block;
a prediction module, used for selecting candidate pixel points with a matching probability greater than zero from the search image block and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and a target determining module, used for performing a weighted average, according to the matching probability of each candidate pixel point, of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the following steps: determining a reference frame and a current frame in a tracking image sequence, determining the image block where the target is located from the reference frame, and determining the search image block from the current frame; performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block; selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and, according to the matching probability of each candidate pixel point, performing a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: determining a reference frame and a current frame in a tracking image sequence, determining the image block where the target is located from the reference frame, and determining the search image block from the current frame; performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block; selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and, according to the matching probability of each candidate pixel point, performing a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the following steps: determining a reference frame and a current frame in a tracking image sequence, determining the image block where the target is located from the reference frame, and determining the search image block from the current frame; performing convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block; selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and, according to the matching probability of each candidate pixel point, performing a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
According to the above target tracking method, apparatus, computer device, storage medium and computer program product, a reference frame and a current frame are determined in a tracking image sequence, the image block where the target is located is determined from the reference frame, and the search image block is determined from the current frame; convolution calculation is performed on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target; when the target exists in the search image block, convolution calculation is performed on the difference features to obtain the matching probability of the target at each pixel point of the search image block; candidate pixel points with a matching probability greater than zero are selected from the search image block, and convolution calculation is performed on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and the position and size of the target in the search image block are determined by weighted-averaging the positions and sizes of the predicted target bounding boxes of all candidate pixel points. In this way, the current vehicle can accurately identify vehicle tail frames at different angles on the road, which improves the safety of the advanced driver assistance system.
Drawings
FIG. 1 is a diagram of an application environment of a target tracking method in one embodiment;
FIG. 2 is a schematic flow chart diagram of a target tracking method in one embodiment;
FIG. 3 is a schematic flow chart of the image determination step in one embodiment;
FIG. 4 is a flow chart illustrating the state determination step in one embodiment;
FIG. 5 is a schematic diagram of a vehicle tracking neural network model in one embodiment;
FIG. 6 is a graph showing the variation of the width of a rectangular frame before and after the tailstock frame marking process according to an embodiment;
FIG. 7 is a schematic flow chart diagram of a target tracking method in another embodiment;
FIG. 8 is a block diagram of a target tracking device in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target tracking method provided by the embodiments of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or another network server. The terminal 102 collects a tracking image sequence; the server 104 receives the tracking image sequence sent by the terminal 102, determines a reference frame and a current frame in it, determines the image block where the target is located from the reference frame, and determines the search image block from the current frame; performs convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target, the tracking state being used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performs convolution calculation on the difference features to obtain the matching probability of the target at each pixel point of the search image block;
selects candidate pixel points with a matching probability greater than zero from the search image block and performs convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and, according to the matching probability of each candidate pixel point, performs a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block. The terminal 102 may be, but is not limited to, a camera on a personal computer, a notebook computer, a smart phone or a tablet computer; the server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a target tracking method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
s202, determining a reference frame and a current frame in the tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame.
The target image block in the reference frame refers to the image block where the target in the reference frame image is located, and the search image block in the current frame refers to the image block of the area used for searching for the target in the current frame. When the reference frame is the first frame image in the tracking image sequence, the image block where the target is located can be obtained through a detector; when the reference frame is not the first frame image, the image block where the target is located in the reference frame at the current moment is determined according to the image block where the target was located in the video frame at the previous moment.
Specifically, the server obtains a section of image sequence to be tracked sent by the terminal device, randomly selects two adjacent frames from it, takes the image frame at time t-1 as the reference frame and the image frame at time t as the current frame, where t denotes any time in the tracking image sequence; the image block where the target is located is then determined from the reference frame, and the search image block is determined from the current frame.
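To make the patch geometry concrete, the following is a minimal sketch of how the target image block and the search image block could be cut out; the function names, the (cx, cy, w, h) box format and the 2x expansion factor follow the description above, but everything here is illustrative rather than the patent's exact implementation.

```python
import numpy as np

def crop_patch(image, cx, cy, w, h):
    """Crop a w-by-h patch centered at (cx, cy), clipped to the image borders."""
    x0 = int(max(cx - w / 2, 0))
    y0 = int(max(cy - h / 2, 0))
    x1 = int(min(cx + w / 2, image.shape[1]))
    y1 = int(min(cy + h / 2, image.shape[0]))
    return image[y0:y1, x0:x1]

def make_patches(ref_frame, cur_frame, target_box, expand=2.0):
    cx, cy, w, h = target_box                  # target position/size in the reference frame
    target_patch = crop_patch(ref_frame, cx, cy, w, h)
    # the search patch sits at the same position in the current frame,
    # enlarged by a preset multiple (2x here) to cover inter-frame motion
    search_patch = crop_patch(cur_frame, cx, cy, expand * w, expand * h)
    return target_patch, search_patch
```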
S204, carrying out convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the target image block; wherein the tracking status is used to indicate the presence status of the target in the search image block.
Specifically, the feature of the target image block obtained from the reference frame is subtracted from the feature of the current-frame search image block to obtain the feature difference of the target matched against each point of the search area. The feature difference is input into a first convolution layer to obtain a first convolution result, the first convolution result is input into a second convolution layer to obtain a second convolution result, and the second convolution result is then input into a fully connected layer to obtain the existence state of the target in the search image block, where the existence state can be represented by the value "1" when the target exists in the search image block and by the value "0" when it does not.
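As a sketch of this branch, assuming a PyTorch implementation (channel counts and the pooling step are assumptions, not taken from the patent), the tracking-state head could look as follows: two convolution layers on the difference feature followed by a fully connected layer that scores whether the target is present (1) or lost (0).

```python
import torch
import torch.nn as nn

class TrackStateHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.fc = nn.Linear(channels, 2)      # two classes: target present / absent

    def forward(self, diff_feat):             # diff_feat: (B, C, H, W) difference feature
        x = torch.relu(self.conv1(diff_feat))
        x = torch.relu(self.conv2(x))
        x = x.mean(dim=(2, 3))                # global average pooling before the FC layer
        return self.fc(x)                     # tracking-state scores
```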
And S206, when the target exists in the search image block, performing convolution calculation on the difference characteristics to obtain the matching probability of each pixel point of the target in the search image block.
Specifically, when S204 indicates that the target exists in the search image block, the feature difference obtained by subtracting the feature of the reference-frame target image from the feature of the current-frame search image block is input into a first convolution layer to obtain a first convolution result, and the first convolution result is input into an upsampling layer to obtain the matching probability of the target at each pixel point of the search image block.
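A minimal sketch of the matching branch under the same PyTorch assumption: one convolution layer on the difference feature followed by an upsampling layer, with a softmax over spatial positions so the outputs form a per-pixel matching probability map (layer sizes and the softmax normalization are assumptions).

```python
import torch
import torch.nn as nn

class MatchHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, 3, padding=1)
        self.up = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, diff_feat):
        logits = self.up(self.conv(diff_feat))           # (B, 1, H, W)
        probs = torch.softmax(logits.flatten(1), dim=1)  # distribution over pixel positions
        return probs.view_as(logits)                     # matching probability map
```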
S208, selecting candidate pixel points with a matching probability greater than zero from the search image block, and performing convolution calculation on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point.
Specifically, the server encodes the search image block through an encoder to obtain the search image features, selects the pixel points whose matching probability is greater than zero as candidate pixel points, inputs the candidate pixel point features into a first convolution layer to obtain a first convolution result, inputs the first convolution result into a second convolution layer to obtain a second convolution result, inputs the second convolution result into a third convolution layer to obtain a third convolution result, and inputs the third convolution result into an upsampling layer, so that the final output at each pixel point is a value over four channels; from these four channel values, the size and position of the target box predicted by each pixel point in the current-frame search image can be determined.
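A sketch of this regression branch under the same assumptions: three convolution layers on the search-image features followed by an upsampling layer, so that each pixel outputs four channel values, namely its predicted distances to the target's upper, lower, left and right boundaries.

```python
import torch
import torch.nn as nn

class BoxHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, 4, 3, padding=1)  # 4 channels: up/down/left/right distances
        self.up = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, search_feat):           # search_feat: encoded search image block
        x = torch.relu(self.conv1(search_feat))
        x = torch.relu(self.conv2(x))
        return self.up(self.conv3(x))         # (B, 4, H, W) boundary-distance map
```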
S210, according to the matching probability of each candidate pixel point, performing a weighted average of the positions and sizes of the target bounding boxes corresponding to all candidate pixel points to obtain the final position and size of the target in the search image block.
The sizes and positions of the target boxes predicted by the candidate pixel points obtained in S208 are weighted and averaged, and the averaged position and size are taken as the position and size of the target in the search image.
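The fusion step itself reduces to a probability-weighted average; the sketch below assumes each candidate has already been converted to an (x, y, w, h) box as in S208.

```python
import numpy as np

def fuse_boxes(probs, boxes):
    """probs: (N,) matching probabilities > 0; boxes: (N, 4) rows of (x, y, w, h)."""
    weights = probs / probs.sum()                    # normalize the matching probabilities
    return (weights[:, None] * boxes).sum(axis=0)    # final weighted-average (x, y, w, h)
```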
According to the above target tracking method, a reference frame and a current frame are determined in a tracking image sequence, the image block where the target is located is determined from the reference frame, and the search image block is determined from the current frame; convolution calculation is performed on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target; when the target exists in the search image block, convolution calculation is performed on the difference features to obtain the matching probability of the target at each pixel point of the search image block; candidate pixel points with a matching probability greater than zero are selected from the search image block, and convolution calculation is performed on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and the position and size of the target in the search image block are determined by weighted-averaging the positions and sizes of the predicted target bounding boxes, so that the current vehicle can accurately identify vehicle tail frames at different angles on the road, improving the safety of the advanced driver assistance system.
In one embodiment, as shown in fig. 3, determining an image block where an object is located from a reference frame includes:
s302, randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame.
The tracking image sequence comprises video frames at various moments in a period of time. When the target in each frame image in the tracking image sequence to be identified is identified, when the identification object is a first frame image, the first frame image is a current frame, and no reference frame exists at the moment; when the identification object is not the first frame image, the identification object is determined as the current frame, and the previous frame image of the current frame is determined as the reference frame.
And S304, determining the image block where the target is located from the reference frame.
There are two ways to determine the image block where the target is located from the reference frame. If the reference frame is a first frame image, detecting the initial position and the size of a target in the first frame image by using a detector, and determining an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in the previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In a specific embodiment, if the current frame is the first frame in the tracking video sequence, the initial position and size of the target in the first frame image are detected by the detector,
and if the current frame is the second frame in the tracking video sequence, taking the first frame in the tracking video sequence as a reference frame, acquiring the image characteristics of the target in the first frame image, and taking the image characteristics of the target in the first frame image as the first target characteristics. And inputting the first target characteristic into a pre-trained vehicle tracking neural network model, thereby predicting the position and size of a target image consistent with the first target characteristic in the current frame, namely the search image block of the second frame, and determining the image block where the target of the current frame is located according to the position and size of the target image.
And if the current frame is the third frame in the tracking video sequence, taking the second frame in the tracking video sequence as a reference frame, and taking the target image block in the second frame as the image block where the target of the reference frame is located.
S306, finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to serve as a search image block.
And finding a corresponding position in the current frame according to the relative position relation of the image block of the target in the reference frame obtained in the step S304, and expanding the width and height of the image block of the target by preset times to form a search image block on the current frame.
In this embodiment, when identifying a target in each current frame, an image in which the target in the previous frame of the current frame is located is used as a target image block of the reference frame, instead of always using an image block in which the target in the initial frame is located as an image block in which the target in the reference frame is located, so that the accuracy of target tracking can be improved.
In one embodiment, the determining method of the difference characteristics of the image block where the target is located and the search image block includes:
if the reference frame is the first frame, acquiring the characteristics of the image block where the target is located, and calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block; if the reference frame is not the first frame, acquiring the target characteristics on the previous frame of the reference frame and the target characteristics on the reference frame, carrying out weighted average to obtain the characteristics of the image block where the target is located, and then calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block.
In this embodiment, the target features from multiple reference frames are weighted and averaged; since the target image blocks on different reference frames have different offsets and angles, the position and size of the target in the current frame can be identified more accurately in the subsequent tracking process.
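A sketch of this template update rule (the equal 0.5/0.5 weighting is an assumption; the patent only states that a weighted average is used):

```python
def update_template(prev_template, new_target_feat, alpha=0.5):
    # alpha weights the running template against the newly encoded target feature
    if prev_template is None:                 # the reference frame is the first frame
        return new_target_feat
    return alpha * prev_template + (1.0 - alpha) * new_target_feat
```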
In one embodiment, as shown in fig. 4, a training method of a vehicle tracking neural network model is provided, and the convolution calculation is implemented by using the trained vehicle tracking neural network model, which is shown in fig. 5; the training mode of the vehicle tracking neural network model comprises the following steps:
s402, obtaining a plurality of training image samples, wherein each training image sample comprises a group of sample target image blocks in a sample reference frame and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained on the basis of a pre-labeled labeling area containing the target by expanding preset times.
Specifically, a section of image sequence to be tracked is obtained, which contains the same target with its size annotated; a frame in the sequence is randomly selected as the current frame, and a frame adjacent to it serves as the reference frame. The target image blocks cut from a group of reference frames are denoted patch11, patch12, and so on. Specifically, for a frame of reference image, according to the position of the annotated target, a slight random disturbance is applied to the size of the annotated box and the corresponding image block is cut out as the target image block; the overlap between the cut target area and the annotated target area is kept above 90%. For the current frame, a random offset is added to the annotated target size, and an image block twice the target size is cut out. For a group of positive samples, the random offset is limited so that the target remains completely inside the cut search image block. For a group of negative samples, the random offset is limited so that the target is incomplete within the cut search image block, representing the case of tracking failure. The positive and negative samples are used to train the subsequent tracking-state classifier.
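The sampling rules above can be sketched as follows; the jitter and offset ranges are illustrative assumptions chosen to satisfy the stated constraints (over 90% overlap for the target patch, target fully inside for positives, truncated for negatives).

```python
import random

def jitter_box(box, max_jitter=0.05):
    """Slightly perturb an annotated (cx, cy, w, h) box, keeping high overlap."""
    cx, cy, w, h = box
    return (cx + random.uniform(-max_jitter, max_jitter) * w,
            cy + random.uniform(-max_jitter, max_jitter) * h,
            w * (1 + random.uniform(-max_jitter, max_jitter)),
            h * (1 + random.uniform(-max_jitter, max_jitter)))

def sample_search_box(box, positive, expand=2.0):
    """Place a 2x search window with an offset that makes a positive or negative sample."""
    cx, cy, w, h = box
    if positive:
        # small offset: the target stays completely inside the search window
        dx = random.uniform(-0.25, 0.25) * w
        dy = random.uniform(-0.25, 0.25) * h
    else:
        # large offset: the target is truncated, simulating a tracking failure
        dx = random.choice([-1, 1]) * random.uniform(0.75, 1.5) * w
        dy = random.choice([-1, 1]) * random.uniform(0.75, 1.5) * h
    return (cx + dx, cy + dy, expand * w, expand * h)
```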
S404, coding sample target image blocks in a group of sample reference frames to obtain a group of coding features, and then carrying out average calculation on the group of coding features to obtain reference average features of the target; and coding the sample search image block in the sample current frame to obtain the characteristics of the sample search image block.
The sample target image blocks in a group of sample reference frames are encoded to obtain a group of encoded features, and the encoded features of all reference frames in the group are averaged to obtain the reference average feature of the target. It should be emphasized that all target image blocks and search image blocks need a certain boundary zero-padding operation before being input into the model. The features of the sample search image block in the sample current frame are obtained with the encoder: the search image block is resampled to 2H x 2W, zero-padded at the boundary, and input into the encoder, yielding a feature map of spatial size (H/4) x (W/4). Since the search image block is twice the size of the target image block, which simulates the search range between image frames in the actual tracking process, the encoder output is a feature map whose length and width cover both the features of the target and the features of the surrounding border. The area inside the target can be regarded as positive samples and the border around the target as negative samples; together they constitute the input features for the subsequent target matching.
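A sketch of this feature preparation, assuming a PyTorch encoder (any convolutional backbone; the padding amount is an assumption, and the resampling of the search patch to 2H x 2W is assumed to happen beforehand):

```python
import torch
import torch.nn.functional as F

def reference_average_feature(encoder, target_patches):
    feats = [encoder(p) for p in target_patches]     # one encoded feature per reference frame
    return torch.stack(feats, dim=0).mean(dim=0)     # reference average feature of the target

def search_feature(encoder, search_patch):
    padded = F.pad(search_patch, (4, 4, 4, 4))       # boundary zero padding before encoding
    return encoder(padded)                           # e.g. an (H/4) x (W/4) feature map
```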
S406, subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic matched with each point of the target and the sample search image block, and performing convolution calculation on the difference characteristic to obtain a predicted matching probability of each pixel point of the target in the sample search image block; a first loss function is calculated based on the predicted match probability and the true match probability.
Wherein the KL divergence function may be used to calculate the difference between the prediction probability and the true match probability:
\( L_{\text{match}} = \sum_i p_{\text{true}}(i)\,\log\dfrac{p_{\text{true}}(i)}{p_{\text{pred}}(i)} \)

where i indexes the pixel points, \( p_{\text{true}}(i) \) is the true matching probability at pixel point i, and \( p_{\text{pred}}(i) \) is the predicted matching probability at that point.
The obtained reference-frame target average feature is subtracted from the feature of the current-frame search image block to obtain the feature difference of the target matched against each point of the search area; the predicted matching probability of the target in the search area is obtained through the convolution layer and the upsampling layer, and the first loss function is calculated by comparing the predicted matching probability with the true matching probability.
S408, performing convolution calculation on the difference features to obtain the score of the target tracking state; and calculating a second loss function according to the score of the target tracking state.
Wherein, the second loss function used to train the classification task may be a cross-entropy loss function:
\( L_{\text{state}} = -\sum_i \mathbb{1}(y_i = c)\,\log p(y_i) \)

where \( y_i \) is the predicted label of sample i, c is the true label of sample i, and \( p(y_i) \) is the probability that sample i takes label \( y_i \); the indicator function \( \mathbb{1}(\cdot) \) takes the value 1 when its condition holds and 0 otherwise.
The feature difference between the target and each point of the search area contains the target matching information, so it is also used as the feature for judging the target tracking state; the score of the target tracking state is obtained after calculation by the convolution layer and the fully connected layer. In the training stage, the class is 1 if the target exists in the search area and 0 otherwise.
S410, performing convolution calculation according to the characteristics of the sample search image block to obtain the predicted distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target; and calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target.
The third loss function may be a smooth L1 loss function, defined as follows:

\( \operatorname{smooth}_{L1}(err) = \begin{cases} 0.5\,err^2, & |err| < 1 \\ |err| - 0.5, & \text{otherwise} \end{cases} \)

where err denotes the error between a predicted value and the true value, e.g. the error between the predicted and true values of \( h_{11} \). The meaning of the smooth L1 loss is that when the error is large, the back-propagated gradient is ±1, so the parameters update quickly during training; when the error is smaller than 1, the back-propagated gradient equals the error itself, so the parameter update slows down as the prediction approaches the truth.
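All three losses have standard built-in counterparts; the sketch below combines them as described, assuming PyTorch (the equal weighting of the three terms is an assumption).

```python
import torch
import torch.nn.functional as F

def total_loss(pred_match, true_match, state_logits, state_label, pred_dist, true_dist):
    # first loss: KL divergence between predicted and true matching distributions
    l_match = F.kl_div(torch.log(pred_match + 1e-8), true_match, reduction='batchmean')
    # second loss: cross-entropy on the tracking-state score
    l_state = F.cross_entropy(state_logits, state_label)
    # third loss: smooth L1 on the distances to the four target boundaries
    l_box = F.smooth_l1_loss(pred_dist, true_dist)
    return l_match + l_state + l_box
```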
And S412, training the vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain the trained vehicle tracking neural network model.
And training the vehicle tracking neural network model according to the first loss function obtained in the step S406, the second loss function obtained in the step S408 and the third loss function obtained in the step S410, so that the trained vehicle tracking neural network model is obtained.
In this embodiment, since the N target image blocks represent the same target with different angles and offsets, average calculation is performed on the features, different inputs with differences can be constrained to learn similar target features, and the encoder is guaranteed to have robust encoding capability on targets with different states.
In one embodiment, the determining method of the label area includes: and acquiring a vehicle tail frame marked on the vehicle sample image.
Specifically, taking the vehicle tail frame as the target, the tail frame in the reference frame is annotated as follows: when the target is a vehicle directly ahead, the upper boundary of the annotation box is referenced to the plane of the roof, the lower boundary to the grounding points of the two rear wheels, the left boundary to the widest position on the left side of the vehicle tail, and the right boundary to the widest position on the right side of the vehicle tail. When the target is the tail of a vehicle on the left side, the criteria for the upper, left and right boundaries are the same as for the vehicle directly ahead, and the lower boundary is referenced to the grounding point of the left rear wheel. When the target is the tail of a vehicle on the right side, the criteria for the upper, left and right boundaries are the same as for the vehicle directly ahead, and the lower boundary is referenced to the grounding point of the right rear wheel.
And smoothly fitting the width or height of the tail frame by adopting a cubic curve equation to obtain a labeling area.
Due to the limited precision of manual annotation, image blur and other factors, the annotation of the vehicle tail frame cannot be accurate to the pixel or even sub-pixel level; reflected in the tail-frame curves of sequence annotations, this appears as jitter. As shown in fig. 6, the vertical axis represents the variation of the tail-frame width over a sequence and the horizontal axis represents the image frame index; the jittery curve illustrates the non-smoothness of the original manual annotation. Therefore, some post-processing is required to suppress the annotation noise, so that the subsequent neural network model learns to output a smoother tail-frame size. A cubic curve equation is adopted to fit a smooth width variation of the tail frame:
\( y = a x^3 + b x^2 + c x + d \)

where x denotes the image frame index and y denotes the width or height of the vehicle tail frame; the width and the height of the tail frame are smoothed separately. In other embodiments, during actual driving, complex conditions such as frequent acceleration and deceleration or curve driving make the width/height curve of the tail frame too complex to fit with a low-order curve equation; in that case, the original manual labels are used as the training data. Likewise, when the average fitting error of the curve exceeds a certain threshold, for example three pixels, the fitted result is discarded and the originally annotated width and height of the tail frame continue to serve as the training ground truth.
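This fit-with-fallback rule is easy to express with a least-squares polynomial fit; the sketch below assumes NumPy and the three-pixel threshold given above.

```python
import numpy as np

def smooth_labels(values, threshold=3.0):
    """Fit y = a*x^3 + b*x^2 + c*x + d to a width (or height) sequence."""
    values = np.asarray(values, dtype=float)
    x = np.arange(len(values), dtype=float)        # image frame index
    coeffs = np.polyfit(x, values, deg=3)
    fitted = np.polyval(coeffs, x)
    if np.abs(fitted - values).mean() > threshold:
        return values                              # fit rejected: keep the manual labels
    return fitted                                  # smoothed labels as training ground truth
```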
In the embodiment, a cubic curve equation is adopted to smoothly fit the width or height of the vehicle tail frame, wherein the smooth curve is a width change curve of a vehicle tail frame sequence subjected to post-processing, so that the smoothness of training data is effectively improved, the influence of labeling noise is avoided, and better supervision data is provided for subsequent model training.
In one embodiment, as shown in fig. 7, a robust vehicle tail frame tracking test method is provided, the method comprising the following steps:
and S1, acquiring the image sequence and the input image blocks of the model through the camera. Specifically, two adjacent frames are takent-1 andtwill bet-1 frame as reference frame, obtaining a target image block, andtthe frame is taken as the current frame. If the reference frame is the 1 st frame, the initial position and size of the target are obtained through a pre-trained detector, otherwise, the initial position and size of the target are obtained through a pre-trained detectort1 target position and size of frame by last trackingt-2 frames of target. According tot-1 target position and size of frame, intAnd intercepting an area with the size of 2 times as a search image block at the same position of the frame. And acquiring a reference frame target image block and a current frame search image block, and predicting the tracking state and the position and size of the target in the current frame.
And S2, if the reference frame is the first frame, taking the coding characteristics of the reference frame target as a target template, otherwise, carrying out weighted average on the target template of the last tracking calculation and the target characteristics obtained by the current tracking to obtain a new target template.
S3, calculate the matching features and obtain the tracking-state classification. If the tracking state is 0, tracking is judged to have failed; the current frame is treated as a new first frame, the target is detected again, and the tracking process restarts. If the tracking state is 1, the subsequent steps are entered and the position and size of the target are calculated.
S4, when tracking is judged successful, calculate the matching probability of the reference target over the search image block area. The predicted matching probability distribution is similar to the training ground truth: inside the target, positions within a certain range of the target center have a matching probability greater than 0, while positions far from the target center or beyond the target extent have a matching probability of 0.
S5, for a position \( x_1 \) with matching probability greater than 0, a set of predicted values \( h_{11}, h_{12}, w_{11}, w_{12} \) can be obtained; combined with the coordinate position of \( x_1 \) itself, the top-left starting coordinate of the target bounding box \( (r_{x1}, c_{x1}) \) and its width and height \( (w_{11} + w_{12},\; h_{11} + h_{12}) \) can be calculated. For all points \( x_1, x_2, \ldots, x_N \) with matching probability greater than 0, the N candidate target bounding boxes can be calculated, and the final bounding box position and size are obtained by weighted-averaging the starting coordinates, widths and heights of all candidate boxes, where the weight of each point is its matching probability.
S6, return to step S1 and continue the next tracking pass until all tracking tasks are completed.
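Putting S1-S6 together, the test procedure can be sketched as the loop below; `detector`, `encoder` and the three heads are the hypothetical components sketched earlier in this description, so this is an illustration of the control flow rather than the patent's exact code.

```python
def track_sequence(frames, detector, encoder, heads):
    box, template = detector(frames[0]), None           # S1: initialize from the detector
    for ref, cur in zip(frames, frames[1:]):
        target_patch, search_patch = make_patches(ref, cur, box)
        feat = encoder(target_patch)
        template = update_template(template, feat)       # S2: update the target template
        search_feat = encoder(search_patch)
        diff = search_feat - template
        if heads.state(diff).argmax() == 0:              # S3: tracking judged invalid
            box, template = detector(cur), None          # re-detect and restart tracking
            continue
        probs = heads.match(diff)                        # S4: per-pixel matching probabilities
        boxes = heads.box(search_feat)                   # S5: per-pixel candidate boxes
        keep = probs > 0
        box = fuse_boxes(probs[keep], boxes[keep])       # weighted average of candidates
    return box
```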
In this embodiment, a training strategy in which training simulates inference is provided, which improves the robustness of the vehicle tail frame tracking model; multiple candidate tail frames are predicted simultaneously and weighted-averaged according to the tracked matching probabilities to compute the final tail frame, which improves the stability and smoothness of the sequence tracking results.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a target tracking apparatus for implementing the above-mentioned target tracking method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the target tracking device provided below can be referred to the limitations of the target tracking method in the foregoing, and details are not described herein again.
In one embodiment, as shown in FIG. 8, there is provided a target tracking device 800 comprising: the device comprises an image determining module, a state determining module, a probability obtaining module, a predicting module and a target determining module, wherein:
an image determining module 802, configured to determine a reference frame and a current frame in a tracking image sequence, determine an image block where a target is located from the reference frame, and determine a search image block from the current frame;
the state determining module 804 is configured to perform convolution calculation on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target; the tracking state is used for indicating the existence state of the target in the search image block;
a probability obtaining module 806, configured to perform convolution calculation on the difference features to obtain a matching probability of each pixel point of the target in the search image block when the target exists in the search image block;
the prediction module 808 is configured to select candidate pixel points with a matching probability greater than zero from the search image block, and perform convolution calculation on the image features of each candidate pixel point to obtain a position and a size of a target bounding box corresponding to each candidate pixel point;
and the target determining module 810 is configured to perform weighted average on the positions and sizes of the target bounding boxes corresponding to all the candidate pixels according to the matching probability of each candidate pixel, so as to obtain a final position and size of the target in the search image block.
With the target tracking apparatus 800, a reference frame and a current frame are determined in a tracking image sequence, the image block where the target is located is determined from the reference frame, and the search image block is determined from the current frame; convolution calculation is performed on the difference features between the image block where the target is located and the search image block to obtain the tracking state of the target; when the target exists in the search image block, convolution calculation is performed on the difference features to obtain the matching probability of the target at each pixel point of the search image block; candidate pixel points with a matching probability greater than zero are selected from the search image block, and convolution calculation is performed on the image features of each candidate pixel point to obtain the position and size of the target bounding box corresponding to each candidate pixel point; and the position and size of the target in the search image block are determined by weighted-averaging the positions and sizes of the predicted target bounding boxes, so that the current vehicle can accurately identify vehicle tail frames at different angles on the road, improving the safety of the advanced driver assistance system.
In one embodiment, the image determination module comprises: a current frame determining submodule, used for randomly selecting two adjacent frames of images in the tracking image sequence, taking the former of the two adjacent frames as the reference frame and the latter as the current frame; a reference image block submodule, used for determining the image block where the target is located from the reference frame; and a search image block determining submodule, used for finding the corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area at that position by a preset multiple to serve as the search image block.
In one embodiment, the image determining module is configured to detect an initial position and a size of a target in the reference frame if the reference frame is a first frame, and determine an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In one embodiment, the state determination module is configured to, if the reference frame is the first frame, obtain the features of the image block where the target is located and calculate the difference features between those features and the features of the search image block; and, if the reference frame is not the first frame, obtain the target features on the frame preceding the reference frame and the target features on the reference frame, take their weighted average as the features of the image block where the target is located, and then calculate the difference features between the features of the image block where the target is located and the features of the search image block.
In one embodiment, the apparatus further comprises a model training module, wherein the model training module comprises: the sample acquisition sub-module is used for acquiring a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple; the probability obtaining sub-module is used for subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a predicted matching probability of each pixel point of the target in the sample search image block; the first function calculation sub-module is used for calculating a first loss function according to the predicted matching probability and the true matching probability; the score acquisition submodule is used for carrying out convolution calculation according to the difference characteristics to obtain the score of the target tracking state; the second function acquisition submodule is used for calculating a second loss function according to the score of the target tracking state; the distance prediction sub-module is used for performing convolution calculation according to the characteristics of the sample search image block to obtain the predicted distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of a target; the third function obtaining submodule is used for calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target; and the tracking model generation submodule is used for training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain the trained vehicle tracking neural network model.
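The three-loss training objective might be combined as below. The concrete loss choices (binary cross-entropy for the matching probabilities and the tracking-state score, L1 for the boundary distances) and the unit weights are assumptions; the patent only names a first, second, and third loss function.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_match, true_match, pred_state, true_state,
               pred_dist, true_dist, w1=1.0, w2=1.0, w3=1.0):
    # First loss: per-pixel matching probability vs. the true matching probability
    loss1 = F.binary_cross_entropy(pred_match, true_match)
    # Second loss: tracking-state score (here treated as a logit, an assumption)
    loss2 = F.binary_cross_entropy_with_logits(pred_state, true_state)
    # Third loss: predicted vs. real distances to the four target boundaries
    loss3 = F.l1_loss(pred_dist, true_dist)
    return w1 * loss1 + w2 * loss2 + w3 * loss3
```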
In one embodiment, the sample acquisition sub-module comprises: the vehicle tail frame acquiring unit is used for acquiring the vehicle tail frame marked on the vehicle sample image; and the region labeling unit is used for smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain the labeled region.
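A sketch of the cubic smoothing of an annotated tail-frame dimension, assuming the frame index serves as the independent variable of the cubic curve equation:

```python
import numpy as np

def smooth_dimension(frame_indices, values):
    """Fit a cubic polynomial to an annotated tail-frame dimension (width or
    height) over time and return the smoothed values per frame."""
    coeffs = np.polyfit(frame_indices, values, deg=3)  # the cubic curve equation
    return np.polyval(coeffs, frame_indices)

t = np.arange(10, dtype=float)
widths = 30 + 0.5 * t + np.random.default_rng(0).normal(0, 0.3, 10)  # noisy labels
print(np.round(smooth_dimension(t, widths), 2))
```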
The modules in the target tracking device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, a processor in the computer device, or be stored in software form in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store matching probability data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by a processor, implements a target tracking method. The display screen of the computer device can be a liquid crystal display or an electronic ink display, and the input device of the computer device can be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: determining a reference frame and a current frame in a tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame; performing convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the target image block; the tracking state is used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the differential characteristics to obtain the matching probability of each pixel point of the target in the search image block; selecting candidate pixel points with the matching probability larger than zero from the search image blocks, and performing convolution calculation on the image characteristics of each candidate pixel point to obtain the position and the size of a target outer covering frame corresponding to each candidate pixel point; and according to the matching probability of each candidate pixel point, carrying out weighted average on the positions and the sizes of the target outer covering frames corresponding to all the candidate pixel points to obtain the final position and size of the target in the search image block.
In one embodiment, the processor, when executing the computer program, further performs the steps of: randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame; determining an image block where a target is located from the reference frame; and finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a search image block.
In one embodiment, the processor, when executing the computer program, further performs the steps of: if the reference frame is a first frame, detecting the initial position and the size of a target in the reference frame, and determining an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In one embodiment, the processor, when executing the computer program, further performs the steps of: if the reference frame is the first frame, acquiring the characteristics of the image block where the target is located, and calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block; if the reference frame is not the first frame, acquiring the target characteristics on the previous frame of the reference frame and the target characteristics on the reference frame, carrying out weighted average to obtain the characteristics of the image block where the target is located, and then calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block.
In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple; coding sample target image blocks in a group of sample reference frames to obtain a group of coding features, and then carrying out average calculation on the group of coding features to obtain reference average features of the target; coding a sample search image block in the sample current frame to obtain the characteristics of the sample search image block; subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a prediction matching probability of each pixel point of the target in the sample search image block; calculating a first loss function according to the predicted matching probability and the real matching probability; performing convolution calculation according to the difference characteristics to obtain the score of the target tracking state; calculating a second loss function according to the score of the target tracking state; performing convolution calculation according to the characteristics of the sample search image blocks to obtain the predicted distance from each pixel point in the sample search image blocks to the upper boundary, the lower boundary, the left boundary and the right boundary of a target; calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target; and training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain a trained vehicle tracking neural network model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a vehicle tail frame marked on a vehicle sample image; and smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain a labeled area.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: determining a reference frame and a current frame in a tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame; performing convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the target image block; the tracking state is used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the differential characteristics to obtain the matching probability of each pixel point of the target in the search image block; selecting candidate pixel points with the matching probability larger than zero from the search image blocks, and performing convolution calculation on the image characteristics of each candidate pixel point to obtain the position and the size of a target outer covering frame corresponding to each candidate pixel point; and according to the matching probability of each candidate pixel point, carrying out weighted average on the positions and the sizes of the target outer covering frames corresponding to all the candidate pixel points to obtain the final position and size of the target in the search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame; determining an image block where a target is located from the reference frame; and finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the reference frame is a first frame, detecting the initial position and the size of a target in the reference frame, and determining an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the reference frame is the first frame, acquiring the characteristics of the image block where the target is located, and calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block; if the reference frame is not the first frame, acquiring the target characteristics on the previous frame of the reference frame and the target characteristics on the reference frame, carrying out a weighted average (that is, averaging the target characteristics carried over from the last tracking step with the target characteristics acquired in the current step) to obtain the characteristics of the image block where the target is located, and then calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: obtaining a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple; coding sample target image blocks in a group of sample reference frames to obtain a group of coding features, and then carrying out average calculation on the group of coding features to obtain reference average features of the target; coding a sample search image block in the sample current frame to obtain the characteristics of the sample search image block; subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a prediction matching probability of each pixel point of the target in the sample search image block; calculating a first loss function according to the predicted matching probability and the real matching probability; performing convolution calculation according to the difference characteristics to obtain the score of the target tracking state; calculating a second loss function according to the score of the target tracking state; performing convolution calculation according to the characteristics of the sample search image blocks to obtain the predicted distance from each pixel point in the sample search image blocks to the upper boundary, the lower boundary, the left boundary and the right boundary of a target; calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target; and training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain a trained vehicle tracking neural network model.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a vehicle tail frame marked on a vehicle sample image; and smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain a labeled area.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of: determining a reference frame and a current frame in a tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame; performing convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the target image block; the tracking state is used for indicating the existence state of the target in the search image block; when the target exists in the search image block, performing convolution calculation on the differential characteristics to obtain the matching probability of each pixel point of the target in the search image block; selecting candidate pixel points with the matching probability larger than zero from the search image blocks, and performing convolution calculation on the image characteristics of each candidate pixel point to obtain the position and the size of a target outer covering frame corresponding to each candidate pixel point; and according to the matching probability of each candidate pixel point, carrying out weighted average on the positions and the sizes of the target outer covering frames corresponding to all the candidate pixel points to obtain the final position and size of the target in the search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame; determining an image block where a target is located from the reference frame; and finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the reference frame is a first frame, detecting the initial position and the size of a target in the reference frame, and determining an image block where the target is located according to the initial position and the size of the target; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
In one embodiment, the computer program when executed by the processor further performs the steps of: if the reference frame is the first frame, acquiring the characteristics of the image block where the target is located, and calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block; if the reference frame is not the first frame, acquiring the target characteristics on the previous frame of the reference frame and the target characteristics on the reference frame, carrying out weighted average to obtain the characteristics of the image block where the target is located, and then calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block.
In one embodiment, the computer program when executed by the processor further performs the steps of: obtaining a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple; coding sample target image blocks in a group of sample reference frames to obtain a group of coding features, and then carrying out average calculation on the group of coding features to obtain reference average features of the target; coding a sample search image block in the sample current frame to obtain the characteristics of the sample search image block; subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a prediction matching probability of each pixel point of the target in the sample search image block; calculating a first loss function according to the predicted matching probability and the real matching probability; performing convolution calculation according to the difference characteristics to obtain the score of the target tracking state; calculating a second loss function according to the score of the target tracking state; performing convolution calculation according to the characteristics of the sample search image blocks to obtain the predicted distance from each pixel point in the sample search image blocks to the upper boundary, the lower boundary, the left boundary and the right boundary of a target; calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target; and training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain a trained vehicle tracking neural network model.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a vehicle tail frame marked on a vehicle sample image; and smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain a labeled area.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (14)

1. A method of target tracking, the method comprising:
determining a reference frame and a current frame in a tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame;
performing convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the image block where the target is located; the tracking state is used for indicating the existence state of the target in the search image block;
when the target exists in the search image block, performing convolution calculation on the differential characteristics to obtain the matching probability of each pixel point of the target in the search image block;
selecting candidate pixel points with the matching probability larger than zero from the search image blocks, and performing convolution calculation on the image characteristics of each candidate pixel point to obtain the position and the size of a target outer covering frame corresponding to each candidate pixel point;
and according to the matching probability of each candidate pixel point, carrying out weighted average on the positions and the sizes of the target outer covering frames corresponding to all the candidate pixel points to obtain the final position and size of the target in the search image block.
2. The method according to claim 1, wherein the determining a reference frame and a current frame in a tracking image sequence, determining an image block where an object is located from the reference frame, and determining a search image block from the current frame comprises:
randomly selecting two adjacent frames of images in the tracking image sequence, taking the former frame of the two adjacent frames of images as a reference frame, and taking the latter frame of the two adjacent frames of images as a current frame;
determining an image block where a target is located from the reference frame;
and finding a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a search image block.
3. The method according to claim 2, wherein the determining the image block where the target is located from the reference frame comprises:
if the reference frame is a first frame, detecting the initial position and the size of a target in the reference frame, and determining an image block where the target is located according to the initial position and the size of the target;
if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
4. The method according to claim 1, wherein the determining manner of the difference features between the image block where the target is located and the search image block comprises:
if the reference frame is the first frame, acquiring the characteristics of the image block where the target is located, and calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block;
if the reference frame is not the first frame, acquiring the target characteristics on the previous frame of the reference frame and the target characteristics on the reference frame, carrying out weighted average to obtain the characteristics of the image block where the target is located, and then calculating the difference characteristics between the characteristics of the image block where the target is located and the characteristics of the search image block.
5. The method of claim 1, wherein the convolution calculation is implemented using a trained vehicle tracking neural network model; the training mode of the vehicle tracking neural network model comprises the following steps:
obtaining a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple;
coding sample target image blocks in a group of sample reference frames to obtain a group of coding features, and then carrying out average calculation on the group of coding features to obtain reference average features of the target; coding a sample search image block in the sample current frame to obtain the characteristics of the sample search image block;
subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a prediction matching probability of each pixel point of the target in the sample search image block; calculating a first loss function according to the predicted matching probability and the real matching probability;
performing convolution calculation according to the difference characteristics to obtain the score of the target tracking state;
calculating a second loss function according to the score of the target tracking state;
performing convolution calculation according to the characteristics of the sample search image blocks to obtain the predicted distance from each pixel point in the sample search image blocks to the upper boundary, the lower boundary, the left boundary and the right boundary of a target;
calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target;
and training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain a trained vehicle tracking neural network model.
6. The method of claim 5, wherein the determination of the labeling area comprises:
acquiring a vehicle tail frame marked on a vehicle sample image;
and smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain a labeled area.
7. An object tracking apparatus, characterized in that the apparatus comprises:
the image determining module is used for determining a reference frame and a current frame in a tracking image sequence, determining an image block where a target is located from the reference frame, and determining a search image block from the current frame;
the state determining module is used for performing convolution calculation on the difference characteristics of the image block where the target is located and the search image block to obtain the tracking state of the image block where the target is located; the tracking state is used for indicating the existence state of the target in the search image block;
the probability obtaining module is used for performing convolution calculation on the difference characteristics to obtain the matching probability of each pixel point of the target in the search image block when the target exists in the search image block;
the prediction module is used for selecting candidate pixel points with the matching probability larger than zero from the search image block, and carrying out convolution calculation on the image characteristics of each candidate pixel point to obtain the position and the size of the target outer covering frame corresponding to each candidate pixel point;
and the target determining module is used for determining the position and the size of the target in the search image block according to the position and the size of the target outer covering frame corresponding to each candidate pixel point.
8. The apparatus of claim 7, wherein the image determination module comprises:
a current frame determining submodule, configured to randomly select two adjacent frames of images in the tracking image sequence, and use a previous frame of the two adjacent frames of images as a reference frame and a next frame of the two adjacent frames of images as a current frame;
the reference image block submodule is used for determining an image block where a target is located from the reference frame;
and the image block searching determining submodule is used for searching a corresponding position in the current frame according to the position of the image block where the target is located, and expanding the area of the corresponding position by a preset multiple to be used as a searching image block.
9. The apparatus according to claim 7, wherein the image determining module is configured to detect an initial position and a size of an object in the reference frame if the reference frame is a first frame, and determine an image block where the object is located according to the initial position and the size of the object; if the reference frame is a non-first frame, obtaining the position and the size of the target in the non-first frame according to the position and the size of the target in a previous frame adjacent to the non-first frame, and determining the image block where the target is located according to the position and the size of the target in the non-first frame.
10. The apparatus according to claim 7, wherein the state determining module is configured to, if the reference frame is the first frame, obtain the feature of the image block where the target is located, and calculate a difference feature between the feature of the image block where the target is located and the feature of the search image block; if the reference frame is not the first frame, obtain the target features on the previous frame of the reference frame and the target features on the reference frame, carry out a weighted average (that is, average the target features carried over from the last tracking step with the target features acquired in the current step) to obtain the feature of the image block where the target is located, and then calculate the difference feature between the feature of the image block where the target is located and the feature of the search image block.
11. The apparatus of claim 7, further comprising a model training module, wherein the model training module comprises:
the sample acquisition sub-module is used for acquiring a plurality of training image samples, wherein each training image sample comprises sample target image blocks in a group of sample reference frames and a sample search image block in a sample current frame; the group of sample reference frames comprises at least one sample reference frame, a sample target image block of each sample reference frame is obtained on the basis of a pre-labeled labeling area containing a target, and a sample search image block is obtained by expanding a pre-labeled labeling area containing the target by a preset multiple;
the probability obtaining sub-module is used for subtracting the reference average characteristic from the characteristic of the sample search image block to obtain a difference characteristic of the target matched with each point of the sample search image block, and performing convolution calculation on the difference characteristic to obtain a predicted matching probability of each pixel point of the target in the sample search image block;
a first function calculation sub-module for calculating a first loss function according to the predicted matching probability and the true matching probability;
the score acquisition submodule is used for carrying out convolution calculation according to the difference characteristics to obtain the score of the target tracking state;
the second function acquisition submodule is used for calculating a second loss function according to the score of the target tracking state;
the distance prediction sub-module is used for performing convolution calculation according to the characteristics of the sample search image block to obtain the prediction distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of a target;
the third function obtaining submodule is used for calculating a third loss function according to the predicted distance and the real distance from each pixel point in the sample search image block to the upper boundary, the lower boundary, the left boundary and the right boundary of the target;
and the tracking model generation submodule is used for training a vehicle tracking neural network model according to the first loss function, the second loss function and the third loss function to obtain the trained vehicle tracking neural network model.
12. The apparatus of claim 11, wherein the sample acquisition sub-module comprises:
the vehicle tail frame acquiring unit is used for acquiring a vehicle tail frame marked on the vehicle sample image;
and the region labeling unit is used for smoothing the width or height of the tail frame by fitting a cubic curve equation to obtain the labeled region.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202111352452.2A 2021-11-16 2021-11-16 Target tracking method and device, computer equipment and storage medium Active CN113793364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111352452.2A CN113793364B (en) 2021-11-16 2021-11-16 Target tracking method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111352452.2A CN113793364B (en) 2021-11-16 2021-11-16 Target tracking method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113793364A (en) 2021-12-14
CN113793364B (en) 2022-04-15

Family

ID=78955326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111352452.2A Active CN113793364B (en) 2021-11-16 2021-11-16 Target tracking method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113793364B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409043B (en) * 2023-12-14 2024-03-12 湖南苏科智能科技有限公司 Sub-pixel level video target tracking method, device, equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111010493A (en) * 2019-12-12 2020-04-14 清华大学 Method and device for video processing by using convolutional neural network

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JP5498454B2 (en) * 2011-09-15 2014-05-21 株式会社東芝 TRACKING DEVICE, TRACKING METHOD, AND PROGRAM
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN111242973A (en) * 2020-01-06 2020-06-05 上海商汤临港智能科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN111931720B (en) * 2020-09-23 2021-01-22 深圳佑驾创新科技有限公司 Method, apparatus, computer device and storage medium for tracking image feature points

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN111010493A (en) * 2019-12-12 2020-04-14 清华大学 Method and device for video processing by using convolutional neural network

Also Published As

Publication number Publication date
CN113793364A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
US20200290608A1 (en) Forward collision control method and apparatus, electronic device, program, and medium
EP4152204A1 (en) Lane line detection method, and related apparatus
CN108805016B (en) Head and shoulder area detection method and device
CN111161349A (en) Object attitude estimation method, device and equipment
CN113191372B (en) Construction method and application of ship target directional detection model
CN112418236A (en) Automobile drivable area planning method based on multitask neural network
CN112906816B (en) Target detection method and device based on optical differential and two-channel neural network
JP4420512B2 (en) Moving object motion classification method and apparatus, and image recognition apparatus
CN111553914B (en) Vision-based goods detection method and device, terminal and readable storage medium
CN113793364B (en) Target tracking method and device, computer equipment and storage medium
CN116964588A (en) Target detection method, target detection model training method and device
CN112052807A (en) Vehicle position detection method, device, electronic equipment and storage medium
JP4795479B2 (en) Moving object recognition method and apparatus
CN113435324A (en) Vehicle target detection method and device and computer readable storage medium
CN115713750B (en) Lane line detection method and device, electronic equipment and storage medium
CN115546743B (en) Vehicle road cooperative control method, device, equipment and medium based on adhesion coefficient
JP6800901B2 (en) Object area identification device, object area identification method and program
CN116129386A (en) Method, system and computer readable medium for detecting a travelable region
CN113706705B (en) Image processing method, device, equipment and storage medium for high-precision map
TWI619099B (en) Intelligent multifunctional driving assisted driving recording method and system
Aboah et al. Ai-based framework for understanding car following behaviors of drivers in a naturalistic driving environment
CN114549833A (en) Instance partitioning method and device, electronic equipment and storage medium
US20220277552A1 (en) Object sensing device, learning method, and recording medium
US20230316569A1 (en) Apparatus and method for detecting a 3d object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518048 401, building 1, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.