CN110826403A - Tracking target determination method and related equipment

Info

Publication number
CN110826403A
Authority
CN
China
Prior art keywords
tracking
image
target
frames
feature
Prior art date
Legal status
Granted
Application number
CN201910925725.4A
Other languages
Chinese (zh)
Other versions
CN110826403B (en)
Inventor
丁旭
胡文泽
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201910925725.4A
Publication of CN110826403A
Priority to PCT/CN2020/108990 (WO2021057309A1)
Application granted
Publication of CN110826403B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/48: Matching video sequences

Abstract

The application discloses a tracking target determination method and related equipment, applied to an electronic device. The method comprises the following steps: acquiring a first image and a second image in the same target video file, and acquiring N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, and the first image and the second image both comprise N tracking targets; inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map; inputting the target feature map into a prediction network to output a heat map, a width-height value set and a feature vector set; determining N second tracking frames based on the heat map and the width-height value set; determining N first corresponding relations based on the N first tracking frames, the N second tracking frames and the feature vector set; and determining the tracking targets framed by the N second tracking frames based on the N first corresponding relations. By adopting the method and the device, the accuracy of determining the tracking target can be improved.

Description

Tracking target determination method and related equipment
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a tracking target determining method and a related device.
Background
Target tracking is one of the key technologies in the fields of image processing and video processing. Target tracking identifies a tracking target in a video or an image, and is widely applied in related fields such as intelligent traffic, human-machine interaction, and national defense reconnaissance. Determining the tracking target is one of the essential steps for realizing target tracking. At present, the tracking target is mainly determined by the DeepSORT algorithm, which performs matching using only predicted position information, so its prediction accuracy is low.
Disclosure of Invention
The embodiment of the application provides a tracking target determining method and related equipment, which are used for improving the accuracy of determining a tracking target.
In a first aspect, an embodiment of the present application provides a tracking target determining method, which is applied to an electronic device, and the method includes:
acquiring a first image and a second image in the same target video file, and acquiring N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1;
inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;
inputting the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;
determining N second tracking frames based on the heat map and the width-height value set;
determining N first corresponding relations based on the N first tracking frames, the N second tracking frames and the feature vector set, wherein the N first corresponding relations are used for representing one-to-one corresponding relations between the N first tracking frames and the N second tracking frames;
and determining the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
In a second aspect, an embodiment of the present application provides a tracking target determining apparatus, which is applied to an electronic device, and includes:
the information acquisition unit is used for acquiring a first image and a second image in the same target video file and acquiring N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1;
the feature extraction unit is used for inputting the second image into an hourglass network model for feature extraction and outputting a target feature map;
a data determination unit for inputting the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;
a tracking frame determination unit, configured to determine N second tracking frames based on the heat map and the width-height value set;
a correspondence determining unit, configured to determine N first corresponding relations based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first corresponding relations are used to represent one-to-one corresponding relations between the N first tracking frames and the N second tracking frames;
a tracking target determination unit, configured to determine the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing steps in the method according to the first aspect of the embodiment of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform some or all of the steps described in the method according to the first aspect of the present application.
In a fifth aspect, the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps described in the method according to the first aspect of the present application. The computer program product may be a software installation package.
It can be seen that, in the embodiment of the present application, a first image and a second image are obtained from the same target video file, where the first image is the image a preset number of frames before the second image. The second image is input into an hourglass network model to obtain a target feature map, the target feature map is input into a prediction network to obtain a heat map, a width-height value set and a feature vector set, and N second tracking frames are determined according to the heat map and the width-height value set, where the second tracking frames are used for framing the N tracking targets in the second image. Finally, the tracking targets are determined according to the first tracking frames, the second tracking frames and the feature vector set, where the first tracking frames are used for framing the N tracking targets in the first image. Therefore, the tracking targets are determined based on the combination of an image, the image a preset number of frames before it, and the tracking frames associated with that earlier image, so the tracking follows the changing positions of the tracking targets, and the accuracy of determining the tracking targets is improved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1A is a schematic flowchart of a tracking target determination method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of an hourglass network model provided in an embodiment of the present application;
FIG. 1C is a schematic diagram of a heat map provided by an embodiment of the present application;
fig. 2A is a schematic flowchart of another tracking target determination method provided in an embodiment of the present application;
fig. 2B is a schematic diagram of another tracking target determination method provided in the embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a tracking target determining apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only some of the embodiments of the present application, not all of them.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order.
Electronic devices may include various handheld devices, vehicle mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem with wireless communication capabilities, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal equipment (terminal device), and so forth.
As shown in fig. 1A, fig. 1A is a schematic flowchart of a tracking target determination method provided by an embodiment of the present application, which is applied to the above electronic device and specifically includes the following steps:
step 101: the electronic equipment acquires a first image and a second image in the same target video file and acquires N first tracking frames of the first image, wherein the first image is a pre-set frame image of the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1.
Wherein the acquiring N first tracking frames of the first image comprises: and acquiring second widths of the N first tracking frames, second heights of the N first tracking frames, second positions of the N first tracking frames and feature vectors of second central points of the N first tracking frames.
Wherein the first image and the second image have the same size, i.e. width and height. The first image and the second image are both images including N tracking targets, that is, the first image and the second image each display N tracking targets.
For example, 4 tracked objects are displayed in the first image, and the 4 tracked objects are 1, 2, 3, and 4, so that the tracked objects 1, 2, 3, and 4 are also displayed in the second image.
The image a preset number of frames earlier is, for example, the previous frame, the image two frames earlier, the image four frames earlier, the image five frames earlier, and the like.
The target video file is a video file obtained by follow-up shooting of the tracking targets. The target video file may be stored in the electronic device, stored in the cloud, and the like.
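As a concrete illustration of step 101, the following sketch reads such an image pair from one video file with OpenCV; the function and variable names (t for the current frame index, k for the preset frame offset) are hypothetical, and the patent does not prescribe any particular library:

```python
import cv2

def get_image_pair(video_path, t, k):
    """Read frame t - k (first image) and frame t (second image) from the
    same target video file; k is the preset number of frames."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, t - k)
    ok1, first_image = cap.read()   # the N first tracking frames belong to this image
    cap.set(cv2.CAP_PROP_POS_FRAMES, t)
    ok2, second_image = cap.read()  # the image in which the tracking targets are determined
    cap.release()
    if not (ok1 and ok2):
        raise ValueError("could not read both frames from the video file")
    return first_image, second_image
```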
Step 102: the electronic device inputs the second image into an hourglass network model for feature extraction, and outputs a target feature map.
The target feature map comprises M feature points of the N tracking targets, where M is a positive integer. The number of feature points of each tracking target may be the same or different, and may be 8, 10, 13, 18, or the like; the feature points are used for marking different positions of the tracking target. For example, assuming that the tracking target is a human being, the feature points may be the joint points of the human body.
Step 103: the electronic device inputs the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set.
Step 104: the electronic device determines N second tracking frames based on the heat map and the width-height value set.
Step 105: the electronic device determines N first corresponding relationships based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first corresponding relationships are used to characterize one-to-one correspondence relationships between the N first tracking frames and the N second tracking frames.
The first tracking frames and the second tracking frames have the same shape, and may be rectangles, squares, diamonds, circles, or other shapes.
Wherein the width of the first image is greater than the width of the N first tracking frames, and the height of the first image is greater than the height of the N first tracking frames; the width of the second image is greater than the width of the N second tracking frames, and the height of the second image is greater than the height of the N second tracking frames.
Wherein, there may be an overlapping portion between two adjacent first tracking frames of the N first tracking frames, and there may be an overlapping portion between two adjacent second tracking frames of the N second tracking frames.
The one-to-one corresponding relation means that, for each first tracking frame among the N first tracking frames, there is one second tracking frame among the N second tracking frames that frames the same tracking target as that first tracking frame.
For example, assume that there are 3 second tracking frames (second tracking frame 1, second tracking frame 2, and second tracking frame 3) and 3 tracking targets (A, B, and C). Suppose second tracking frame 1 frames A, second tracking frame 2 frames B, and second tracking frame 3 frames C, and that there are likewise 3 first tracking frames (first tracking frame 1, first tracking frame 2, and first tracking frame 3). If first tracking frame 1 corresponds one-to-one to second tracking frame 1, first tracking frame 2 to second tracking frame 2, and first tracking frame 3 to second tracking frame 3, then first tracking frame 1 frames A, first tracking frame 2 frames B, and first tracking frame 3 frames C.
The heights of the first tracking frame and the second tracking frame having the corresponding relationship may be the same or different, and are not limited herein. The widths of the first tracking frame and the second tracking frame having the corresponding relationship may be the same or different, and are not limited herein.
Step 106: the electronic device determines the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
In an implementation manner of the present application, after step 106, the method further includes: the electronic device displays the N second tracking frames on the second image.
It can be seen that, in the embodiment of the present application, a first image and a second image are obtained from the same target video file, where the first image is the image a preset number of frames before the second image. The second image is input into an hourglass network model to obtain a target feature map, the target feature map is input into a prediction network to obtain a heat map, a width-height value set and a feature vector set, and N second tracking frames are determined according to the heat map and the width-height value set, where the second tracking frames are used for framing the N tracking targets in the second image. Finally, the tracking targets are determined according to the first tracking frames, the second tracking frames and the feature vector set, where the first tracking frames are used for framing the N tracking targets in the first image. Therefore, the tracking targets are determined based on the combination of an image, the image a preset number of frames before it, and the tracking frames associated with that earlier image, so the tracking follows the changing positions of the tracking targets, and the accuracy of determining the tracking targets is improved.
In an implementation manner of the application, the hourglass network model is formed by sequentially arranging i hourglass networks, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2.
Each time an image passes through one of the hourglass networks, a first process is performed, in which: the input image is down-sampled through a plurality of first convolution blocks of the hourglass network, and a first feature map is output; the first feature map is up-sampled through a plurality of second convolution blocks of the hourglass network, and a second feature map is output; and the second feature map is superposed with the input image, and a third feature map is output.
The first convolution block is a first convolution neural network, the second convolution block is a second convolution neural network, and the first convolution neural network and the second convolution neural network have different functions.
The hourglass network model may be formed by sequentially arranging 2, 4, 5, 7, or another number of hourglass networks. A schematic structure of the hourglass network model is shown in fig. 1B. When the hourglass network model consists of 2 hourglass networks, calculation accuracy can be ensured on the one hand, and calculation speed can be improved on the other.
The input image of the first hourglass network in the hourglass network model is the target image, and the feature map output by the last hourglass network in the hourglass network model is the target feature map.
As shown in fig. 1B, each hourglass network is a symmetric network that can perform down-sampling and up-sampling, with the down-sampling performed first and the up-sampling afterwards, and the number of up-sampling and down-sampling operations of each hourglass network is the same, such as 4, 6, 7, or another value. The down-sampling uses nearest-neighbour interpolation to reduce the image resolution. The up-sampling uses max pooling or average pooling and increases the image resolution.
In the embodiment of the present application, take an hourglass network a that is not the first hourglass network arranged in the hourglass network model. The input image for the first down-sampling of hourglass network a is image 1 (image 1 is obtained by synthesizing the input image and the output image of hourglass network b, where, in the hourglass network model, hourglass network a is adjacent to and located after hourglass network b). The input image for each subsequent down-sampling of hourglass network a is the output image of the previous down-sampling, and the resolution of the output image of each down-sampling is halved relative to the resolution of its input image. The input image for the first up-sampling of hourglass network a is the output image of the last down-sampling of hourglass network a. The input image for each subsequent up-sampling is obtained by superposing and combining the output image of the previous up-sampling with the output image of the symmetric down-sampling, and the resolution of the output image of each up-sampling is doubled relative to the resolution of its input image.
The input image of the first down-sampling of the first hourglass network in the hourglass network model is the target image. The specific implementation of up-sampling and down-sampling for the first hourglass network in the hourglass network model is the same as that of hourglass network a, as described in detail above, and is not repeated here.
For example, assume that the number of up-sampling and down-sampling operations of hourglass network a is 4, and that image 1 is 6 × 128 × 128, where 6 is the number of channels and 128 × 128 is the resolution of image 1. The first down-sampling is performed using nearest-neighbour interpolation, outputting image 2 of size 6 × 64 × 64; the second down-sampling is performed on image 2, outputting image 3 of size 6 × 32 × 32; the third down-sampling is performed on image 3, outputting image 4 of size 6 × 16 × 16; and the fourth down-sampling is performed on image 4, outputting image 5 of size 6 × 8 × 8. After the 4 down-sampling operations are completed, average pooling is performed on image 5 and the first up-sampling is performed, outputting image 6 of size 6 × 16 × 16. Image 6 is merged with image 4 (the output of the third down-sampling) as the input of the second up-sampling, which outputs image 7 of size 6 × 32 × 32. Image 7 is merged with image 3 as the input of the third up-sampling, which outputs image 8 of size 6 × 64 × 64. Finally, image 8 is merged with image 2 as the input of the fourth up-sampling, which outputs image 9 of size 6 × 128 × 128.
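A minimal sketch of one such hourglass network and its stacking rule, written in PyTorch (the patent names no framework; the 6-channel width, 4 sampling steps, and 2-stack depth are taken from the examples in the text, and all class names are ours):

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """3x3 convolution + BN + ReLU; stands in for the first/second convolution blocks."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return F.relu(self.bn(self.conv(x)))

class Hourglass(nn.Module):
    """One symmetric hourglass: n down-samplings, n up-samplings, skip merges."""
    def __init__(self, channels=6, n=4):
        super().__init__()
        self.n = n
        self.downs = nn.ModuleList(ConvBlock(channels) for _ in range(n))
        self.skips = nn.ModuleList(ConvBlock(channels) for _ in range(n))
        self.ups = nn.ModuleList(ConvBlock(channels) for _ in range(n))
        self.bottleneck = ConvBlock(channels)

    def forward(self, x):
        skips = []
        for i in range(self.n):
            skips.append(self.skips[i](x))                          # keep a copy at this resolution
            x = F.interpolate(x, scale_factor=0.5, mode="nearest")  # halve the resolution
            x = self.downs[i](x)
        x = self.bottleneck(x)                                      # e.g. 8 x 8 after 4 steps
        for i in range(self.n):
            x = F.interpolate(x, scale_factor=2.0, mode="nearest")  # double the resolution
            x = self.ups[i](x) + skips[-(i + 1)]                    # merge with the symmetric stage
        return x

class StackedHourglass(nn.Module):
    """i hourglass networks in sequence; the input of each later hourglass is the
    synthesis (here: sum) of the previous hourglass's input and output."""
    def __init__(self, channels=6, num_stacks=2):
        super().__init__()
        self.stacks = nn.ModuleList(Hourglass(channels) for _ in range(num_stacks))

    def forward(self, x):
        for hourglass in self.stacks:
            x = x + hourglass(x)
        return x
```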
It can be seen that, in the embodiment of the present application, each hourglass network performs multiple down-sampling and multiple up-sampling operations, so that the features of different areas in the target image can be extracted while the spatial relationships among feature points in the target image are preserved, which improves the accuracy of determining the tracking target.
In an implementation of the present application, the prediction network includes a heat-map branch, a width-height branch, and a feature-vector branch; the electronic device inputs the target feature map into the prediction network to output a heat map, a width-height value set, and a feature vector set, including:
the electronic device inputs the target feature map into the heat-map branch to output a heat map, and inputs the target feature map into the width-height branch to output a width-height value set;
the electronic device inputs the heat map into the feature-vector branch to output a feature vector set.
The inputting the target feature map into the width-height branch to output a width-height value set comprises: inputting the target feature map, the second widths of the N first tracking frames, and the second heights of the N first tracking frames into the width-height branch to output a width-height value set.
The inputting the heat map into the feature-vector branch to output a feature vector set comprises: inputting the heat map and the feature vectors of the second center points of the N first tracking frames into the feature-vector branch to output a feature vector set.
The electronic device inputs the target feature map into the heat-map branch and into the width-height branch in parallel.
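A minimal PyTorch sketch of such a three-branch prediction network is given below; it is an assumption of structure rather than the patent's exact design. The channel counts and the embedding dimension are illustrative, and the additional conditioning inputs (the second widths, second heights, and second-center-point feature vectors of the N first tracking frames) are omitted for brevity:

```python
import torch.nn as nn

class PredictionNetwork(nn.Module):
    """Three convolutional branches over the target feature map."""
    def __init__(self, c_in=6, embed_dim=128):
        super().__init__()
        # heat-map branch: per-location probability of being a first center point
        self.heatmap_branch = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_in, 1, 1), nn.Sigmoid())
        # width-height branch: a width and a height predicted at every location
        self.wh_branch = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_in, 2, 1))
        # feature-vector branch: fed with the heat map, as described above
        self.embed_branch = nn.Sequential(
            nn.Conv2d(1, c_in, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_in, embed_dim, 1))

    def forward(self, feat):
        heatmap = self.heatmap_branch(feat)   # independent of the width-height branch,
        wh = self.wh_branch(feat)             # so the two can run in parallel
        embeddings = self.embed_branch(heatmap)
        return heatmap, wh, embeddings
```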
The heat-map branch is obtained by the electronic device training a third convolution block with a first formula.
The first formula is:
$$L = -\frac{1}{N}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-P_{ij}\right)^{\alpha}\log\left(P_{ij}\right), & y_{ij}=1\\\left(1-y_{ij}\right)^{\beta}\left(P_{ij}\right)^{\alpha}\log\left(1-P_{ij}\right), & \text{otherwise}\end{cases}$$
wherein H is the height of the target feature map; W is the width of the target feature map; P_ij is the probability that the feature point at position (i, j) is a target feature point; y_ij is the mark value of the feature point at position (i, j) in the first image, which, when calculating the probability that the feature point at position (i, j) is a target feature point, represents the probability of a calculation error occurring at the corresponding feature point: the larger the mark value, the higher that probability, and the smaller the mark value, the lower it is. The mark value is set when the electronic device trains the third convolution block. α and β are fixed values, and may differ in different cases.
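Assuming the first formula is the penalty-reduced focal loss reconstructed above, a compact PyTorch implementation is sketched below; α = 2 and β = 4 are the exponents common in the detection literature, not values given in the patent:

```python
import torch

def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0):
    """pred holds P_ij, target holds the mark values y_ij (1 at a center point).
    alpha and beta are the fixed values of the first formula."""
    pos = target.eq(1).float()
    neg = 1.0 - pos
    pos_loss = pos * (1 - pred).pow(alpha) * torch.log(pred.clamp(min=1e-6))
    neg_loss = neg * (1 - target).pow(beta) * pred.pow(alpha) \
        * torch.log((1 - pred).clamp(min=1e-6))
    num_centers = pos.sum().clamp(min=1.0)   # N, the number of tracking targets
    return -(pos_loss + neg_loss).sum() / num_centers
```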
In the heat map shown in fig. 1C, a point represents a center point, the left ordinate represents probability, and the abscissa and the right ordinate jointly represent the position of the center point.
The width-height branch is obtained by the electronic device training a fourth convolution block with a second formula.
The second formula is: $L_2 = |f(x) - Y|^2$

wherein f(x) and Y are each a width or a height, and L_2 is either the square of the width difference or the square of the height difference.
The width-height value set includes the correspondence between each width and the square of the width difference, and between each height and the square of the height difference, as specifically shown in Table 1.
TABLE 1
Height | Square of the height difference
C | c
…… | ……

Width | Square of the width difference
D | d
…… | ……
Wherein the third convolution block is a third convolution neural network and the fourth convolution block is a fourth convolution neural network. The third convolutional neural network and the fourth convolutional neural network have different roles from each other.
The feature-vector branch comprises a first branch, a second branch and a third branch; the first branch is obtained by the electronic device training a fifth convolution block with a third formula, the second branch by training a sixth convolution block with a fourth formula, and the third branch by training a seventh convolution block with a fifth formula.
The fifth convolution block is a fifth convolution neural network, the sixth convolution block is a sixth convolution neural network, and the seventh convolution block is a seventh convolution neural network. The roles of the fifth convolutional neural network, the sixth convolutional neural network, and the seventh convolutional neural network are different from each other.
The third formula:
$$L_{pull}=\frac{1}{N}\sum_{k=1}^{N}\left[\left(e_{a_k}-e_k\right)^2+\left(e_{b_k}-e_k\right)^2\right]$$
wherein e_{a_k} is the feature vector of the second center point of any one first tracking frame, e_{b_k} is the feature vector of the first center point of the second tracking frame corresponding to that first tracking frame, and e_k is the mean of the feature vector of the second center point of that first tracking frame and the feature vector of the first center point of the corresponding second tracking frame.
The fourth formula:

$$L_{push}=\frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1\\j\neq k}}^{N}\max\left(0,\;\Delta-\left|e_k-e_j\right|\right)$$

wherein e_k is the mean of the feature vector of the second center point of one first tracking frame among the N first tracking frames and the feature vector of the first center point of the second tracking frame corresponding to that first tracking frame; e_j is the corresponding mean for another first tracking frame among the N first tracking frames; and Δ is 1.
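Assuming the third and fourth formulas are the pull and push losses reconstructed above, they can be sketched together in PyTorch as follows (the tensor layout and names are ours):

```python
import torch

def pull_push_losses(second_center_vecs, first_center_vecs, delta=1.0):
    """Row k of each (N, D) tensor holds the embeddings of the k-th matched pair:
    the second center point of a first tracking frame, and the first center point
    of its corresponding second tracking frame."""
    e_k = (second_center_vecs + first_center_vecs) / 2.0      # per-pair mean embedding
    pull = ((second_center_vecs - e_k).pow(2).sum(dim=1)
            + (first_center_vecs - e_k).pow(2).sum(dim=1)).mean()

    n = e_k.size(0)
    dist = (e_k.unsqueeze(0) - e_k.unsqueeze(1)).norm(dim=2)  # |e_k - e_j| for all pairs
    off_diagonal = ~torch.eye(n, dtype=torch.bool)
    push = torch.relu(delta - dist[off_diagonal]).mean()      # delta = 1, as in the text
    return pull, push
```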
The fifth formula:
$$d_{12}=\left\|x_1-x_2\right\|$$

wherein x_1 is the feature vector of the first center point and x_2 is the feature vector of the second center point.
The feature vector set includes feature vectors of first center points of the N second tracking frames, which are specifically shown in table 2.
TABLE 2
First center point | Feature vector
(a1, b1) | c1
(a2, b2) | 3c2
(a3, b3) | 1.5c3
…… | ……

The feature vector corresponding to the first center point (a1, b1) is c1, the feature vector corresponding to (a2, b2) is 3c2, and the feature vector corresponding to (a3, b3) is 1.5c3; c1, c2 and c3 are all basis vectors, and may be the same or different.
It can be seen that, in the embodiment of the present application, since the target feature map is input into the two branches in parallel, the time required for the convolution operations is reduced, and the calculation efficiency is improved.
In an implementation of the present application, the determining N second tracking frames based on the heat map and the width-height value set includes:
the electronic device determines the first positions of the first center points of the N second tracking frames based on the heat map;
the electronic device determines the first heights of the N second tracking frames and the first widths of the N second tracking frames based on the width-height value set.
The first heights of any two of the N second tracking frames may be equal or unequal, the first widths of any two of the N second tracking frames may be equal or unequal, and the first center points of any two of the N second tracking frames are at different positions.
Specifically, through the heat map, the probability that each of the M feature points is a first center point can be obtained, and the N feature points with the highest probability among the M feature points are taken as the first center points, so that the first positions of the N first center points are obtained. For example, as shown in fig. 1C, feature point 1, feature point 2 and feature point 3 are the 3 feature points with the highest probability among all the feature points shown in fig. 1C, so when the heat map is as shown in fig. 1C, the first center points are feature point 1, feature point 2 and feature point 3.
Specifically, the first height is known, the square of the height difference corresponding to the first height can be obtained through Table 1, and the second height can then be calculated based on the second formula. For example, assuming that the first height is C and the square of the height difference corresponding to the first height is c, the second height is $C \pm \sqrt{c}$. Likewise, the first width is known, the square of the width difference corresponding to the first width can be obtained through Table 1, and the second width can then be calculated based on the second formula. For example, assuming that the first width is D and the square of the width difference corresponding to the first width is d, the second width is $D \pm \sqrt{d}$.
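The decoding of step 104 (and of steps 205-206 below) can be sketched as follows in PyTorch, under the simplifying assumption that the width-height branch directly regresses a width and a height at every location (standing in for the Table 1 lookup plus second-formula calculation just described); the 3 x 3 max-pooling trick for isolating peaks is a common heuristic, not something the patent specifies:

```python
import torch
import torch.nn.functional as F

def decode_tracking_frames(heatmap, wh, n):
    """heatmap: (1, 1, H, W) center probabilities; wh: (1, 2, H, W).
    Returns an (n, 4) tensor of (cx, cy, width, height) second tracking frames."""
    # keep only local maxima so one target yields a single peak
    keep = F.max_pool2d(heatmap, 3, stride=1, padding=1) == heatmap
    peaks = heatmap * keep
    _, idx = peaks.flatten().topk(n)               # the n highest-probability feature points
    w = heatmap.size(-1)
    ys = torch.div(idx, w, rounding_mode="floor")  # first positions of the first center points
    xs = idx % w
    widths = wh[0, 0, ys, xs]                      # first widths of the second tracking frames
    heights = wh[0, 1, ys, xs]                     # first heights of the second tracking frames
    return torch.stack([xs.float(), ys.float(), widths, heights], dim=1)
```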
In an implementation manner of the present application, the determining, by the electronic device, N first corresponding relations based on the N first tracking frames, the N second tracking frames, and the feature vector set includes:
determining the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;
determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set comprises N offsets, and the N offsets are the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point; the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set comprises N matching degrees, and the N matching degrees are the matching degrees of the corresponding first tracking frame with each second tracking frame;
determining N second corresponding relations according to the N offset sets and the N matching degree sets, wherein the N second corresponding relations are used for representing one-to-one corresponding relations between the N second center points and the N first center points;
and determining the N first corresponding relations according to the N second corresponding relations.
The N offset sets are calculated using a sixth formula. The sixth formula:

$$d^{(1)}(a,b)=\left(d_b-d_a\right)^{T}S_a^{-1}\left(d_b-d_a\right)$$

wherein d_a represents the feature vector of the second center point of the first tracking frame a; d_b represents the feature vector of the first center point of the second tracking frame b; S_a represents the covariance matrix of the first tracking frame a; and d^(1)(a, b) represents the positional offset, in the feature vector set, of the feature vector of the first center point of the second tracking frame b relative to the feature vector of the second center point of the first tracking frame a.
The N matching degree sets are calculated using a seventh formula. The seventh formula:

$$d^{(2)}(a,b)=\min\left\{1-\left(r_b\right)^{T}r_a^{(k)}\;\middle|\;r_a^{(k)}\in R_a\right\}$$

wherein L_k is 100; r_a^{(k)} is the feature vector of the second center point of the first tracking frame a in the k-th frame; (r_b)^T represents the transpose of the feature vector of the first center point of the second tracking frame b; R_a is the set of feature vectors of the second center point of the first tracking frame a over the last 100 frames; and d^(2)(a, b) represents the degree to which the first tracking frame a and the second tracking frame b match in appearance.
Finally, an eighth formula is used to perform a weighted calculation on any one of the N offset sets and the corresponding one of the N matching degree sets.
The eighth formula:

$$C_{a,b}=\lambda d^{(1)}(a,b)+(1-\lambda)d^{(2)}(a,b)$$

wherein λ is a fixed value whose value may differ in different cases, and C_{a,b} is the weighted sum.
If it is determined through the N offset sets that the distance between the second center point of a first tracking frame o and the first center point of a second tracking frame p is the shortest, and the weighted sum for the first tracking frame o and the second tracking frame p is greater than a first value, then the first tracking frame o and the second tracking frame p have a corresponding relation, where the first tracking frame o is one of the N first tracking frames and the second tracking frame p is one of the N second tracking frames.
Optionally, the one-to-one corresponding relations between the N first center points and the N second center points may be determined through the feature vectors of the first center points and the feature vectors of the second center points. For example, suppose A1 and A2 are first center points, B1 and B2 are second center points, and the relations between A1 and B1/B2 and between A2 and B1/B2 cannot yet be determined, so two cases may occur: A1 corresponds to B1 and A2 corresponds to B2; or A1 corresponds to B2 and A2 corresponds to B1. Assuming A1 corresponds to B1 and A2 corresponds to B2, the distance between A1 and B1 is first pulled closer using the third formula, the distance between A1 and B2 is then pushed apart using the fourth formula, and finally the distance A1B1 between A1 and B1 is calculated using the fifth formula. Assuming A1 corresponds to B2 and A2 corresponds to B1, the distance between A1 and B2 is first pulled closer using the third formula, the distance between A1 and B1 is then pushed apart using the fourth formula, and finally the distance A1B2 between A1 and B2 is calculated using the fifth formula. Finally, A1B1 and A1B2 are compared: if A1B1 > A1B2, A1 corresponds to B1; if A1B1 < A1B2, A1 corresponds to B2.
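A minimal sketch of this association step, assuming NumPy and SciPy: position_offset and appearance_matching follow the sixth and seventh formulas as reconstructed above, and the Hungarian solver in match_tracking_frames is a standard one-to-one assignment substitute for the shortest-distance-plus-threshold rule described in the text (λ = 0.5 is illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def position_offset(d_a, d_b, S_a):
    """Sixth formula: offset of second-frame center embedding d_b from
    first-frame center embedding d_a, under covariance matrix S_a."""
    diff = d_b - d_a
    return float(diff @ np.linalg.inv(S_a) @ diff)

def appearance_matching(R_a, r_b):
    """Seventh formula: best match of r_b against the gallery R_a of the
    first tracking frame's center embeddings over the last (up to 100) frames.
    Embeddings are assumed unit-length so the dot product is a cosine score."""
    return float(min(1.0 - r_a @ r_b for r_a in R_a))

def match_tracking_frames(d1, d2, lam=0.5):
    """Eighth formula C_{a,b} = lam*d1 + (1-lam)*d2, then a one-to-one assignment.
    d1, d2: (N, N) arrays; entry (a, b) scores first frame a vs second frame b."""
    cost = lam * np.asarray(d1) + (1.0 - lam) * np.asarray(d2)
    rows, cols = linear_sum_assignment(cost)   # minimizes the total weighted sum
    return list(zip(rows.tolist(), cols.tolist()))
```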
It can be seen that, in the embodiment of the present application, in determining whether a corresponding relation exists between a certain first tracking frame and a certain second tracking frame, the corresponding relation is determined to exist when the distance between the center points of the two frames is the shortest, the positional offset of the feature vector of the second center point of the first tracking frame relative to the feature vector of the first center point of the second tracking frame in the feature-vector branch is the smallest, and the weighted sum of the matching degrees of the two frames is greater than a first value. This improves the accuracy of determining the corresponding relations of the tracking frames, and further improves the accuracy of determining the tracking target.
It should be noted that fig. 1B and fig. 1C provided in the embodiments of the present application are only for example and do not constitute a limitation to the embodiments of the present application.
Consistent with the embodiment shown in fig. 1A, please refer to fig. 2A, which is a schematic flowchart of another tracking target determination method provided by an embodiment of the present application. The method is applied to the above electronic device and specifically includes the following steps:
step 201: the electronic equipment acquires a first image and a second image in the same target video file and acquires N first tracking frames of the first image, wherein the first image is a pre-set frame image of the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1.
Step 202: the electronic device inputs the second image into the hourglass network model for feature extraction, and outputs a target feature map.
Step 203: the electronic device inputs the target feature map into the heat-map branch to output a heat map, and inputs the target feature map into the width-height branch to output a width-height value set.
Step 204: the electronic device inputs the heat map into the feature-vector branch to output a feature vector set.
Step 205: the electronic device determines the first positions of the first center points of the N second tracking frames based on the heat map.
Step 206: the electronic device determines the first heights of the N second tracking frames and the first widths of the N second tracking frames based on the width-height value set.
Step 207: the electronic device determines the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points.
Step 208: the electronic device determines N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determines N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set comprises N offsets, and the N offsets are the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point; the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set comprises N matching degrees, and the N matching degrees are the matching degrees of the corresponding first tracking frame with each second tracking frame.
Step 209: the electronic device determines N second corresponding relations according to the N offset sets and the N matching degree sets, wherein the N second corresponding relations are used for representing the one-to-one corresponding relations between the N second center points and the N first center points.
Step 210: the electronic device determines the N first corresponding relations according to the N second corresponding relations.
Step 211: the electronic device determines the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
As shown in fig. 2B, for example, a second image including a tracking target S and a tracking target D is input into the hourglass network model. After the target feature map is output by the hourglass network model, it is input into the heat-map branch and the width-height branch of the prediction network, which output a heat map and a width-height value set respectively. The heat map is then input into the feature-vector branch of the prediction network, which outputs a feature vector set. Next, the one-to-one corresponding relations between the N second tracking frames and the N first tracking frames are determined by combining the N first tracking frames, the heat map and the width-height value set. Finally, based on these one-to-one corresponding relations, the specific tracking target framed by each of the N second tracking frames is known, thereby achieving the purpose of determining the tracking targets.
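Putting the pieces together, the flow of fig. 2B can be summarized with the hypothetical helpers sketched earlier (build_cost_matrices is an assumed helper that evaluates the sixth and seventh formulas for every pair of tracking frames; none of these names come from the patent):

```python
def determine_tracking_targets(second_image, first_frames, backbone, head, n, lam=0.5):
    """backbone: the stacked hourglass model; head: the three-branch prediction
    network; build_cost_matrices: an assumed helper evaluating the sixth and
    seventh formulas for every (first frame, second frame) pair."""
    feat = backbone(second_image)               # target feature map
    heatmap, wh, embeddings = head(feat)        # heat map, width-height values, feature vectors
    second_frames = decode_tracking_frames(heatmap, wh, n)
    d1, d2 = build_cost_matrices(first_frames, second_frames, embeddings)
    pairs = match_tracking_frames(d1, d2, lam)  # the N first corresponding relations
    # pair (a, b): second tracking frame b frames the same target as first tracking frame a
    return second_frames, pairs
```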
It should be noted that, for the specific implementation process of the present embodiment, reference may be made to the specific implementation process described in the above method embodiment, and a description thereof is omitted here.
Consistent with the embodiments shown in fig. 1A and fig. 2A, please refer to fig. 3, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in the figure, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the following steps:
acquiring a first image and a second image in the same target video file, and acquiring N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1;
inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;
inputting the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;
determining N second tracking frames based on the heat map and the width-height value set, wherein the N second tracking frames are used for framing the N tracking targets in the second image;
determining N first corresponding relations based on the N first tracking frames, the N second tracking frames and the feature vector set, wherein the N first corresponding relations are used for representing one-to-one corresponding relations between the N first tracking frames and the N second tracking frames;
and determining the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
In an implementation manner of the application, the hourglass network model is formed by sequentially arranging i hourglass networks, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2;
each time an image passes through one of the hourglass networks, a first process is performed, in which: the input image is down-sampled through a plurality of first convolution blocks of the hourglass network, and a first feature map is output; the first feature map is up-sampled through a plurality of second convolution blocks of the hourglass network, and a second feature map is output; and the second feature map is superposed with the input image, and a third feature map is output.
In an implementation of the present application, the prediction network includes a heat-map branch, a width-height branch, and a feature-vector branch; in inputting the target feature map into the prediction network to output a heat map, a width-height value set, and a feature vector set, the programs include instructions for performing the following steps:
inputting the target feature map into the heat-map branch to output a heat map, and inputting the target feature map into the width-height branch to output a width-height value set;
inputting the heat map into the feature-vector branch to output a feature vector set.
In one implementation of the present application, in determining N second tracking frames based on the heat map and the width-height value set, the programs include instructions for performing the following steps:
determining the first positions of the first center points of the N second tracking frames based on the heat map;
determining the first heights of the N second tracking frames and the first widths of the N second tracking frames based on the width-height value set.
In an implementation manner of the present application, in determining N first corresponding relations based on the N first tracking frames, the N second tracking frames, and the feature vector set, the programs include instructions for performing the following steps:
determining the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;
determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set comprises N offsets, and the N offsets are the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point; the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set comprises N matching degrees, and the N matching degrees are the matching degrees of the corresponding first tracking frame with each second tracking frame;
determining N second corresponding relations according to the N offset sets and the N matching degree sets, wherein the N second corresponding relations are used for representing one-to-one corresponding relations between the N second center points and the N first center points;
and determining the N first corresponding relations according to the N second corresponding relations.
It should be noted that, for the specific implementation process of the present embodiment, reference may be made to the specific implementation process described in the above method embodiment, and a description thereof is omitted here.
Referring to fig. 4, fig. 4 shows a tracking target determining apparatus applied to the above electronic device according to an embodiment of the present application, and the apparatus includes:
an information obtaining unit 401, configured to obtain a first image and a second image in the same target video file, and obtain N first tracking frames of the first image, where the first image is an image a preset number of frames before the second image, the first image and the second image both include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1;
a feature extraction unit 402, configured to input the second image into an hourglass network model for feature extraction, and output a target feature map;
a data determining unit 403, configured to input the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;
a tracking frame determination unit 404, configured to determine N second tracking frames based on the heat map and the width-height value set;
a correspondence determining unit 405, configured to determine N first corresponding relations based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first corresponding relations are used to represent one-to-one corresponding relations between the N first tracking frames and the N second tracking frames;
a tracking target determining unit 406, configured to determine the tracking targets framed by the N second tracking frames based on the N first corresponding relations.
In an implementation manner of the application, the hourglass network model is formed by sequentially arranging i hourglass networks, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2;
each time an image passes through one of the hourglass networks, a first process is performed, in which: the input image is down-sampled through a plurality of first convolution blocks of the hourglass network, and a first feature map is output; the first feature map is up-sampled through a plurality of second convolution blocks of the hourglass network, and a second feature map is output; and the second feature map is superposed with the input image, and a third feature map is output.
In one implementation of the present application, the prediction network includes a heat-map branch, a width-height branch, and a feature-vector branch; in the aspect of inputting the target feature map into the prediction network to output a heat map, a width-height value set, and a feature vector set, the data determining unit 403 is specifically configured to:
input the target feature map into the heat-map branch to output a heat map, and input the target feature map into the width-height branch to output a width-height value set;
input the heat map into the feature-vector branch to output a feature vector set.
In an implementation manner of the present application, in the aspect of determining N second tracking frames based on the heat map and the width-height value set, the tracking frame determination unit 404 is further configured to:
determine the first positions of the first center points of the N second tracking frames based on the heat map;
determine the first heights of the N second tracking frames and the first widths of the N second tracking frames based on the width-height value set.
In an implementation manner of the present application, in the aspect of determining N first corresponding relations based on the N first tracking frames, the N second tracking frames, and the feature vector set, the correspondence determining unit 405 is further configured to:
determine the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;
determine N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determine N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set comprises N offsets, and the N offsets are the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point; the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set comprises N matching degrees, and the N matching degrees are the matching degrees of the corresponding first tracking frame with each second tracking frame;
determine N second corresponding relations according to the N offset sets and the N matching degree sets, wherein the N second corresponding relations are used for representing one-to-one corresponding relations between the N second center points and the N first center points;
and determine the N first corresponding relations according to the N second corresponding relations.
It should be noted that the information acquisition unit 401, the feature extraction unit 402, the data determination unit 403, the tracking frame determination unit 404, the correspondence determination unit 405, and the tracking target determination unit 406 may be implemented by processors.
The present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform some or all of the steps described in the electronic device in the above method embodiments.
Embodiments of the present application also provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps described for the electronic device in the above method embodiments. The computer program product may be a software installation package.
The steps of a method or algorithm described in the embodiments of the present application may be implemented in hardware, or may be implemented by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in an access network device, a target network device, or a core network device. Of course, the processor and the storage medium may also reside as discrete components in an access network device, a target network device, or a core network device.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functionality described in the embodiments of the present application may be implemented, in whole or in part, by software, hardware, firmware, or any combination thereof. When implemented in software, the functionality may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired connection (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless connection (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The above-mentioned embodiments further illustrate the objects, technical solutions, and advantages of the embodiments of the present application in detail. It should be understood that the above-mentioned embodiments are only specific embodiments of the present application and are not intended to limit the scope of the embodiments of the present application; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the embodiments of the present application shall be included in the scope of the embodiments of the present application.

Claims (10)

1. A tracking target determination method, applied to an electronic device, the method comprising the following steps:
acquiring a first image and a second image in the same target video file, and acquiring N first tracking frames of the first image, wherein the first image is an image located a preset number of frames before the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1;
inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;
inputting the target feature map into a prediction network to output a thermodynamic diagram, a set of width and height values, and a set of feature vectors;
determining N second tracking frames based on the thermodynamic diagram and the width and height value set, wherein the N second tracking frames are used for framing N tracking targets in the second image;
determining N first corresponding relations based on the N first tracking frames, the N second tracking frames and the feature vector set, wherein the N first corresponding relations are used for representing one-to-one corresponding relations of the N first tracking frames and the N second tracking frames;
determining the tracking target framed by the N second tracking frames based on the N first corresponding relations.
2. The method according to claim 1, wherein the hourglass network model is formed by sequentially arranging i hourglass networks, the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2; and
a first process is performed each time an image passes through one of the hourglass networks: the input image is downsampled through a plurality of first convolution blocks of the hourglass network to output a first feature map; the first feature map is upsampled through a plurality of second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed on the input image to output a third feature map.
3. The method of claim 1 or 2, wherein the prediction network comprises a thermodynamic diagram branch, a width-height branch, and a feature vector branch; and the inputting the target feature map into a prediction network to output a thermodynamic diagram, a set of width and height values, and a set of feature vectors comprises:
inputting the target feature map into the thermodynamic diagram branch to output the thermodynamic diagram, and inputting the target feature map into the width-height branch to output the set of width and height values;
inputting the thermodynamic diagram into the feature vector branch to output the set of feature vectors.
4. The method of claim 3, wherein the determining N second tracking frames based on the thermodynamic diagram and the set of width and height values comprises:
determining a first location of a first center point of the N second tracking frames based on the thermodynamic diagram;
determining a first height of the N second tracking frames and a first width of the N second tracking frames based on the set of width and height values.
5. The method of claim 4, wherein the determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the set of feature vectors comprises:
determining feature vectors of the N first central points according to the feature vector set and first positions of the N first central points;
determining N offset sets according to the feature vectors of the N first central points and the feature vectors of second central points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first central points, each offset set comprises N offsets, the N offsets are the offsets of the feature vector of the corresponding first central point relative to the feature vectors of the N second central points, the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set comprises N matching degrees, and the N matching degrees are the matching degrees between the corresponding first tracking frame and each of the N second tracking frames;
determining N second corresponding relations according to the N offset sets and the N matching degree sets, wherein the N second corresponding relations are used for representing one-to-one corresponding relations between the N second central points and the N first central points;
and determining N first corresponding relations according to the N second corresponding relations.
6. A tracking target determination device applied to an electronic device, the device comprising:
the information acquisition unit is used for acquiring a first image and a second image in the same target video file and acquiring N first tracking frames of the first image, wherein the first image is an image located a preset number of frames before the second image, the first image and the second image both comprise N tracking targets, the N first tracking frames are used for framing the N tracking targets in the first image, and N is an integer greater than 1;
the feature extraction unit is used for inputting the second image into an hourglass network model for feature extraction and outputting a target feature map;
a data determination unit for inputting the target feature map into a prediction network to output a thermodynamic diagram, a set of width and height values, and a set of feature vectors;
a tracking frame determination unit, configured to determine N second tracking frames based on the thermodynamic diagram and the set of width and height values;
a correspondence determining unit, configured to determine N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences are used to represent one-to-one correspondences between the N first tracking frames and the N second tracking frames;
a tracking target determination unit, configured to determine the tracking target framed by the N second tracking frames based on the N first corresponding relationships.
7. The device according to claim 6, wherein the hourglass network model is formed by sequentially arranging i hourglass networks, the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2; and
a first process is performed each time an image passes through one of the hourglass networks: the input image is downsampled through a plurality of first convolution blocks of the hourglass network to output a first feature map; the first feature map is upsampled through a plurality of second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed on the input image to output a third feature map.
8. The device of claim 6 or 7, wherein the prediction network comprises a thermodynamic diagram branch, a width-height branch, and a feature vector branch; and in terms of inputting the target feature map into the prediction network to output a thermodynamic diagram, a set of width and height values, and a set of feature vectors, the data determination unit is specifically configured to:
input the target feature map into the thermodynamic diagram branch to output the thermodynamic diagram, and input the target feature map into the width-height branch to output the set of width and height values;
input the thermodynamic diagram into the feature vector branch to output the set of feature vectors.
9. An electronic device, comprising a processor, memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program is executed by a processor to implement the method according to any one of claims 1-5.
CN201910925725.4A 2019-09-27 2019-09-27 Tracking target determination method and related equipment Active CN110826403B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910925725.4A CN110826403B (en) 2019-09-27 2019-09-27 Tracking target determination method and related equipment
PCT/CN2020/108990 WO2021057309A1 (en) 2019-09-27 2020-08-13 Tracked target determination method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910925725.4A CN110826403B (en) 2019-09-27 2019-09-27 Tracking target determination method and related equipment

Publications (2)

Publication Number Publication Date
CN110826403A true CN110826403A (en) 2020-02-21
CN110826403B CN110826403B (en) 2020-11-24

Family

ID=69548326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910925725.4A Active CN110826403B (en) 2019-09-27 2019-09-27 Tracking target determination method and related equipment

Country Status (2)

Country Link
CN (1) CN110826403B (en)
WO (1) WO2021057309A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460926A (en) * 2020-03-16 2020-07-28 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
WO2021057309A1 (en) * 2019-09-27 2021-04-01 深圳云天励飞技术股份有限公司 Tracked target determination method and related device
CN113313736A (en) * 2021-06-10 2021-08-27 厦门大学 Online multi-target tracking method for unified target motion perception and re-identification network
CN113763415A (en) * 2020-06-04 2021-12-07 北京达佳互联信息技术有限公司 Target tracking method and device, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985252A (en) * 2014-05-23 2014-08-13 江苏友上科技实业有限公司 Multi-vehicle projection locating method based on time domain information of tracked object
CN108229456A (en) * 2017-11-22 2018-06-29 深圳市商汤科技有限公司 Method for tracking target and device, electronic equipment, computer storage media
CN108229455A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Object detecting method, the training method of neural network, device and electronic equipment
WO2018144537A1 (en) * 2017-01-31 2018-08-09 The Regents Of The University Of California Machine learning based driver assistance
CN108550161A (en) * 2018-03-20 2018-09-18 南京邮电大学 A kind of dimension self-adaption core correlation filtering fast-moving target tracking method
CN108830285A (en) * 2018-03-14 2018-11-16 江南大学 A kind of object detection method of the reinforcement study based on Faster-RCNN
CN109146924A (en) * 2018-07-18 2019-01-04 北京飞搜科技有限公司 A kind of method for tracking target and device based on thermodynamic chart
CN109657595A (en) * 2018-12-12 2019-04-19 中山大学 Based on the key feature Region Matching face identification method for stacking hourglass network
CN109726659A (en) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 Detection method, device, electronic equipment and the readable medium of skeleton key point
CN109766887A (en) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 A kind of multi-target detection method based on cascade hourglass neural network
CN109858333A (en) * 2018-12-20 2019-06-07 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and computer-readable medium
CN109886998A (en) * 2019-01-23 2019-06-14 平安科技(深圳)有限公司 Multi-object tracking method, device, computer installation and computer storage medium
CN109948526A (en) * 2019-03-18 2019-06-28 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679125B (en) * 2012-09-24 2016-12-21 致伸科技股份有限公司 The method of face tracking
CN105894538A (en) * 2016-04-01 2016-08-24 海信集团有限公司 Target tracking method and target tracking device
CN106250863B (en) * 2016-08-09 2019-07-26 北京旷视科技有限公司 Object tracking method and device
CN110826403B (en) * 2019-09-27 2020-11-24 深圳云天励飞技术有限公司 Tracking target determination method and related equipment

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985252A (en) * 2014-05-23 2014-08-13 江苏友上科技实业有限公司 Multi-vehicle projection locating method based on time domain information of tracked object
WO2018144537A1 (en) * 2017-01-31 2018-08-09 The Regents Of The University Of California Machine learning based driver assistance
CN108229455A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Object detecting method, the training method of neural network, device and electronic equipment
CN108229456A (en) * 2017-11-22 2018-06-29 深圳市商汤科技有限公司 Method for tracking target and device, electronic equipment, computer storage media
CN108830285A (en) * 2018-03-14 2018-11-16 江南大学 A kind of object detection method of the reinforcement study based on Faster-RCNN
CN108550161A (en) * 2018-03-20 2018-09-18 南京邮电大学 A kind of dimension self-adaption core correlation filtering fast-moving target tracking method
CN109146924A (en) * 2018-07-18 2019-01-04 北京飞搜科技有限公司 A kind of method for tracking target and device based on thermodynamic chart
CN109657595A (en) * 2018-12-12 2019-04-19 中山大学 Based on the key feature Region Matching face identification method for stacking hourglass network
CN109858333A (en) * 2018-12-20 2019-06-07 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and computer-readable medium
CN109726659A (en) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 Detection method, device, electronic equipment and the readable medium of skeleton key point
CN109766887A (en) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 A kind of multi-target detection method based on cascade hourglass neural network
CN109886998A (en) * 2019-01-23 2019-06-14 平安科技(深圳)有限公司 Multi-object tracking method, device, computer installation and computer storage medium
CN109948526A (en) * 2019-03-18 2019-06-28 北京市商汤科技开发有限公司 Image processing method and device, detection device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021057309A1 (en) * 2019-09-27 2021-04-01 深圳云天励飞技术股份有限公司 Tracked target determination method and related device
CN111460926A (en) * 2020-03-16 2020-07-28 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
CN113763415A (en) * 2020-06-04 2021-12-07 北京达佳互联信息技术有限公司 Target tracking method and device, electronic equipment and storage medium
CN113763415B (en) * 2020-06-04 2024-03-08 北京达佳互联信息技术有限公司 Target tracking method, device, electronic equipment and storage medium
CN113313736A (en) * 2021-06-10 2021-08-27 厦门大学 Online multi-target tracking method for unified target motion perception and re-identification network
CN113313736B (en) * 2021-06-10 2022-05-17 厦门大学 Online multi-target tracking method for unified target motion perception and re-identification network

Also Published As

Publication number Publication date
CN110826403B (en) 2020-11-24
WO2021057309A1 (en) 2021-04-01

Similar Documents

Publication Publication Date Title
CN110826403B (en) Tracking target determination method and related equipment
CN110838133B (en) Multi-target tracking method and related equipment
CN107633526A (en) A kind of image trace point acquisition methods and equipment, storage medium
CN110910422A (en) Target tracking method and device, electronic equipment and readable storage medium
CN113343982B (en) Entity relation extraction method, device and equipment for multi-modal feature fusion
CN111259841B (en) Image processing method and related equipment
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN112465801B (en) Instance segmentation method for extracting mask features in scale division mode
CN112200115B (en) Face recognition training method, recognition method, device, equipment and storage medium
CN112541484A (en) Face matting method, system, electronic device and storage medium
CN109118456A (en) Image processing method and device
CN110555798A (en) Image deformation method and device, electronic equipment and computer readable storage medium
CN113963009B (en) Local self-attention image processing method and system based on deformable block
CN110827314B (en) Single-target tracking method and related equipment
CN112258647B (en) Map reconstruction method and device, computer readable medium and electronic equipment
CN111914809A (en) Target object positioning method, image processing method, device and computer equipment
CN111724441A (en) Image annotation method and device, electronic equipment and storage medium
WO2022156652A1 (en) Vehicle motion state evaluation method and apparatus, device, and medium
CN111476065A (en) Target tracking method and device, computer equipment and storage medium
CN115620017A (en) Image feature extraction method, device, equipment and storage medium
CN114723796A (en) Three-dimensional point cloud generation method and device and electronic equipment
CN114332509A (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN113688928A (en) Image matching method and device, electronic equipment and computer readable medium
CN113362249A (en) Text image synthesis method and device, computer equipment and storage medium
CN117222005B (en) Fingerprint positioning method, fingerprint positioning device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant