WO2021057309A1 - Tracked target determination method and related device - Google Patents


Info

Publication number
WO2021057309A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
tracking
feature
output
tracking frames
Prior art date
Application number
PCT/CN2020/108990
Other languages
French (fr)
Chinese (zh)
Inventor
丁旭
胡文泽
Original Assignee
深圳云天励飞技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司 filed Critical 深圳云天励飞技术股份有限公司
Publication of WO2021057309A1 publication Critical patent/WO2021057309A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences

Definitions

  • This application relates to the field of electronic technology, and in particular to a tracking target determination method and related equipment.
  • Target tracking is one of the key technologies in the field of image processing and video processing. Target tracking is used to identify tracking targets in videos or images, and is widely used in related fields such as smart transportation, human-computer interaction, and national defense investigation.
  • The determination of the tracking target is an essential step in achieving target tracking. At present, the determination of the tracking target is mainly based on the DeepSORT (deep sort) algorithm; however, DeepSORT uses only the predicted position information for matching, so its prediction accuracy is low.
  • the embodiment of the present application provides a tracking target determination method and related equipment, which are used to improve the accuracy of determining the tracking target.
  • an embodiment of the present application provides a tracking target determination method, which is applied to an electronic device, and the method includes:
  • N first correspondences are determined, and the N first correspondences are used to characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames.
  • the tracking target selected by the N second tracking frames is determined based on the N first correspondences.
  • an embodiment of the present application provides a tracking target determination device, which is applied to an electronic device, and the device includes:
  • the information acquiring unit is configured to acquire a first image and a second image in the same target video file, and to acquire N first tracking frames of the first image, wherein the first image is a preceding preset frame image of the second image, the first image and the second image both include N tracking targets, the N first tracking frames are used to frame and select the N tracking targets in the first image, and the N is an integer greater than 1;
  • the feature extraction unit is configured to input the second image into the hourglass network model for feature extraction, and output a target feature map;
  • the data determining unit is used to input the target feature map to the prediction network to output the heat map, the width and height value set, and the feature vector set;
  • a tracking frame determining unit configured to determine N second tracking frames based on the heat map and the width and height value set;
  • the correspondence relationship determining unit is configured to determine N first correspondence relationships based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondence relationships are used for Characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;
  • the tracking target determining unit is configured to determine the tracking target selected by the N second tracking frames based on the N first correspondences.
  • an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the steps in the method described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps described in the method of the first aspect.
  • the embodiments of the present application provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the method of the first aspect of the embodiments of this application.
  • the computer program product may be a software installation package.
  • It can be seen that the first image and the second image are first obtained from the same target video file, where the first image is a preceding preset frame image of the second image; the second image is then input into the hourglass network model to obtain the target feature map; the target feature map is input into the prediction network to obtain the heat map, the width and height value set, and the feature vector set; the N second tracking frames are then determined according to the heat map and the width and height value set, where the second tracking frames are used to frame the N tracking targets in the second image; and the tracking target is finally determined according to the first tracking frames, the second tracking frames, and the feature vector set.
  • The first tracking frames are used to frame all of the N tracking targets in the first image.
  • In this way, this application jointly determines the tracking target based on a given image, the preceding preset frame image of that image, and the tracking frames associated with the preceding image, realizing tracking that follows the changing position of the tracking target and thereby improving the accuracy of determining the tracking target.
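The flow summarized above can be sketched as a toy pipeline. Every function below is a hypothetical placeholder stand-in (not the patent's implementation): the "images" are just lists of candidate center-point positions, and the prediction is a constant.

```python
# A toy, hypothetical sketch of the flow summarized above; every function
# here is a placeholder stand-in, not the patent's actual implementation.

def extract_feature_map(second_image):
    # Stand-in for the hourglass network (feature extraction).
    return second_image

def predict(feature_map):
    # Stand-in for the prediction network: returns a heat map of
    # (x, y, probability) entries, a width/height value set, and a
    # feature vector set keyed by center-point position.
    heat_map = [(x, y, 0.9) for (x, y) in feature_map]
    wh_set = [(10, 20) for _ in feature_map]
    feature_vectors = {(x, y): [float(x), float(y)] for (x, y) in feature_map}
    return heat_map, wh_set, feature_vectors

def second_tracking_frames(heat_map, wh_set):
    # One second tracking frame (x, y, w, h) per predicted center point.
    return [(x, y, w, h) for (x, y, _p), (w, h) in zip(heat_map, wh_set)]

heat_map, wh_set, feature_vectors = predict(extract_feature_map([(1, 2), (3, 4)]))
frames = second_tracking_frames(heat_map, wh_set)
```

The final matching step against the first tracking frames (omitted here) would then associate each of these frames with a tracked target.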
  • FIG. 1A is a schematic flowchart of a method for determining a tracking target provided by an embodiment of the present application
  • FIG. 1B is a schematic structural diagram of an hourglass network model provided by an embodiment of the present application.
  • Fig. 1C is a schematic diagram of a heat map provided by an embodiment of the present application.
  • FIG. 2A is a schematic flowchart of another tracking target determination method provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of another tracking target determination method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining a tracking target according to an embodiment of the present application.
  • Electronic devices may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
  • FIG. 1A shows a tracking target determination method provided by an embodiment of the present application, which is applied to the above-mentioned electronic device and specifically includes the following steps:
  • Step 101: The electronic device obtains a first image and a second image in the same target video file, and obtains N first tracking frames of the first image, where the first image is a preceding preset frame image of the second image, the first image and the second image each include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and the N is an integer greater than 1.
  • Said obtaining the N first tracking frames of the first image includes: obtaining the second widths of the N first tracking frames, the second heights of the N first tracking frames, the second positions of the N first tracking frames, and the feature vectors of the second center points of the N first tracking frames.
  • The first image and the second image have the same size, that is, the same width and height.
  • The first image and the second image each include N tracking targets; that is, both the first image and the second image display the N tracking targets.
  • For example, if tracking targets 1, 2, 3, and 4 are displayed in the first image, tracking targets 1, 2, 3, and 4 are also displayed in the second image.
  • The preceding preset frame image is, for example, the image one frame earlier, two frames earlier, four frames earlier, five frames earlier, and so on.
  • The target video file is a video file in which the tracking target is followed.
  • the target video file is stored in an electronic device, or the target video file is stored in the cloud, etc.
  • Step 102 The electronic device inputs the second image into the hourglass network model for feature extraction, and outputs a target feature map.
  • the target feature map includes M feature points of N tracking targets, and the M is a positive integer.
  • the number of feature points of each tracking target can be the same or different.
  • the feature points of each tracking target can be 8, 10, 13, 18 and other values.
  • The feature points are used to mark different positions of the tracking target. For example, if the tracking target is a person, the feature points may be the joint points of the person.
  • Step 103 The electronic device inputs the target feature map to the prediction network to output the heat map, the width and height value set, and the feature vector set.
  • Step 104 The electronic device determines N second tracking frames based on the heat map and the width and height value set.
  • Step 105 The electronic device determines N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, and the N first correspondences are used to characterize all The one-to-one correspondence between the N first tracking frames and the N second tracking frames.
  • The first tracking frames and the second tracking frames have the same shape, which may be a rectangle, square, diamond, circle, or another shape.
  • The width of the first image is greater than the width of the N first tracking frames, and the height of the first image is greater than the height of the N first tracking frames.
  • Similarly, the width of the second image is greater than the width of the N second tracking frames, and the height of the second image is greater than the height of the N second tracking frames.
  • two adjacent first tracking frames in the N first tracking frames may have overlapping parts
  • two adjacent second tracking frames in the N second tracking frames may have overlapping parts
  • The one-to-one correspondence means that for each first tracking frame in the N first tracking frames there is exactly one second tracking frame in the N second tracking frames that selects the same tracking target.
  • For example, suppose there are 3 second tracking frames (second tracking frame 1, second tracking frame 2, and second tracking frame 3), 3 tracking targets (A, B, and C), and 3 first tracking frames (first tracking frame 1, first tracking frame 2, and first tracking frame 3), where second tracking frame 1 selects A, second tracking frame 2 selects B, and second tracking frame 3 selects C. If first tracking frame 1 corresponds one-to-one to second tracking frame 1, first tracking frame 2 to second tracking frame 2, and first tracking frame 3 to second tracking frame 3, then first tracking frame 1 selects A, first tracking frame 2 selects B, and first tracking frame 3 selects C.
  • the heights of the first tracking frame and the second tracking frame that have a corresponding relationship may be the same or different, which is not limited here.
  • the width of the first tracking frame and the second tracking frame that have a corresponding relationship may be the same or different, which is not limited here.
  • Step 106 The electronic device determines the tracking target selected by the N second tracking frames based on the N first correspondences.
  • the method further includes: the electronic device displays the N second tracking frames on the second image.
  • In summary, the first image and the second image are first obtained from the same target video file, where the first image is a preceding preset frame image of the second image; the second image is then input into the hourglass network model to obtain the target feature map; the target feature map is input into the prediction network to obtain the heat map, the width and height value set, and the feature vector set; the N second tracking frames are then determined according to the heat map and the width and height value set, where the second tracking frames are used to frame the N tracking targets in the second image; and the tracking target is finally determined according to the first tracking frames, the second tracking frames, and the feature vector set.
  • The first tracking frames are used to frame all of the N tracking targets in the first image.
  • In this way, this application jointly determines the tracking target based on a given image, the preceding preset frame image of that image, and the tracking frames associated with the preceding image, realizing tracking that follows the changing position of the tracking target and thereby improving the accuracy of determining the tracking target.
  • Further, the hourglass network model is composed of i hourglass networks arranged in sequence, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2.
  • In each hourglass network the input image is processed as follows: the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map; the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed with the input image to output a third feature map.
  • The first convolution blocks are first convolutional neural networks and the second convolution blocks are second convolutional neural networks; the first and second convolutional neural networks serve different roles.
  • the hourglass network model can be composed of 2, 4, 5, 7 or other numbers of hourglass networks arranged in sequence.
  • the structure diagram of the hourglass network model is shown in Figure 1B.
  • the hourglass network model is composed of two hourglass networks, on the one hand, the accuracy of the calculation can be ensured, and on the other hand, the calculation speed can be improved.
  • the input image of the first hourglass network in the hourglass network model is the target image
  • the feature map output by the last hourglass network in the hourglass network model is the target feature map
  • Each hourglass network is a symmetric network that performs both down-sampling and up-sampling, and the numbers of down-sampling and up-sampling operations are the same, for example 4, 6, or 7.
  • The technique used for down-sampling is maximum pooling or average pooling, which reduces the image resolution.
  • The technique used for up-sampling is nearest-neighbor interpolation, which increases the image resolution.
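The two sampling operations can be illustrated with a minimal NumPy sketch. This assumes the standard usage (pooling halves the resolution, nearest-neighbor interpolation doubles it) and ignores the learned convolution blocks entirely.

```python
import numpy as np

def max_pool_2x(img):
    """Down-sample a (C, H, W) array by 2 with max pooling (halves H and W)."""
    c, h, w = img.shape
    return img.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def nearest_upsample_2x(img):
    """Up-sample a (C, H, W) array by 2 with nearest-neighbor interpolation."""
    return img.repeat(2, axis=1).repeat(2, axis=2)

x = np.arange(6 * 4 * 4, dtype=float).reshape(6, 4, 4)  # 6 channels, 4x4
down = max_pool_2x(x)            # shape (6, 2, 2): resolution halved
up = nearest_upsample_2x(down)   # shape (6, 4, 4): resolution doubled back
```

A real hourglass network interleaves these sampling steps with convolution blocks and skip connections; only the resolution bookkeeping is shown here.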
  • Take an hourglass network a that is not the first hourglass network in the hourglass network model as an example. The input image for the first down-sampling of hourglass network a is image 1, where image 1 is synthesized from the input image and the output image of hourglass network b (hourglass network a is adjacent to, and immediately follows, hourglass network b).
  • The input image of each subsequent down-sampling of hourglass network a is the output image of the previous down-sampling, and each down-sampling halves the resolution of its input image.
  • The input image of the first up-sampling of hourglass network a is the output image of its last down-sampling; the input image of each subsequent up-sampling is the superposition of the output image of the previous up-sampling and the output image of the symmetric down-sampling, and each up-sampling doubles the resolution of its input image.
  • The input image for the first down-sampling of the first hourglass network in the hourglass network model is the target image, and the first hourglass network performs up-sampling and down-sampling in the same way as hourglass network a.
  • For example, suppose image 1 is 6*128*128, where 6 is the number of channels and 128*128 is the resolution of image 1, and nearest-neighbor interpolation is used for up-sampling. After the first down-sampling of image 1, image 2 with a size of 6*64*64 is output; after the second down-sampling, image 3 with a size of 6*32*32 is output; after the third down-sampling, image 4 with a size of 6*16*16 is output; after the fourth down-sampling, image 5 with a size of 6*8*8 is output; and after the first up-sampling of image 5, image 6 with a size of 6*16*16 is output.
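The resolution arithmetic of that example (halving on each down-sampling, doubling on each symmetric up-sampling) can be traced with a few lines of Python; this tracks only spatial resolution, not channels or learned weights.

```python
def hourglass_resolutions(start, n_down):
    """Trace the spatial resolution through n_down down-samplings and the
    symmetric up-samplings: each down-sampling halves the resolution and
    each up-sampling doubles it."""
    down = [start]
    for _ in range(n_down):
        down.append(down[-1] // 2)
    up = [down[-1]]
    for _ in range(n_down):
        up.append(up[-1] * 2)
    return down, up

# Matches the example: image 1 at 128 down to image 5 at 8, then back up
# (image 6 is the first up-sampled step, at 16).
down, up = hourglass_resolutions(128, 4)
```
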
  • Multiple down-samplings and multiple up-samplings are performed through each hourglass network, so that the features of different regions in the target image can be extracted and the spatial relationships among the feature points in the target image can be preserved, which improves the probability of determining the tracking target.
  • Further, the prediction network includes a heat map branch, a width-and-height branch, and a feature vector branch; the electronic device inputting the target feature map to the prediction network to output the heat map, the width and height value set, and the feature vector set includes:
  • the electronic device inputs the target feature map to the heat map branch to output a heat map, and inputs the target feature map to the width and height branch to output a width and height value set;
  • the electronic device inputs the heat map into the feature vector branch to output a feature vector set.
  • The inputting of the target feature map to the width-and-height branch to output the width and height value set includes: inputting the target feature map, the second widths of the N first tracking frames, and the second heights of the N first tracking frames to the width-and-height branch to output the width and height value set.
  • The inputting of the heat map into the feature vector branch to output the feature vector set includes: inputting the heat map and the feature vectors of the second center points of the N first tracking frames into the feature vector branch to output the feature vector set.
  • The electronic device inputs the target feature map into the heat map branch and inputs the target feature map into the width-and-height branch in parallel.
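The branch wiring described above can be sketched with hypothetical stand-in functions. The real branches are trained convolution blocks; here they are toy functions that only show which inputs feed which branch (the heat map and width-and-height branches take the feature map, the feature vector branch takes the heat map plus the previous frames' center-point vectors).

```python
# Hypothetical stand-ins for the three prediction branches; the real
# branches are trained convolution blocks, not these toy functions.

def heat_map_branch(feature_map):
    # Probability that each feature point is a center point (toy: constant).
    return {pt: 0.5 for pt in feature_map}

def width_height_branch(feature_map, prev_widths, prev_heights):
    # Conditioned on the second widths/heights of the N first tracking frames.
    return list(zip(prev_widths, prev_heights))

def feature_vector_branch(heat_map, prev_center_vectors):
    # Conditioned on the feature vectors of the first frames' center points.
    return {pt: prev_center_vectors[i % len(prev_center_vectors)]
            for i, pt in enumerate(heat_map)}

fm = [(0, 0), (5, 5)]
hm = heat_map_branch(fm)                          # runs in parallel with...
wh = width_height_branch(fm, [10, 12], [20, 22])  # ...this branch
fv = feature_vector_branch(hm, [[1.0], [2.0]])
```
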
  • The heat map branch is obtained by the electronic device using the first formula to train the third convolution block.
  • In the first formula, the label value of the feature point at position (i, j) in the first image is used when calculating the probability that the feature point at position (i, j) is the target feature point; the label value indicates the possibility that a calculation error occurs for its corresponding feature point: the larger the label value, the greater the possibility of a calculation error, and the smaller the label value, the lower that possibility.
  • The α and β are fixed values set when the electronic device trains the third convolution block; under different circumstances, the values of α and β may be different.
  • the heat map is shown in FIG. 1C, the point in FIG. 1C represents the center point, the ordinate on the left in FIG. 1C represents the probability, and the abscissa and the ordinate on the right in FIG. 1C jointly represent the location of the center point.
  • the width and height branches are obtained by the electronic device using the second formula to train the fourth convolution block.
  • In the second formula, f(x) and Y are both a width or both a height, and L2 is the square of the width difference or the square of the height difference, that is, L2 = (f(x) - Y)^2.
  • the width and height value set includes the corresponding relationship between the width and the square of the width difference and the corresponding relationship between the height and the square of the height difference, as shown in Table 1.
  • the third convolution block is a third convolutional neural network
  • the fourth convolution block is a fourth convolutional neural network.
  • the functions of the third convolutional neural network and the fourth convolutional neural network are different from each other.
  • the feature vector branch includes a first branch, a second branch, and a third branch.
  • The first branch is obtained by the electronic device using the third formula to train the fifth convolution block, the second branch is obtained by the electronic device using the fourth formula to train the sixth convolution block, and the third branch is obtained by the electronic device using the fifth formula to train the seventh convolution block.
  • the fifth convolutional block is a fifth convolutional neural network
  • the sixth convolutional block is a sixth convolutional neural network
  • the seventh convolutional block is a seventh convolutional neural network.
  • the functions of the fifth convolutional neural network, the sixth convolutional neural network, and the seventh convolutional neural network are different from each other.
  • In the third formula, the two input feature vectors are the feature vector of the second center point of any first tracking frame and the feature vector of the first center point of the second tracking frame corresponding to that first tracking frame, and e_k is the mean of these two feature vectors.
  • In the fourth formula, e_k is the mean described above for one of the N first tracking frames and its corresponding second tracking frame, e_j is the corresponding mean for another first tracking frame in the N first tracking frames and its corresponding second tracking frame, and Δ is a fixed margin greater than or equal to 1.
  • In the fifth formula, x_1 is the feature vector of the first center point and x_2 is the feature vector of the second center point.
  • The feature vector set includes the feature vectors of the first center points of the N second tracking frames, as shown in Table 2.
  • For example, the feature vector corresponding to the first center point (a1, b1) is c1, the feature vector corresponding to the first center point (a2, b2) is 3c2, and the feature vector corresponding to the first center point (a3, b3) is 1.5c3, where c1, c2, and c3 may be the same or different.
  • the N second tracking frames are determined based on the heat map and the width and height value set.
  • the electronic device determines the first position of the first center point of the N second tracking frames based on the heat map
  • the electronic device determines the first height of the N second tracking frames and the first width of the N second tracking frames based on the width and height value set.
  • The first heights of any two second tracking frames of the N second tracking frames may be equal or unequal, the first widths of any two second tracking frames may be equal or unequal, and the positions of the first center points of any two second tracking frames are different.
  • Based on the heat map, the probability that each of the M feature points is a first center point can be obtained; the N feature points with the highest probability among the M feature points are then taken as the N first center points, from which the first positions of the N first center points can be obtained.
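Selecting the N highest-probability feature points as center points is a plain top-N operation; a minimal sketch (over a 1-D array of probabilities, one entry per feature point) is:

```python
import numpy as np

def top_n_center_points(probs, n):
    """Return the indices of the n feature points with the highest
    probability of being a center point, highest probability first."""
    idx = np.argsort(probs)[::-1][:n]
    return idx.tolist()

# Probabilities for M = 6 feature points; the 3 most likely become
# the first center points.
p = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
centers = top_n_center_points(p, 3)
```
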
  • For example, suppose feature point 1, feature point 2, and feature point 3 are the three feature points with the highest probability among all the feature points shown in Figure 1C; the first center points are then feature point 1, feature point 2, and feature point 3.
  • The first height is known, and the square of the height difference corresponding to the first height can be obtained from Table 1; the second height can then be calculated based on the second formula. For example, if the first height is C and the square of the height difference corresponding to the first height is c, then the second height is C ± √c.
  • Likewise, the first width is known, and the square of the width difference corresponding to the first width can be obtained from Table 1; the second width can then be calculated based on the second formula. For example, if the first width is D and the square of the width difference corresponding to the first width is d, then the second width is D ± √d.
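Reading the second formula as the squared difference L2 = (f(x) - Y)^2 (an assumption consistent with the definitions above), the recovery of the second value from a first value and its stored squared difference is a small calculation:

```python
import math

def recover_second_value(first_value, squared_diff):
    """Given a first height/width C and the stored squared difference c,
    the second value satisfies (second - C)**2 == c, i.e. C ± sqrt(c).
    Both candidates are returned, since the sign is not stored."""
    d = math.sqrt(squared_diff)
    return first_value - d, first_value + d

# First height C = 100 with squared height difference c = 4:
# the second height is 98.0 or 102.0.
lo, hi = recover_second_value(100.0, 4.0)
```
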
  • In an embodiment, the electronic device determining the N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set includes:
  • determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, where the N offset sets are in one-to-one correspondence with the feature vectors of the N first center points, each offset set includes N offsets, and the N offsets are the position offsets, in the feature vector set, of the corresponding feature vector of the first center point relative to the feature vector of each second center point;
  • where the N matching degree sets are in one-to-one correspondence with the N first tracking frames, each matching degree set includes N matching degrees, and the N matching degrees are the degrees of matching between the corresponding first tracking frame and each of the second tracking frames; and
  • determining N second correspondences according to the N offset sets and the N matching degree sets, where the N second correspondences are used to characterize the one-to-one correspondence between the N second center points and the N first center points.
  • The sixth formula is used to calculate the N offset sets.
  • In the sixth formula, d_a represents the feature vector of the second center point of first tracking frame a, d_b represents the feature vector of the first center point of second tracking frame b, the covariance matrix of first tracking frame a is also used, and d^(1)(a, b) represents the position offset, in the feature vector set, of the feature vector of the second center point of first tracking frame a relative to the feature vector of the first center point of second tracking frame b.
  • The seventh formula is used to calculate the N matching degree sets.
  • In the seventh formula, R_a is the set of the 100 most recent feature vectors of the second center point of first tracking frame a, and d^(2)(a, b) represents the degree of match in appearance between first tracking frame a and second tracking frame b.
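The two distances described above follow the shape of DeepSORT-style matching. The sketch below assumes that reading: a Mahalanobis-style offset d^(1) using the covariance matrix, and an appearance term d^(2) as the smallest cosine distance against the gallery R_a. The exact formulas in the patent are given as images and are not reproduced here, so these forms are an assumption.

```python
import numpy as np

def mahalanobis_offset(d_a, d_b, cov):
    """Assumed form of the sixth formula: offset between the feature
    vector d_a of first tracking frame a's second center point and the
    feature vector d_b of second tracking frame b's first center point,
    weighted by the inverse covariance of frame a."""
    diff = np.asarray(d_b) - np.asarray(d_a)
    return float(diff @ np.linalg.inv(cov) @ diff)

def appearance_match(d_b, gallery):
    """Assumed form of the seventh formula: smallest cosine distance
    between d_b and the gallery R_a of recent (unit-norm) feature vectors."""
    d_b = np.asarray(d_b)
    return float(min(1.0 - d_b @ np.asarray(r) for r in gallery))

cov = np.eye(2)  # identity covariance, for illustration only
d1 = mahalanobis_offset([0.0, 0.0], [3.0, 4.0], cov)
d2 = appearance_match([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```
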
  • The eighth formula is used to perform a weighted calculation on any offset in the N offset sets and any matching degree in the N matching degree sets.
  • In the eighth formula, λ is a fixed value whose value may be different under different circumstances, and c_{a,b} is the weighted sum.
  • If, among the N offset sets, the offset between the second center point of first tracking frame o and the first center point of second tracking frame p is the shortest, and the weighted sum for first tracking frame o and second tracking frame p is greater than the first value, then first tracking frame o and second tracking frame p have a corresponding relationship, where first tracking frame o is one of the N first tracking frames and second tracking frame p is one of the N second tracking frames.
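The association decision above (shortest offset plus weighted sum exceeding the first value) can be sketched as follows. The value of λ and the threshold are placeholders; the patent does not fix them, and the linear weighting is an assumed reading of the eighth formula.

```python
def weighted_sum(d1, d2, lam=0.5):
    """Assumed form of the eighth formula: weighted combination of a
    position offset d1 and an appearance matching degree d2
    (lam is a placeholder fixed value)."""
    return lam * d1 + (1.0 - lam) * d2

def has_correspondence(offsets_to_p, offset_op, c_op, first_value):
    """Frames o and p correspond when o's offset to p is the shortest of
    o's offsets and the weighted sum exceeds the first value."""
    return offset_op == min(offsets_to_p) and c_op > first_value

c = weighted_sum(2.0, 4.0)  # 0.5 * 2.0 + 0.5 * 4.0
ok = has_correspondence([2.0, 5.0, 9.0], 2.0, c, 1.0)
```
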
  • the one-to-one correspondence between the N first center points and the N second center points can be determined.
  • For example, suppose A1 and A2 are first center points and B1 and B2 are second center points. Since the relationships between A1 and B1, between A1 and B2, between A2 and B1, and between A2 and B2 cannot be judged directly, there are two possible situations: A1 corresponds to B1 and A2 corresponds to B2; or A1 corresponds to B2 and A2 corresponds to B1.
  • To determine whether A1 corresponds to B1 and A2 corresponds to B2, the third formula is first used to narrow the distance between A1 and B1, the fourth formula is used to widen the distance between A1 and B2, and the fifth formula is then used to calculate the distances, where the distance between A1 and B1 is denoted A1B1 and the distance between A1 and B2 is denoted A1B2.
  • If A1B1 > A1B2, then A1 and B1 correspond; if A1B1 < A1B2, then A1 and B2 correspond.
  • In this way, only when the position offset of a certain first tracking frame relative to a certain second tracking frame in the feature vector set is the smallest, and the weighted sum of the matching degree of that first tracking frame and that second tracking frame is greater than the first value, is it determined that the first tracking frame and the second tracking frame have a corresponding relationship, which improves the accuracy of determining the correspondence of the tracking frames and thereby the accuracy of determining the tracking target.
  • FIG. 1B and FIG. 1C provided in the embodiments of the present application are only used as examples, and do not constitute a limitation to the embodiments of the present application.
  • FIG. 2A is a schematic flowchart of another tracking target determination method provided by an embodiment of the present application, which is applied to the above-mentioned electronic device and specifically includes the following steps:
  • Step 201: The electronic device obtains a first image and a second image in the same target video file, and obtains N first tracking frames of the first image, where the first image is a preceding preset frame image of the second image, the first image and the second image each include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and the N is an integer greater than 1.
  • Step 202 The electronic device inputs the second image into the hourglass network model for feature extraction, and outputs a target feature map.
  • Step 203 The electronic device inputs the target feature map to the heat map branch to output a heat map, and inputs the target feature map to the width and height branch to output a width and height value set.
  • Step 204 The electronic device inputs the heat map into the feature vector branch to output a feature vector set.
  • Step 205 The electronic device determines the first positions of the first center points of the N second tracking frames based on the heat map.
  • Step 206 The electronic device determines the first height of the N second tracking frames and the first width of the N second tracking frames based on the width and height value set.
  • Step 207 The electronic device determines the feature vectors of the N first center points from the feature vector set according to the first positions of the N first center points.
  • Step 208 The electronic device determines N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determines N matching degree sets according to the N first tracking frames and the N second tracking frames. The N offset sets correspond one-to-one to the feature vectors of the N first center points; each offset set includes N offsets, which are the position offsets of the corresponding first center point's feature vector relative to the feature vector of each of the second center points in the feature vector set. The N matching degree sets correspond one-to-one to the N first tracking frames; each matching degree set includes N matching degrees, which are the degrees of matching between the corresponding first tracking frame and each of the second tracking frames.
  • Step 209 The electronic device determines N second correspondences according to the N offset sets and the N matching degree sets, and the N second correspondences are used to characterize the one-to-one correspondence between the N second center points and the N first center points.
  • Step 210 The electronic device determines N first correspondences according to the N second correspondences.
  • Step 211 The electronic device determines the tracking target selected by the N second tracking frames based on the N first correspondences.
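Steps 208 and 209 can be illustrated with a small sketch. The weighting scheme, the normalisation of the offset, the function name `pick_correspondence`, and the threshold value are all assumptions made for illustration; the patent only states that the smallest position offset and a weighted matching score above the first value are required.

```python
import numpy as np

def pick_correspondence(offsets, match_degrees, w_off=0.5, w_match=0.5,
                        first_value=0.3):
    """For each first tracking frame, pair it with the second tracking frame
    whose position offset is smallest, and accept the pair only if the
    weighted sum of (inverted, normalised) offset and matching degree
    exceeds the first value.

    offsets[i][j]: position offset between first frame i and second frame j.
    match_degrees[i][j]: matching degree between first frame i and second frame j.
    Returns {first_frame_index: second_frame_index} for accepted pairs.
    """
    pairs = {}
    for i, (offs, degs) in enumerate(zip(offsets, match_degrees)):
        j = int(np.argmin(offs))                      # smallest position offset
        norm_off = 1.0 - offs[j] / (max(offs) + 1e-9)  # larger is better
        score = w_off * norm_off + w_match * degs[j]
        if score > first_value:                        # "greater than the first value"
            pairs[i] = j
    return pairs
```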
  • The second image, which includes tracking target S and tracking target D, is input into the hourglass network model, which outputs the target feature map. The target feature map is then input into the heat map branch and the width and height branch of the prediction module, which output the heat map and the width and height value set respectively. The heat map is then input into the feature vector branch of the prediction module, which outputs the feature vector set. Then, combining the N first tracking frames, the heat map, and the width and height value set, the N second tracking frames and the one-to-one correspondence between the N second tracking frames and the N first tracking frames are determined. Finally, based on this one-to-one correspondence, it can be known which tracking target is selected by each of the N second tracking frames, thereby achieving the purpose of determining the tracking target.
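The data flow just described (hourglass output feeding the heat map branch and the width and height branch, and the heat map feeding the feature vector branch) can be sketched at the shape level. Everything below is a stand-in for the learned convolutional heads; the sigmoid, channel slicing, and peak-picking are illustrative assumptions, not the patent's networks.

```python
import numpy as np

def prediction_module(feature_map, n_targets):
    """Shape-level sketch of the prediction module. The target feature map
    feeds the heat map branch and the width-height branch; the heat map then
    feeds the feature vector branch."""
    h, w, c = feature_map.shape
    # heat map branch: one response per spatial location, in (0, 1)
    heat_map = 1.0 / (1.0 + np.exp(-feature_map.mean(axis=2)))
    # width-height branch: a (width, height) pair per spatial location
    wh_set = np.abs(feature_map[..., :2])
    # feature vector branch consumes the heat map: one vector per peak
    flat = np.argsort(heat_map, axis=None)[::-1][:n_targets]
    ys, xs = np.unravel_index(flat, heat_map.shape)
    vec_set = feature_map[ys, xs]                     # (n_targets, c)
    return heat_map, wh_set, vec_set
```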
  • FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device includes a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the following steps:
  • determining N second tracking frames based on the heat map and the width and height value set, where the N second tracking frames are used to frame-select the N tracking targets in the second image;
  • determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences are used to characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;
  • determining, based on the N first correspondences, the tracking target selected by the N second tracking frames.
  • the hourglass network model is constructed by arranging i hourglass networks in sequence, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2;
  • for each hourglass network, the following processing is performed:
  • the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map;
  • the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; the second feature map is superimposed with the input image to output a third feature map.
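The down-sample / up-sample / superimpose sequence, and the way each hourglass network's input is synthesized from the previous one's input and output, can be sketched as follows. Average pooling and nearest-neighbour upsampling stand in for the first and second convolution blocks, and summation stands in for "synthesizing"; all are illustrative assumptions.

```python
import numpy as np

def hourglass_pass(x):
    """One hourglass pass: down-sample the input (first convolution blocks),
    up-sample the result (second convolution blocks), then superimpose it on
    the input to produce the third feature map."""
    h, w = x.shape
    first = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # first feature map
    second = first.repeat(2, axis=0).repeat(2, axis=1)         # second feature map
    return second + x                                          # third feature map

def hourglass_model(x, i=2):
    """Arrange i hourglass networks in sequence; the input to each network
    after the first is the synthesis (here: sum) of the previous network's
    input and output."""
    inp = x
    out = hourglass_pass(inp)
    for _ in range(i - 1):
        inp = inp + out            # synthesize previous input and output
        out = hourglass_pass(inp)
    return out
```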
  • the prediction network includes a heat map branch, a width and height branch, and a feature vector branch; the target feature map is input to the prediction network to output the heat map, the width and height value set, and the feature vector set.
  • the above program includes instructions for executing the following steps:
  • the heat map is input into the feature vector branch to output a feature vector set.
  • the foregoing program includes instructions for executing the following steps:
  • the first height of the N second tracking frames and the first width of the N second tracking frames are determined based on the width and height value set.
  • the above-mentioned program includes instructions for executing the following steps:
  • N offset sets are determined according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and N matching degree sets are determined according to the N first tracking frames and the N second tracking frames; the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set includes N offsets, and the N offsets are the position offsets of the corresponding first center point's feature vector relative to the feature vector of each of the second center points in the feature vector set;
  • the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set includes N matching degrees, and the N matching degrees are the degrees of matching between the corresponding first tracking frame and each of the second tracking frames;
  • N second correspondences are determined according to the N offset sets and the N matching degree sets, and the N second correspondences are used to characterize the one-to-one correspondence between the N second center points and the N first center points;
  • FIG. 4 is a tracking target determination device provided by an embodiment of the present application, which is applied to the above electronic equipment, and the device includes:
  • the information obtaining unit 401 is configured to obtain a first image and a second image from the same target video file, and obtain N first tracking frames of the first image, where the first image is a preset number of frames before the second image, the first image and the second image each include N tracking targets, the N first tracking frames are used to frame and select the N tracking targets in the first image, and N is an integer greater than 1;
  • the feature extraction unit 402 is configured to input the second image into the hourglass network model for feature extraction, and output a target feature map;
  • the data determining unit 403 is configured to input the target feature map to the prediction network to output the heat map, the width and height value set, and the feature vector set;
  • a tracking frame determination unit 404 configured to determine N second tracking frames based on the heat map and the width and height value set;
  • the correspondence relationship determining unit 405 is configured to determine N first correspondence relationships based on the N first tracking frames, the N second tracking frames, and the feature vector set, and the N first correspondence relationships are used To characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;
  • the tracking target determining unit 406 is configured to determine the tracking target selected by the N second tracking frames based on the N first correspondences.
  • the hourglass network model is composed of i hourglass networks arranged in sequence, where the input image of the i-th hourglass network is an image obtained by synthesizing the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2;
  • for each hourglass network, the following processing is performed:
  • the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map;
  • the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; the second feature map is superimposed with the input image to output a third feature map.
  • the prediction network includes a heat map branch, a width and height branch, and a feature vector branch; the target feature map is input to the prediction network to output the heat map, the width and height value set, and the feature vector set;
  • the data determining unit 403 is specifically configured to:
  • the heat map is input into the feature vector branch to output a feature vector set.
  • the N second tracking frames are determined based on the heat map and the width and height value set, and the tracking frame determining unit 404 is further configured to:
  • the first height of the N second tracking frames and the first width of the N second tracking frames are determined based on the width and height value set.
  • the N first correspondences are determined based on the N first tracking frames, the N second tracking frames, and the feature vector set, and the correspondence determining unit 405 is also used for:
  • N offset sets are determined according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and N matching degree sets are determined according to the N first tracking frames and the N second tracking frames; the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set includes N offsets, and the N offsets are the position offsets of the corresponding first center point's feature vector relative to the feature vector of each of the second center points in the feature vector set;
  • the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set includes N matching degrees, and the N matching degrees are the degrees of matching between the corresponding first tracking frame and each of the second tracking frames;
  • N second correspondences are determined according to the N offset sets and the N matching degree sets, and the N second correspondences are used to characterize the one-to-one correspondence between the N second center points and the N first center points;
  • the information acquisition unit 401, the feature extraction unit 402, the data determination unit 403, the tracking frame determination unit 404, the correspondence determination unit 405, and the tracking target determination unit 406 may be implemented by a processor.
  • the embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described for the electronic device in the foregoing method embodiments.
  • the embodiments of the present application also provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described for the electronic device in the foregoing method embodiments.
  • the computer program product may be a software installation package.
  • the steps of the method or algorithm described in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC may be located in an access network device, a target network device, or a core network device.
  • the processor and the storage medium may also exist as discrete components in the access network device, the target network device, or the core network device.
  • the functions described in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented in software, the functions can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a Digital Video Disc (DVD)), a semiconductor medium (for example, a Solid State Disk (SSD)), and the like.


Abstract

A tracked target determination method and a related device, applied to an electronic device. The method comprises: acquiring a first image and a second image from the same target video file, and acquiring N first tracking frames of the first image (101), wherein the first image is a preset number of frames before the second image, and the first image and the second image both comprise N tracked targets; inputting the second image into an hourglass network model to perform feature extraction, and outputting a target feature map (102); inputting the target feature map into a prediction network so as to output a heat map, a width and height value set, and a feature vector set (103); determining N second tracking frames on the basis of the heat map and the width and height value set (104); determining N first correspondences on the basis of the N first tracking frames, the N second tracking frames, and the feature vector set (105); and determining, on the basis of the N first correspondences, the tracked targets selected by means of the N second tracking frames (106). The method can improve the precision of determining tracked targets.

Description

Tracking target determination method and related equipment

Technical Field

This application relates to the field of electronic technology, and in particular to a tracking target determination method and related equipment.

Background

Target tracking is one of the key technologies in the fields of image processing and video processing. It is used to identify tracking targets in videos or images and is widely applied in related fields such as smart transportation, human-computer interaction, and national defense reconnaissance. Determining the tracking target is one of the essential steps in achieving target tracking. At present, the tracking target is determined mainly by the deep sort algorithm, which performs matching using only predicted position information, so the prediction accuracy is low.

Summary of the Invention

The embodiments of the present application provide a tracking target determination method and related equipment, which are used to improve the accuracy of determining the tracking target.
In the first aspect, an embodiment of the present application provides a tracking target determination method, applied to an electronic device, and the method includes:

acquiring a first image and a second image from the same target video file, and acquiring N first tracking frames of the first image, where the first image is a preset number of frames before the second image, the first image and the second image both include N tracking targets, the N first tracking frames are used to frame and select the N tracking targets in the first image, and N is an integer greater than 1;
inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;

inputting the target feature map into a prediction network to output a heat map, a width and height value set, and a feature vector set;

determining N second tracking frames based on the heat map and the width and height value set;
determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences are used to characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;

determining, based on the N first correspondences, the tracking targets selected by the N second tracking frames.
In the second aspect, an embodiment of the present application provides a tracking target determination device, applied to an electronic device, and the device includes:

an information acquisition unit, configured to acquire a first image and a second image from the same target video file, and acquire N first tracking frames of the first image, where the first image is a preset number of frames before the second image, the first image and the second image both include N tracking targets, the N first tracking frames are used to frame and select the N tracking targets in the first image, and N is an integer greater than 1;
a feature extraction unit, configured to input the second image into an hourglass network model for feature extraction, and output a target feature map;

a data determination unit, configured to input the target feature map into a prediction network to output a heat map, a width and height value set, and a feature vector set;

a tracking frame determination unit, configured to determine N second tracking frames based on the heat map and the width and height value set;

a correspondence determination unit, configured to determine N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences are used to characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;

a tracking target determination unit, configured to determine, based on the N first correspondences, the tracking targets selected by the N second tracking frames.
In the third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the steps in the method described in the first aspect of the embodiments of the present application.

In the fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the method of the first aspect of the embodiments of the present application.

In the fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the method of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that, in the embodiments of the present application, a first image and a second image are first obtained from the same target video file, where the first image is a preset number of frames before the second image; the second image is then input into the hourglass network model to obtain a target feature map; the target feature map is then input into the prediction network to obtain a heat map, a width and height value set, and a feature vector set; N second tracking frames are then determined according to the heat map and the width and height value set, where the second tracking frames are used to frame and select the N tracking targets in the second image; and finally, the tracking targets are determined according to the first tracking frames, the second tracking frames, and the feature vector set, where the first tracking frames are used to frame and select the N tracking targets in the first image. It can be seen that this application jointly determines the tracking target based on an image, a preceding preset frame image of that image, and the tracking frames associated with that preceding frame image, achieving tracking that changes as the position of the tracking target changes, thereby improving the accuracy of determining the tracking target.
These and other aspects of the present application will be more concise and understandable in the description of the following embodiments.

Description of the Drawings

In order to more clearly describe the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1A is a schematic flowchart of a tracking target determination method provided by an embodiment of the present application;

FIG. 1B is a schematic structural diagram of an hourglass network model provided by an embodiment of the present application;

FIG. 1C is a schematic diagram of a heat map provided by an embodiment of the present application;

FIG. 2A is a schematic flowchart of another tracking target determination method provided by an embodiment of the present application;

FIG. 2B is a schematic diagram of another tracking target determination method provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a tracking target determination device provided by an embodiment of the present application.
Detailed Description

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them.

The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, rather than to describe a specific order.

The electronic device may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on.
As shown in FIG. 1A, FIG. 1A is a tracking target determination method provided by an embodiment of the present application, applied to the above-mentioned electronic device, and specifically including the following steps:

Step 101: The electronic device obtains a first image and a second image from the same target video file, and obtains N first tracking frames of the first image, where the first image is a preset number of frames before the second image, the first image and the second image each include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1.
The obtaining of the N first tracking frames of the first image includes: obtaining the second widths of the N first tracking frames, the second heights of the N first tracking frames, the second positions of the N first tracking frames, and the feature vectors of the second center points of the N first tracking frames.

The first image and the second image have the same size, that is, the same width and height. Both the first image and the second image are images that include N tracking targets; that is, both images display the N tracking targets.

For example, if 4 tracking targets, namely 1, 2, 3, and 4, are displayed in the first image, then tracking targets 1, 2, 3, and 4 are also displayed in the second image.

The preceding preset frame image is, for example, the previous frame, the frame two frames earlier, the frame four frames earlier, the frame five frames earlier, and so on.

The target video file is a video file that follows the tracking target. The target video file is stored in the electronic device, or stored in the cloud, etc.
Step 102: The electronic device inputs the second image into the hourglass network model for feature extraction, and outputs a target feature map.

The target feature map includes M feature points of the N tracking targets, where M is a positive integer. The number of feature points of each tracking target may be the same or different; each tracking target may have 8, 10, 13, 18, or another number of feature points. The feature points are used to mark different positions of the tracking target. For example, assuming that the tracking target is a person, the feature points may be the joint points of the person.
Step 103: The electronic device inputs the target feature map into the prediction network to output a heat map, a width and height value set, and a feature vector set.

Step 104: The electronic device determines N second tracking frames based on the heat map and the width and height value set.
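Step 104 can be illustrated with a minimal sketch that decodes the N second tracking frames from heat-map peaks and the width and height value set. Peak-picking by simple sorting (rather than the usual local-maximum suppression) and the box layout are illustrative assumptions, not the patent's procedure.

```python
import numpy as np

def decode_tracking_frames(heat_map, wh_set, n):
    """Take the n highest heat-map responses as first center points, then
    read each frame's first width and first height from the width and height
    value set and build (x1, y1, x2, y2) tracking frames around the centers."""
    flat = np.argsort(heat_map, axis=None)[::-1][:n]
    ys, xs = np.unravel_index(flat, heat_map.shape)
    frames = []
    for y, x in zip(ys, xs):
        w, h = wh_set[y, x]
        frames.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return frames
```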
步骤105:电子设备基于所述N个第一跟踪框、所述N个第二跟踪框和所述特征向量集,确定N个第一对应关系,所述N个第一对应关系用于表征所述N个第一跟踪框与所述N个第二跟踪框的一一对应关系。Step 105: The electronic device determines N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, and the N first correspondences are used to characterize all The one-to-one correspondence between the N first tracking frames and the N second tracking frames.
其中,第一跟踪框和第二跟踪框的形状相同,第一跟踪框和第二跟踪框的形状可以是长方形,正方形,菱形,圆形等其他形状。Wherein, the shape of the first tracking frame and the second tracking frame are the same, and the shapes of the first tracking frame and the second tracking frame may be rectangles, squares, diamonds, circles and other shapes.
其中,所述第一图像的宽度大于所述N个第一跟踪框的宽度,所述第一图像的高度大于所述N个第一跟踪框的高度;所述第二图像的宽度大于所述N个第二跟踪框的宽度,所述第二图像的高度大于所述N个第二跟踪框的高度。Wherein, the width of the first image is greater than the width of the N first tracking frames, the height of the first image is greater than the height of the N first tracking frames; the width of the second image is greater than the width of the The width of the N second tracking frames, and the height of the second image is greater than the height of the N second tracking frames.
Two adjacent first tracking frames among the N first tracking frames may overlap, and two adjacent second tracking frames among the N second tracking frames may overlap.
The one-to-one correspondence means that, for each first tracking frame among the N first tracking frames, there is exactly one second tracking frame among the N second tracking frames that frames the same tracking target.
For example, suppose there are three second tracking frames (second tracking frames 1, 2, and 3) and three tracking targets (A, B, and C), where second tracking frame 1 frames A, second tracking frame 2 frames B, and second tracking frame 3 frames C. If there are three first tracking frames (first tracking frames 1, 2, and 3) such that first tracking frame 1 corresponds to second tracking frame 1, first tracking frame 2 to second tracking frame 2, and first tracking frame 3 to second tracking frame 3, then first tracking frame 1 frames A, first tracking frame 2 frames B, and first tracking frame 3 frames C.
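The example above amounts to composing two mappings; a minimal sketch, where the frame indices and target letters are illustrative only:

```python
# Hypothetical sketch: the targets framed by the first tracking frames are
# known from the first image, and the N first correspondences carry them
# over to the second tracking frames.
first_frame_target = {1: "A", 2: "B", 3: "C"}   # first tracking frame -> target
first_to_second = {1: 1, 2: 2, 3: 3}            # the N first correspondences
second_frame_target = {second: first_frame_target[first]
                       for first, second in first_to_second.items()}
```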
The heights of a first tracking frame and the second tracking frame corresponding to it may be the same or different, which is not limited here; the same applies to their widths.
Step 106: The electronic device determines, based on the N first correspondences, the tracking targets framed by the N second tracking frames.
In an implementation of the present application, after step 106, the method further includes: displaying, by the electronic device, the N second tracking frames on the second image.
It can be seen that, in this embodiment of the application, a first image and a second image are first obtained from the same target video file, the first image being a preceding preset frame image of the second image. The second image is input into an hourglass network model to obtain a target feature map, and the target feature map is then input into a prediction network to obtain a heat map, a width-height value set, and a feature vector set. The second tracking frames, which frame the N tracking targets in the second image, are determined from the heat map and the width-height value set, and the tracking targets are finally determined from the first tracking frames (which frame the N tracking targets in the first image), the second tracking frames, and the feature vector set. The tracking target is thus determined jointly from an image, its preceding preset frame image, and the tracking frames associated with that preceding frame, so that the tracking follows the changing position of the tracking target, which improves the accuracy of determining the tracking target.
In an implementation of the present application, the hourglass network model is composed of i hourglass networks arranged in sequence, where the input image of the i-th hourglass network is obtained by combining the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2.
First processing is performed in each hourglass network. In the first processing: the input image is downsampled through a plurality of first convolution blocks of the hourglass network to output a first feature map; the first feature map is upsampled through a plurality of second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed on the input image to output a third feature map.
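A minimal sketch of this first processing, not the patented implementation: average pooling stands in for the first convolution blocks, nearest-neighbour repetition stands in for the second convolution blocks, and both sampling operators are assumptions.

```python
import numpy as np

def downsample(x):
    # Halve the spatial resolution (stand-in for a first convolution block).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    # Double the spatial resolution (stand-in for a second convolution block).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def first_processing(x, depth=4):
    # Downsample `depth` times, upsample back, merging each upsampled map
    # with the symmetric downsampling output; the final merge with the
    # input image yields the third feature map.
    skips, y = [], x
    for _ in range(depth):
        skips.append(y)
        y = downsample(y)
    for s in reversed(skips):
        y = upsample(y) + s
    return y
```

The output has the same channel count and resolution as the input, which is what lets hourglass networks be stacked in sequence.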
The first convolution block is a first convolutional neural network and the second convolution block is a second convolutional neural network; the two networks serve different functions.
The hourglass network model may be composed of 2, 4, 5, 7, or another number of hourglass networks arranged in sequence; a schematic structural diagram of the hourglass network model is shown in Figure 1B. When the hourglass network model is composed of two hourglass networks, calculation accuracy is ensured on the one hand, and calculation speed is improved on the other.
The input image of the first hourglass network in the hourglass network model is the target image, and the feature map output by the last hourglass network is the target feature map.
As shown in Figure 1B, each hourglass network is a symmetric network that performs downsampling followed by upsampling, with the same number of downsampling and upsampling steps (for example 4, 6, or 7). Downsampling uses nearest-neighbor interpolation to reduce the image resolution, and upsampling uses maximum pooling or average pooling to increase the image resolution.
In this embodiment of the application, consider an hourglass network a that is not the first hourglass network in the model. The input image of the first downsampling of hourglass network a is image 1, which is obtained by combining the input image and the output image of hourglass network b (in the hourglass network model, hourglass network a is adjacent to and follows hourglass network b). The input image of each subsequent downsampling is the output image of the previous downsampling, and each downsampling halves the resolution of its input image. The input image of the first upsampling of hourglass network a is the output image of its last downsampling; the input image of each subsequent upsampling is the superposition of the output image of the previous upsampling and the output image of the symmetric downsampling, and each upsampling doubles the resolution of its input image.
The input image of the first downsampling of the first hourglass network in the model is the target image; otherwise, the upsampling and downsampling of the first hourglass network are implemented in the same way as for hourglass network a, as described above, and are not repeated here.
For example, suppose hourglass network a performs 4 downsamplings and 4 upsamplings and image 1 is 6*128*128, where 6 is the number of channels and 128*128 is the resolution. Using nearest-neighbor interpolation, the first downsampling outputs image 2 with resolution 6*64*64, the second downsampling (of image 2) outputs image 3 with resolution 6*32*32, the third downsampling (of image 3) outputs image 4 with resolution 6*16*16, and the fourth downsampling (of image 4) outputs image 5 with resolution 6*8*8. After the 4 downsamplings, image 5 is upsampled using average pooling: the first upsampling outputs image 6 with resolution 6*16*16; image 6 is merged with image 4 (the output of the third downsampling) as the input of the second upsampling, which outputs image 7 with resolution 6*32*32; image 7 is merged with image 3 as the input of the third upsampling, which outputs image 8 with resolution 6*64*64; finally, image 8 is merged with image 2 as the input of the fourth upsampling, which outputs image 9 with resolution 6*128*128.
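The resolution bookkeeping in this walkthrough can be checked mechanically; a sketch under the assumption that each downsampling halves and each upsampling doubles the resolution:

```python
def resolution_trace(h, w, depth=4):
    # Resolutions after each of `depth` downsamplings, then after each of
    # `depth` upsamplings, starting from an h*w input.
    down = [(h, w)]
    for _ in range(depth):
        h, w = h // 2, w // 2
        down.append((h, w))
    up = []
    for _ in range(depth):
        h, w = h * 2, w * 2
        up.append((h, w))
    return down, up
```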
It can be seen that, in this embodiment of the application, performing multiple downsamplings and multiple upsamplings in each hourglass network extracts features from different regions of the target image while preserving the spatial relationships between feature points in the target image, which improves the probability of correctly determining the tracking target.
In an implementation of the present application, the prediction network includes a heat map branch, a width-height branch, and a feature vector branch. Inputting, by the electronic device, the target feature map to the prediction network to output the heat map, the width-height value set, and the feature vector set includes:
inputting, by the electronic device, the target feature map to the heat map branch to output the heat map, and inputting the target feature map to the width-height branch to output the width-height value set; and
inputting, by the electronic device, the heat map to the feature vector branch to output the feature vector set.
Inputting the target feature map to the width-height branch to output the width-height value set includes: inputting the target feature map, the second widths of the N first tracking frames, and the second heights of the N first tracking frames to the width-height branch to output the width-height value set.
Inputting the heat map to the feature vector branch to output the feature vector set includes: inputting the heat map and the feature vectors of the second center points of the N first tracking frames to the feature vector branch to output the feature vector set.
The electronic device inputs the target feature map to the heat map branch and to the width-height branch in parallel.
The heat map branch is obtained by the electronic device training a third convolution block using the first formula.
The first formula is:

L1 = -(1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} E_ij, where
E_ij = (1 - P_ij)^α · log(P_ij)                  if y_ij = 1
E_ij = (1 - y_ij)^β · (P_ij)^α · log(1 - P_ij)   otherwise
Here, H is the height of the target feature map; W is the width of the target feature map; P_ij is the probability that the feature point at position (i, j) is a target feature point; and y_ij is the label value of the feature point at position (i, j) in the first image. When computing the probability that the feature point at position (i, j) is a target feature point, the label value indicates the likelihood of a calculation error for the corresponding feature point: a larger label value indicates a higher likelihood of a calculation error, and a smaller label value indicates a lower likelihood. The label value is set by the electronic device when training the third convolution block. α and β are fixed values, and their values may differ from case to case.
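This training objective can be sketched as a variant focal loss over the H*W map; the values alpha=2 and beta=4 below are assumptions, since α and β are only described as fixed values:

```python
import numpy as np

def heatmap_loss(P, y, alpha=2.0, beta=4.0):
    # P[i, j]: probability that the feature point at (i, j) is a target
    # feature point; y[i, j]: label value from the first image.
    pos = (y == 1.0)
    term = np.zeros_like(P)
    term[pos] = (1.0 - P[pos]) ** alpha * np.log(P[pos])
    term[~pos] = (1.0 - y[~pos]) ** beta * P[~pos] ** alpha * np.log(1.0 - P[~pos])
    return -term.mean()
```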
The heat map is shown in Figure 1C. The points in Figure 1C represent center points; the left-hand ordinate represents probability, while the abscissa and the right-hand ordinate jointly represent the positions of the center points.
The width-height branch is obtained by the electronic device training a fourth convolution block using the second formula.
The second formula is: L2 = |f(x) - Y|^2

where f(x) and Y are both widths (or both heights), and L2 is the squared width difference (or squared height difference).
The width-height value set includes the correspondence between widths and squared width differences and the correspondence between heights and squared height differences, as shown in Table 1.
Table 1

Height (mm) | Squared height difference (mm^2) | Width (mm) | Squared width difference (mm^2)
h1          | H1                               | k1         | K1
h2          | H2                               | k2         | K2
…           | …                                | …          | …
The third convolution block is a third convolutional neural network, and the fourth convolution block is a fourth convolutional neural network; the two networks serve different functions.
The feature vector branch includes a first branch, a second branch, and a third branch. The first branch is obtained by the electronic device training a fifth convolution block using the third formula, the second branch by training a sixth convolution block using the fourth formula, and the third branch by training a seventh convolution block using the fifth formula.
The fifth convolution block is a fifth convolutional neural network, the sixth convolution block is a sixth convolutional neural network, and the seventh convolution block is a seventh convolutional neural network; the three networks serve functions different from one another.
The third formula is:

L3 = (1/N) Σ_{k=1}^{N} [ (f_k - e_k)^2 + (g_k - e_k)^2 ]

where f_k is the feature vector of the second center point of any first tracking frame, g_k is the feature vector of the first center point of the second tracking frame corresponding to that first tracking frame, and e_k is the mean of the feature vector of the second center point of that first tracking frame and the feature vector of the first center point of its corresponding second tracking frame.
The fourth formula is:

L4 = (1/(N(N-1))) Σ_{k=1}^{N} Σ_{j≠k} max(0, Δ - |e_k - e_j|)

where e_k is the mean of the feature vector of the second center point of one first tracking frame among the N first tracking frames and the feature vector of the first center point of the second tracking frame corresponding to it, and e_j is the corresponding mean for another first tracking frame and its corresponding second tracking frame. Δ = 1.
The fifth formula is:

d12 = ||x1 - x2||

where x1 is the feature vector of a first center point and x2 is the feature vector of a second center point.
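The three objectives can be sketched as follows; these are illustrative implementations rather than the trained branches themselves, and scalar means are used in the push term for brevity:

```python
import numpy as np

def pull_loss(first_vecs, second_vecs):
    # Third formula: pull each pair of corresponding center-point feature
    # vectors toward their mean e_k.
    e = (first_vecs + second_vecs) / 2.0
    return np.mean((first_vecs - e) ** 2 + (second_vecs - e) ** 2)

def push_loss(means, delta=1.0):
    # Fourth formula: push the means e_k of different pairs at least
    # `delta` apart (delta = 1 as stated above).
    n = len(means)
    total = 0.0
    for k in range(n):
        for j in range(n):
            if j != k:
                total += max(0.0, delta - abs(means[k] - means[j]))
    return total / (n * (n - 1))

def center_distance(x1, x2):
    # Fifth formula: Euclidean distance between two center-point vectors.
    return float(np.linalg.norm(np.asarray(x1) - np.asarray(x2)))
```

The pull term vanishes when corresponding vectors coincide, while the push term vanishes once all means are at least delta apart.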
The feature vector set includes the feature vectors of the first center points of the N second tracking frames, as shown in Table 2.
Table 2

First center point | Feature vector
(a1, b1)           | c1
(a2, b2)           | 3c2
(a3, b3)           | 1.5c3
…                  | …

The feature vector corresponding to first center point (a1, b1) is c1, the feature vector corresponding to (a2, b2) is 3c2, and the feature vector corresponding to (a3, b3) is 1.5c3. c1, c2, and c3 are all basic solution vectors, and may be the same or different.
It can be seen that, in this embodiment of the application, because the target feature map is input to the two branches in parallel, the time required for the convolution operations is reduced, which improves calculation efficiency.
In an implementation of the present application, determining the N second tracking frames based on the heat map and the width-height value set includes:
determining, by the electronic device, the first positions of the first center points of the N second tracking frames based on the heat map; and
determining, by the electronic device, the first heights and the first widths of the N second tracking frames based on the width-height value set.
The first heights of any two second tracking frames among the N second tracking frames may be equal or unequal, and likewise for their first widths; the positions of the first center points of any two second tracking frames are different.
Specifically, the heat map gives, for each of the M feature points, the probability that the feature point is a first center point; the N feature points with the highest probabilities among the M feature points are then taken as the first center points, from which the first positions of the N first center points are obtained. For example, as shown in Figure 1C, feature point 1, feature point 2, and feature point 3 are the three feature points with the highest probabilities among all the feature points shown, so when the heat map is as in Figure 1C, the first center points are feature points 1, 2, and 3.
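The selection of the N most probable feature points can be sketched with a hypothetical helper that treats the heat map as a plain probability array:

```python
import numpy as np

def top_n_centers(heatmap, n):
    # Return the (row, col) positions of the n feature points with the
    # highest probability of being a first center point.
    idx = np.argsort(heatmap.ravel())[::-1][:n]
    return [tuple(int(v) for v in np.unravel_index(i, heatmap.shape))
            for i in idx]
```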
Specifically, the first height is known; the squared height difference corresponding to the first height is obtained from Table 1, and the second height is then calculated from the second formula. For example, if the first height is C and the corresponding squared height difference is c, the second height is C ± √c.
The first width is likewise known; the squared width difference corresponding to the first width is obtained from Table 1, and the second width is then calculated from the second formula. For example, if the first width is D and the corresponding squared width difference is d, the second width is D ± √d.
In an implementation of the present application, determining, by the electronic device, the N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set includes:
determining the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;
determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, where the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set includes N offsets, each offset being the position offset, in the feature vector set, of the corresponding first center point's feature vector relative to the feature vector of one of the second center points, and the N matching degree sets correspond one-to-one to the N first tracking frames, each matching degree set including N matching degrees, each being the matching degree between the corresponding first tracking frame and one of the second tracking frames;
determining N second correspondences according to the N offset sets and the N matching degree sets, the N second correspondences characterizing a one-to-one correspondence between the N second center points and the N first center points; and
determining the N first correspondences according to the N second correspondences.
The N offset sets are calculated using the sixth formula:

d(1)(a, b) = (d_b - d_a)^T S_a^{-1} (d_b - d_a)

where d_a denotes the feature vector of the second center point of first tracking frame a, d_b denotes the feature vector of the first center point of second tracking frame b, S_a denotes the covariance matrix of first tracking frame a, and d(1)(a, b) denotes the position offset, in the feature vector set, of the feature vector of the second center point of first tracking frame a relative to the feature vector of the first center point of second tracking frame b.
The N matching degree sets are calculated using the seventh formula.
The seventh formula is:

d(2)(a, b) = min{ 1 - r_b^T r_a^(k) : r_a^(k) ∈ R_a }
R_a = { r_a^(k) }, k = 1, …, L_k

where L_k = 100; r_a^(k) is the feature vector of the second center point of first tracking frame a in the k-th frame; r_b^T denotes the transpose of the feature vector of the first center point of second tracking frame b; R_a is the set of feature vectors of the second center point of first tracking frame a over the most recent 100 frames; and d(2)(a, b) denotes the matching degree in appearance between first tracking frame a and second tracking frame b.
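A sketch of this appearance term, assuming unit-length feature vectors so that 1 - r_b^T r is the cosine distance:

```python
import numpy as np

def appearance_match(r_b, R_a):
    # Seventh formula: smallest cosine distance between r_b and the
    # gallery R_a of recent (up to 100) vectors of first tracking frame a.
    r_b = np.asarray(r_b, dtype=float)
    return float(min(1.0 - float(r_b @ np.asarray(r, dtype=float)) for r in R_a))
```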
Finally, the eighth formula is used to perform a weighted calculation on any one of the N offset sets and any one of the N matching degree sets.
The eighth formula is:

C_{a,b} = λ·d(1)(a, b) + (1 - λ)·d(2)(a, b)

where λ is a fixed value that may differ from case to case, and C_{a,b} is the weighted sum.
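The weighted combination itself is a one-liner; λ = 0.5 below is only an illustrative choice, since the patent leaves λ unspecified:

```python
def weighted_sum(d1, d2, lam=0.5):
    # Eighth formula: C_{a,b} = lam * d1(a, b) + (1 - lam) * d2(a, b).
    return lam * d1 + (1.0 - lam) * d2
```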
If it is determined from the N offset sets that the distance between the second center point of first tracking frame o and the first center point of second tracking frame p is the shortest, and the weighted sum for first tracking frame o and second tracking frame p is greater than a first value, then first tracking frame o corresponds to second tracking frame p, where first tracking frame o is one of the N first tracking frames and second tracking frame p is one of the N second tracking frames.
Optionally, the one-to-one correspondence between the N first center points and the N second center points may be determined from the feature vectors of the first center points and the feature vectors of the second center points. For example, let A1 and A2 be first center points and B1 and B2 be second center points. Since the relationship of A1 to B1 and B2, and of A2 to B1 and B2, cannot be judged directly, two cases are possible: A1 corresponds to B1 and A2 to B2; or A1 corresponds to B2 and A2 to B1. Assuming A1 corresponds to B1 and A2 to B2, the third formula is first used to pull A1 and B1 closer, the fourth formula is then used to push A1 and B2 apart, and the fifth formula finally computes the distance A1B1 between A1 and B1. Assuming instead that A1 corresponds to B2 and A2 to B1, the third formula pulls A1 and B2 closer, the fourth formula pushes A1 and B1 apart, and the fifth formula computes the distance A1B2 between A1 and B2. The two distances are then compared: if A1B1 > A1B2, A1 corresponds to B1; if A1B1 < A1B2, A1 corresponds to B2.
It can be seen that, in this embodiment of the application, in determining whether a given first tracking frame corresponds to a given second tracking frame, the correspondence is established only when the distance between the first tracking frame's second center point and the second tracking frame's first center point is the shortest, the position offset of the first tracking frame's second center point's feature vector relative to the second tracking frame's first center point's feature vector in the feature vector branch is the smallest, and the weighted sum of the matching degrees of the two frames is greater than the first value. This improves the accuracy of determining tracking frame correspondences and, in turn, the accuracy of determining the tracking target.
It should be noted that Figure 1B and Figure 1C provided in the embodiments of the present application are only examples and do not constitute a limitation on the embodiments of the present application.
Consistent with the embodiment shown in Figure 1A, please refer to Figure 2A, which is a schematic flowchart of another tracking target determination method provided by an embodiment of the present application. The method is applied to the above electronic device and specifically includes the following steps:
Step 201: The electronic device obtains a first image and a second image from the same target video file, and obtains N first tracking frames of the first image, where the first image is a preceding preset frame image of the second image, the first image and the second image each include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1.
步骤202:电子设备将所述第二图像输入沙漏网络模型进行特征提取,输出目标特征图。Step 202: The electronic device inputs the second image into the hourglass network model for feature extraction, and outputs a target feature map.
步骤203:电子设备将所述目标特征图输入到所述热力图分支,以输出热力图,以及将所述目标特征图输入到所述宽高分支,以输出宽高数值集。Step 203: The electronic device inputs the target feature map to the heat map branch to output a heat map, and inputs the target feature map to the width and height branch to output a width and height value set.
步骤204:电子设备将所述热力图输入所述特征向量分支,以输出特征向量集。Step 204: The electronic device inputs the heat map into the feature vector branch to output a feature vector set.
步骤205:电子设备基于所述热力图确定所述N个第二跟踪框的第一中心点的第一位置。Step 205: The electronic device determines the first positions of the first center points of the N second tracking frames based on the heat map.
步骤206:电子设备基于所述宽高数值集确定所述N个第二跟踪框的第一高度和所述N个第二跟踪框的第一宽度。Step 206: The electronic device determines the first height of the N second tracking frames and the first width of the N second tracking frames based on the width and height value set.
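Steps 205 and 206 can be sketched as peak-picking on the heat map followed by a width-height lookup at the peak positions. The tensor layout below — an (H, W) heat map and an (H, W, 2) width-height value set — is an assumption made for illustration; the application does not fix a concrete format.

```python
import numpy as np

def decode_boxes(heatmap, wh, n):
    """Decode n tracking frames from a heat map and a width-height value set.

    heatmap: (H, W) array of center-point confidences.
    wh:      (H, W, 2) array holding one (width, height) pair per location.
    Returns (cx, cy, w, h) boxes for the n strongest heat-map peaks.
    """
    flat = heatmap.ravel()
    top = np.argsort(flat)[::-1][:n]  # indices of the n highest responses
    boxes = []
    for idx in top:
        cy, cx = np.unravel_index(idx, heatmap.shape)  # first position of a center
        w, h = wh[cy, cx]                              # first width and first height
        boxes.append((int(cx), int(cy), float(w), float(h)))
    return boxes
```

A real detector would additionally suppress neighbouring responses around each peak (non-maximum suppression); that refinement is left out of this sketch.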
Step 207: The electronic device determines the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points.

Step 208: The electronic device determines N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determines N matching degree sets according to the N first tracking frames and the N second tracking frames. The N offset sets correspond one-to-one to the feature vectors of the N first center points, and each offset set includes N offsets, namely the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point. The N matching degree sets correspond one-to-one to the N first tracking frames, and each matching degree set includes N matching degrees, namely the degrees of matching between the corresponding first tracking frame and each second tracking frame.

Step 209: The electronic device determines N second correspondences according to the N offset sets and the N matching degree sets, where the N second correspondences characterize the one-to-one correspondence between the N second center points and the N first center points.

Step 210: The electronic device determines N first correspondences according to the N second correspondences.
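Steps 209 and 210 reduce the N offset sets and N matching degree sets to a one-to-one assignment between center points (and hence between tracking frames). The greedy sketch below is one way to do this; the application does not prescribe an assignment algorithm (a Hungarian assignment would also fit), and the weighting `alpha` is an illustrative assumption.

```python
import numpy as np

def assign_correspondences(offsets, match_degrees, alpha=0.5):
    """Turn N offset sets and N matching-degree sets into one-to-one
    correspondences.

    offsets[i][j]:       offset of first center point i vs. second center
                         point j (smaller is better).
    match_degrees[i][j]: matching degree of the corresponding tracking
                         frames (larger is better).
    Greedy selection: repeatedly take the best remaining (i, j) pair.
    """
    cost = alpha * np.asarray(offsets) - (1 - alpha) * np.asarray(match_degrees)
    n = cost.shape[0]
    used_rows, used_cols, pairs = set(), set(), {}
    for idx in np.argsort(cost, axis=None):  # lowest-cost pairs first
        i, j = np.unravel_index(idx, cost.shape)
        if i not in used_rows and j not in used_cols:
            pairs[int(i)] = int(j)
            used_rows.add(i)
            used_cols.add(j)
        if len(pairs) == n:
            break
    return pairs
```

Greedy assignment is simple but can be suboptimal on ambiguous inputs; an optimal alternative is `scipy.optimize.linear_sum_assignment` over the same cost matrix.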
Step 211: The electronic device determines, based on the N first correspondences, the tracking targets framed by the N second tracking frames.

For example, as shown in FIG. 2B, the first image including tracking target S and tracking target D is input into the hourglass network model, which outputs the target feature map. The target feature map is then input into the heat map branch and the width-height branch of the prediction module, which output the heat map and the width-height value set respectively. The heat map is then input into the feature vector branch of the prediction module, which outputs the feature vector set. Next, the N second tracking frames, and the one-to-one correspondence between the N second tracking frames and the N first tracking frames, are determined by combining the N first tracking frames, the heat map, and the width-height value set. Finally, based on this one-to-one correspondence, the tracking target framed by each of the N second tracking frames can be identified, thereby achieving the purpose of determining the tracking targets.

It should be noted that for the specific implementation process of this embodiment, reference may be made to the specific implementation process described in the foregoing method embodiments, and details are not repeated here.
Consistent with the embodiments shown in FIG. 1A and FIG. 2A, please refer to FIG. 3, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in the figure, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the following steps:

obtaining a first image and a second image from the same target video file, and obtaining N first tracking frames of the first image, where the first image is an image a preset number of frames before the second image, both the first image and the second image include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1;

inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;

inputting the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;

determining N second tracking frames based on the heat map and the width-height value set, where the N second tracking frames are used to frame the N tracking targets in the second image;

determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;

determining, based on the N first correspondences, the tracking targets framed by the N second tracking frames.
In an implementation of this application, the hourglass network model is formed by i hourglass networks arranged in sequence, where the input image of the i-th hourglass network is an image obtained by combining the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2.

First processing is performed in each hourglass network. In the first processing: the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map; the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed with the input image to output a third feature map.
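The first processing can be sketched as follows, with average pooling standing in for the first convolution blocks and nearest-neighbour repetition standing in for the second convolution blocks, since the application does not spell out the layers inside those blocks:

```python
import numpy as np

def first_processing(x):
    """One pass of the 'first processing' of an hourglass network:
    down-sample the input, up-sample the result, and superimpose it
    on the input.

    x: (H, W) feature map with even H and W. Average pooling and
    nearest-neighbour repeat are illustrative stand-ins for the first
    and second convolution blocks.
    """
    h, w = x.shape
    # "first convolution blocks": down-sample to a first feature map
    first = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # "second convolution blocks": up-sample to a second feature map
    second = np.repeat(np.repeat(first, 2, axis=0), 2, axis=1)
    # superimpose with the input image to output a third feature map
    third = second + x
    return third
```

The skip-style addition at the end is what lets each hourglass pass fine detail from its input through to its output alongside the coarser down-sampled features.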
In an implementation of this application, the prediction network includes a heat map branch, a width-height branch, and a feature vector branch. In terms of inputting the target feature map into the prediction network to output the heat map, the width-height value set, and the feature vector set, the programs include instructions for performing the following steps:

inputting the target feature map into the heat map branch to output the heat map, and inputting the target feature map into the width-height branch to output the width-height value set;

inputting the heat map into the feature vector branch to output the feature vector set.

In an implementation of this application, in terms of determining the N second tracking frames based on the heat map and the width-height value set, the programs include instructions for performing the following steps:

determining, based on the heat map, the first positions of the first center points of the N second tracking frames;

determining, based on the width-height value set, the first heights of the N second tracking frames and the first widths of the N second tracking frames.

In an implementation of this application, in terms of determining the N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, the programs include instructions for performing the following steps:

determining the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;

determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, where the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set includes N offsets, namely the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point, the N matching degree sets correspond one-to-one to the N first tracking frames, and each matching degree set includes N matching degrees, namely the degrees of matching between the corresponding first tracking frame and each second tracking frame;

determining N second correspondences according to the N offset sets and the N matching degree sets, where the N second correspondences characterize the one-to-one correspondence between the N second center points and the N first center points;

determining the N first correspondences according to the N second correspondences.
It should be noted that for the specific implementation process of this embodiment, reference may be made to the specific implementation process described in the foregoing method embodiments, and details are not repeated here.

Please refer to FIG. 4, which is a schematic structural diagram of a tracking target determination apparatus provided by an embodiment of this application and applied to the above electronic device. The apparatus includes:

an information obtaining unit 401, configured to obtain a first image and a second image from the same target video file, and obtain N first tracking frames of the first image, where the first image is an image a preset number of frames before the second image, both the first image and the second image include N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1;

a feature extraction unit 402, configured to input the second image into an hourglass network model for feature extraction, and output a target feature map;

a data determination unit 403, configured to input the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;

a tracking frame determination unit 404, configured to determine N second tracking frames based on the heat map and the width-height value set;

a correspondence determination unit 405, configured to determine N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, where the N first correspondences characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames;

a tracking target determination unit 406, configured to determine, based on the N first correspondences, the tracking targets framed by the N second tracking frames.
In an implementation of this application, the hourglass network model is formed by i hourglass networks arranged in sequence, where the input image of the i-th hourglass network is an image obtained by combining the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2.

First processing is performed in each hourglass network. In the first processing: the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map; the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed with the input image to output a third feature map.
In an implementation of this application, the prediction network includes a heat map branch, a width-height branch, and a feature vector branch. In terms of inputting the target feature map into the prediction network to output the heat map, the width-height value set, and the feature vector set, the data determination unit 403 is specifically configured to:

input the target feature map into the heat map branch to output the heat map, and input the target feature map into the width-height branch to output the width-height value set;

input the heat map into the feature vector branch to output the feature vector set.

In an implementation of this application, in terms of determining the N second tracking frames based on the heat map and the width-height value set, the tracking frame determination unit 404 is further configured to:

determine, based on the heat map, the first positions of the first center points of the N second tracking frames;

determine, based on the width-height value set, the first heights of the N second tracking frames and the first widths of the N second tracking frames.

In an implementation of this application, in terms of determining the N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, the correspondence determination unit 405 is further configured to:

determine the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;

determine N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determine N matching degree sets according to the N first tracking frames and the N second tracking frames, where the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set includes N offsets, namely the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point, the N matching degree sets correspond one-to-one to the N first tracking frames, and each matching degree set includes N matching degrees, namely the degrees of matching between the corresponding first tracking frame and each second tracking frame;

determine N second correspondences according to the N offset sets and the N matching degree sets, where the N second correspondences characterize the one-to-one correspondence between the N second center points and the N first center points;

determine the N first correspondences according to the N second correspondences.
It should be noted that the information obtaining unit 401, the feature extraction unit 402, the data determination unit 403, the tracking frame determination unit 404, the correspondence determination unit 405, and the tracking target determination unit 406 may be implemented by a processor.

An embodiment of this application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps described for the electronic device in the foregoing method embodiments.

An embodiment of this application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps described for the electronic device in the foregoing methods. The computer program product may be a software installation package.

The steps of the methods or algorithms described in the embodiments of this application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may alternatively be a component of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in an access network device, a target network device, or a core network device. Of course, the processor and the storage medium may alternatively exist as discrete components in the access network device, the target network device, or the core network device.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The specific implementations described above further describe the objectives, technical solutions, and beneficial effects of the embodiments of this application in detail. It should be understood that the foregoing descriptions are merely specific implementations of the embodiments of this application and are not intended to limit the protection scope of the embodiments of this application. Any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the embodiments of this application shall fall within the protection scope of the embodiments of this application.

Claims (10)

1. A tracking target determination method, applied to an electronic device, wherein the method comprises:

obtaining a first image and a second image from the same target video file, and obtaining N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, both the first image and the second image comprise N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1;

inputting the second image into an hourglass network model for feature extraction, and outputting a target feature map;

inputting the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;

determining N second tracking frames based on the heat map and the width-height value set, wherein the N second tracking frames are used to frame the N tracking targets in the second image;

determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, wherein the N first correspondences characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames; and

determining, based on the N first correspondences, the tracking targets framed by the N second tracking frames.
2. The method according to claim 1, wherein the hourglass network model is formed by i hourglass networks arranged in sequence, the input image of the i-th hourglass network is an image obtained by combining the input image and the output image of the (i-1)-th hourglass network, and i is an integer greater than or equal to 2; and

first processing is performed in each hourglass network, in which: the input image is down-sampled through multiple first convolution blocks of the hourglass network to output a first feature map; the first feature map is up-sampled through multiple second convolution blocks of the hourglass network to output a second feature map; and the second feature map is superimposed with the input image to output a third feature map.
3. The method according to claim 1 or 2, wherein the prediction network comprises a heat map branch, a width-height branch, and a feature vector branch, and the inputting the target feature map into the prediction network to output the heat map, the width-height value set, and the feature vector set comprises:

inputting the target feature map into the heat map branch to output the heat map, and inputting the target feature map into the width-height branch to output the width-height value set; and

inputting the heat map into the feature vector branch to output the feature vector set.
4. The method according to claim 3, wherein the determining N second tracking frames based on the heat map and the width-height value set comprises:

determining, based on the heat map, the first positions of the first center points of the N second tracking frames; and

determining, based on the width-height value set, the first heights of the N second tracking frames and the first widths of the N second tracking frames.
5. The method according to claim 4, wherein the determining N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set comprises:

determining the feature vectors of the N first center points according to the feature vector set and the first positions of the N first center points;

determining N offset sets according to the feature vectors of the N first center points and the feature vectors of the second center points of the N first tracking frames, and determining N matching degree sets according to the N first tracking frames and the N second tracking frames, wherein the N offset sets correspond one-to-one to the feature vectors of the N first center points, each offset set comprises N offsets, namely the positional offsets, in the feature vector set, of the feature vector of the corresponding first center point relative to the feature vector of each second center point, the N matching degree sets correspond one-to-one to the N first tracking frames, and each matching degree set comprises N matching degrees, namely the degrees of matching between the corresponding first tracking frame and each second tracking frame;

determining N second correspondences according to the N offset sets and the N matching degree sets, wherein the N second correspondences characterize the one-to-one correspondence between the N second center points and the N first center points; and

determining the N first correspondences according to the N second correspondences.
6. A tracking target determination apparatus, applied to an electronic device, wherein the apparatus comprises:

an information obtaining unit, configured to obtain a first image and a second image from the same target video file, and obtain N first tracking frames of the first image, wherein the first image is an image a preset number of frames before the second image, both the first image and the second image comprise N tracking targets, the N first tracking frames are used to frame the N tracking targets in the first image, and N is an integer greater than 1;

a feature extraction unit, configured to input the second image into an hourglass network model for feature extraction, and output a target feature map;

a data determination unit, configured to input the target feature map into a prediction network to output a heat map, a width-height value set, and a feature vector set;

a tracking frame determination unit, configured to determine N second tracking frames based on the heat map and the width-height value set;

a correspondence determination unit, configured to determine N first correspondences based on the N first tracking frames, the N second tracking frames, and the feature vector set, wherein the N first correspondences characterize the one-to-one correspondence between the N first tracking frames and the N second tracking frames; and

a tracking target determination unit, configured to determine, based on the N first correspondences, the tracking targets framed by the N second tracking frames.
7. The apparatus according to claim 6, characterized in that the hourglass network model is composed of i hourglass networks arranged in sequence, the input image of the i-th hourglass network being an image obtained by combining the input image and the output image of the (i-1)-th hourglass network, where i is an integer greater than or equal to 2;
    first processing is performed in each hourglass network, the first processing comprising: down-sampling the input image through a plurality of first convolution blocks of the hourglass network to output a first feature map; up-sampling the first feature map through a plurality of second convolution blocks of the hourglass network to output a second feature map; and superimposing the second feature map on the input image to output a third feature map.
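A shape-level sketch (not part of the claims) of the "first processing" above. Average pooling stands in for the first (down-sampling) convolution blocks and nearest-neighbour up-sampling for the second (up-sampling) blocks; a real hourglass network would use learned convolutions at every step. The function name and `depth` parameter are illustrative assumptions.

```python
import numpy as np

def hourglass_first_processing(x, depth=2):
    """Sketch of one hourglass pass: down-sample, up-sample, add skip.

    x: (H, W) array with H and W divisible by 2**depth.
    Returns the first, second, and third feature maps of the claim.
    """
    f = x
    for _ in range(depth):  # down-sampling path -> first feature map
        h, w = f.shape
        f = f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    first = f
    for _ in range(depth):  # up-sampling path -> second feature map
        f = f.repeat(2, axis=0).repeat(2, axis=1)
    second = f
    third = second + x      # skip connection: superimpose on the input image
    return first, second, third
```

Stacking i such passes, each fed the combination of the previous pass's input and output, gives the sequential arrangement described in claim 7.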
8. The apparatus according to claim 6 or 7, characterized in that the prediction network comprises a heat map branch, a width-height branch, and a feature vector branch; in terms of inputting the target feature map into the prediction network to output the heat map, the width-height value set, and the feature vector set, the data determination unit is specifically configured to:
    input the target feature map into the heat map branch to output the heat map, and input the target feature map into the width-height branch to output the width-height value set; and
    input the heat map into the feature vector branch to output the feature vector set.
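An illustrative sketch (not part of the claims) of how the two branch outputs combine into the N second tracking frames of claim 6: the N strongest heat-map responses are taken as center points, and each is paired with its predicted width and height to form a box. The function name and the per-pixel (w, h) layout are assumptions for illustration.

```python
import numpy as np

def boxes_from_heatmap(heat, wh, n):
    """Derive n tracking frames from the heat-map and width-height branches.

    heat: (H, W) center-point confidence map.
    wh:   (H, W, 2) per-pixel predicted (width, height).
    Returns a list of (x1, y1, x2, y2) boxes, strongest peak first.
    """
    # indices of the n strongest heat-map responses (candidate center points)
    flat = np.argsort(heat, axis=None)[::-1][:n]
    ys, xs = np.unravel_index(flat, heat.shape)
    boxes = []
    for y, x in zip(ys, xs):
        w, h = wh[y, x]
        boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes
```

This mirrors center-point detectors such as CenterNet, where a peak in the heat map gives a box center and a parallel regression head gives its size.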
9. An electronic device, characterized in that the electronic device comprises a processor, a memory, a communication interface, and one or more programs, the one or more programs being stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method according to any one of claims 1-5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
PCT/CN2020/108990 2019-09-27 2020-08-13 Tracked target determination method and related device WO2021057309A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910925725.4A CN110826403B (en) 2019-09-27 2019-09-27 Tracking target determination method and related equipment
CN201910925725.4 2019-09-27

Publications (1)

Publication Number Publication Date
WO2021057309A1 true WO2021057309A1 (en) 2021-04-01

Family

ID=69548326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108990 WO2021057309A1 (en) 2019-09-27 2020-08-13 Tracked target determination method and related device

Country Status (2)

Country Link
CN (1) CN110826403B (en)
WO (1) WO2021057309A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826403B (en) * 2019-09-27 2020-11-24 深圳云天励飞技术有限公司 Tracking target determination method and related equipment
CN111460926B (en) * 2020-03-16 2022-10-14 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
CN113763415B (en) * 2020-06-04 2024-03-08 北京达佳互联信息技术有限公司 Target tracking method, device, electronic equipment and storage medium
CN113313736B (en) * 2021-06-10 2022-05-17 厦门大学 Online multi-target tracking method for unified target motion perception and re-identification network

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103679125A (en) * 2012-09-24 2014-03-26 致伸科技股份有限公司 Human face tracking method
CN105894538A (en) * 2016-04-01 2016-08-24 海信集团有限公司 Target tracking method and target tracking device
CN106250863A (en) * 2016-08-09 2016-12-21 北京旷视科技有限公司 Object tracking method and device
CN109766887A (en) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 Multi-target detection method based on a cascaded hourglass neural network
CN110826403A (en) * 2019-09-27 2020-02-21 深圳云天励飞技术有限公司 Tracking target determination method and related equipment

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN103985252A (en) * 2014-05-23 2014-08-13 江苏友上科技实业有限公司 Multi-vehicle projection locating method based on time domain information of tracked object
WO2018144537A1 (en) * 2017-01-31 2018-08-09 The Regents Of The University Of California Machine learning based driver assistance
CN108229455B (en) * 2017-02-23 2020-10-16 北京市商汤科技开发有限公司 Object detection method, neural network training method and device and electronic equipment
CN108229456B (en) * 2017-11-22 2021-05-18 深圳市商汤科技有限公司 Target tracking method and device, electronic equipment and computer storage medium
CN108830285B (en) * 2018-03-14 2021-09-21 江南大学 Target detection method for reinforcement learning based on fast-RCNN
CN108550161B (en) * 2018-03-20 2021-09-14 南京邮电大学 Scale self-adaptive kernel-dependent filtering rapid target tracking method
CN109146924B (en) * 2018-07-18 2020-09-08 苏州飞搜科技有限公司 Target tracking method and device based on thermodynamic diagram
CN109657595B (en) * 2018-12-12 2023-05-02 中山大学 Key feature region matching face recognition method based on stacked hourglass network
CN109858333B (en) * 2018-12-20 2023-01-17 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN109726659A (en) * 2018-12-21 2019-05-07 北京达佳互联信息技术有限公司 Detection method, device, electronic equipment and the readable medium of skeleton key point
CN109886998A (en) * 2019-01-23 2019-06-14 平安科技(深圳)有限公司 Multi-object tracking method, device, computer installation and computer storage medium
CN109948526B (en) * 2019-03-18 2021-10-29 北京市商汤科技开发有限公司 Image processing method and device, detection equipment and storage medium

Also Published As

Publication number Publication date
CN110826403A (en) 2020-02-21
CN110826403B (en) 2020-11-24

Similar Documents

Publication Publication Date Title
WO2021057309A1 (en) Tracked target determination method and related device
WO2021057315A1 (en) Multi-target tracking method and related device
US11244435B2 (en) Method and apparatus for generating vehicle damage information
US11443445B2 (en) Method and apparatus for depth estimation of monocular image, and storage medium
WO2021098362A1 (en) Video classification model construction method and apparatus, video classification method and apparatus, and device and medium
CN111627065B (en) Visual positioning method and device and storage medium
CN110838125B (en) Target detection method, device, equipment and storage medium for medical image
CN110910422A (en) Target tracking method and device, electronic equipment and readable storage medium
WO2017214968A1 (en) Method and apparatus for convolutional neural networks
CN113343982B (en) Entity relation extraction method, device and equipment for multi-modal feature fusion
CN111160229B (en) SSD network-based video target detection method and device
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN109885628B (en) Tensor transposition method and device, computer and storage medium
CN108921801B (en) Method and apparatus for generating image
CN111461113A (en) Large-angle license plate detection method based on deformed plane object detection network
CN112037267A (en) Method for generating panoramic graph of commodity placement position based on video target tracking
CN110555798A (en) Image deformation method and device, electronic equipment and computer readable storage medium
CN113298870B (en) Object posture tracking method and device, terminal equipment and storage medium
CN107730543B (en) Rapid iterative computation method for semi-dense stereo matching
CN111191555A (en) Target tracking method, medium and system combining high-low spatial frequency characteristics
CN106570911B (en) Method for synthesizing facial cartoon based on daisy descriptor
CN113255700B (en) Image feature map processing method and device, storage medium and terminal
CN114664410A (en) Video-based focus classification method and device, electronic equipment and medium
CN110070110B (en) Adaptive threshold image matching method
CN113112398A (en) Image processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867129

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867129

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.10.2022)
