WO2018086607A1 - Target tracking method, electronic device, and storage medium - Google Patents


Info

Publication number
WO2018086607A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
feature vector
candidate
region
Prior art date
Application number
PCT/CN2017/110577
Other languages
French (fr)
Chinese (zh)
Inventor
唐矗 (Tang Chu)
Original Assignee
纳恩博(北京)科技有限公司 (Ninebot (Beijing) Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 纳恩博(北京)科技有限公司
Publication of WO2018086607A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the present invention relates to the field of electronic technologies, and in particular, to a target tracking method, an electronic device, and a storage medium.
  • visual tracking based on online learning has risen in recent years and become a research hot spot in visual tracking.
  • such a method extracts feature templates from the tracking target specified in the initial frame image, without any prior offline learning; the trained model is used to track the target in subsequent video frames, and during tracking the model is updated according to the tracking status to accommodate changes in the target's posture.
  • This type of method does not require any offline training, and can track any object specified by the user, which has high versatility.
  • the embodiments of the present invention provide a target tracking method, an electronic device, and a storage medium to solve the technical problems of prior-art online-learning visual tracking methods, namely that it is impossible to determine whether the tracking target is lost and that it is difficult to retrieve the tracking target once it is lost.
  • the present invention provides the following technical solutions through an embodiment of the present invention:
  • a target tracking method is applied to an electronic device, wherein the electronic device has an image acquisition unit, and the image acquisition unit is configured to acquire image data, and the method includes:
  • a candidate target having the highest similarity with the tracking target among the plurality of candidate targets is determined as the tracking target.
  • the determining a tracking target in the initial frame image of the image data comprises:
  • the extracting a plurality of candidate targets in the subsequent frame image of the image data comprises:
  • the plurality of candidate targets are determined within the ith image block.
  • the calculating the similarity between each candidate target and the tracking target comprises:
  • the calculating the first color feature vector of the first candidate target and calculating the second color feature vector of the tracking target comprises:
  • the calculating a color feature vector of each region in the first mask image; and calculating a color feature vector of each region in the second mask image comprises:
  • W is a positive integer
  • the projection weight of the first pixel on each of the W main colors is calculated based on the following equation:
  • the first pixel is any one pixel of the first region or the second region; the nth main color is any one of the W main colors; w_n is the projection weight of the first pixel on the nth main color; I_r, I_g, and I_b are the RGB values of the first pixel; and R_n, G_n, and B_n are the RGB values of the nth main color.
  • the calculating the similarity between each candidate target and the tracking target comprises:
  • the determining the plurality of candidate targets in the ith image block comprises:
  • the calculating the similarity between each candidate target and the tracking target comprises:
  • the present invention provides the following technical solutions through an embodiment of the present invention:
  • a first determining unit configured to determine a tracking target in an initial frame image of the image data
  • An extracting unit configured to extract a plurality of candidate targets in a subsequent frame image of the image data,
  • the subsequent frame image is any frame image subsequent to the initial frame image;
  • a calculating unit configured to calculate a similarity between each candidate target and the tracking target
  • the second determining unit is configured to determine, as the tracking target, a candidate target that has the highest similarity with the tracking target among the plurality of candidate targets.
  • the first determining unit includes:
  • a first determining subunit configured to acquire a user's selection operation after outputting the initial frame image through the display screen; determining the tracking target in the initial frame image based on a user's selection operation;
  • a second determining subunit configured to acquire feature information for describing the tracking target; and determining the tracking target in the initial frame image based on the feature information.
  • the extracting unit comprises:
  • a first determining subunit configured to determine the (i-1)th bounding frame of the tracking target in the (i-1)th frame image, wherein the (i-1)th frame image belongs to the image data, i is an integer greater than or equal to 2, and the (i-1)th frame image is the initial frame image when i is equal to 2;
  • a second determining subunit configured to determine an i-th image block in the i-th frame image based on the (i-1)th bounding frame, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block is the same as the center position of the (i-1)th bounding frame, and the area of the i-th image block is larger than the area of the (i-1)th bounding frame;
  • a third determining subunit configured to determine the plurality of candidate targets within the ith image block.
  • the calculating unit comprises:
  • a first selection sub-unit configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a first calculation subunit configured to calculate a first color feature vector of the first candidate target, and calculate a second color feature vector of the tracking target;
  • a second calculation subunit configured to calculate the distance between the first color feature vector and the second color feature vector, wherein the distance is the similarity between the first candidate target and the tracking target.
  • the first calculating subunit is further configured to:
  • the first calculating subunit is further configured to:
  • determine W main colors, W being a positive integer; calculate a projection weight of each pixel in the first region of the first mask image on each of the main colors, the first region being any one of the M regions in the first mask image; calculate a projection weight of each pixel in the second region of the second mask image on each of the main colors, the second region being any one of the M regions in the second mask image; obtain a W-dimensional color feature vector corresponding to each pixel in the first region based on the projection weight of each pixel in the first region on each of the main colors; obtain a W-dimensional color feature vector corresponding to each pixel in the second region based on the projection weight of each pixel in the second region on each of the main colors; normalize the W-dimensional color feature vector corresponding to each pixel in the first region to obtain the color feature vector of each pixel in the first region; and normalize the W-dimensional color feature vector corresponding to each pixel in the second region to obtain the color feature vector of each pixel in the second region.
  • the first calculating subunit is further configured to calculate the projection weight of the first pixel on each of the W main colors based on the following equation:
  • the first pixel is any one pixel of the first region or the second region; the nth main color is any one of the W main colors; w_n is the projection weight of the first pixel on the nth main color; I_r, I_g, and I_b are the RGB values of the first pixel; and R_n, G_n, and B_n are the RGB values of the nth main color.
  • the calculating unit comprises:
  • a second selection subunit configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a normalization subunit configured to normalize the image of the first candidate target and the image of the tracking target to the same size;
  • a first input subunit configured to input an image of the tracking target into a first convolution network of a first deep neural network for feature calculation, to obtain a feature vector of the tracking target, wherein the first deep neural network is based on the Siamese structure;
  • a second input subunit configured to input an image of the first candidate target into a second convolution network of the first deep neural network to perform feature calculation, to obtain a feature vector of the first candidate target, wherein the second convolution network and the first convolution network share convolution layer parameters;
  • a third input subunit configured to input a feature vector of the tracking target and a feature vector of the first candidate target into a first fully connected network of the first deep neural network to perform a similarity calculation, to obtain the similarity between the first candidate target and the tracking target.
  • the third determining subunit is further configured to:
  • the calculating unit comprises:
  • an extraction subunit configured to extract a feature vector of the first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a fourth input subunit configured to input an image of the tracking target into a fourth convolution network of the second deep neural network to perform feature calculation, to obtain a feature vector of the tracking target, wherein the fourth convolution network and the third convolution network share convolution layer parameters;
  • a fifth input subunit configured to input a feature vector of the tracking target and a feature vector of the first candidate target into a second fully connected network of the second deep neural network to perform a similarity calculation, to obtain the similarity between the first candidate target and the tracking target.
  • the present invention provides the following technical solutions through an embodiment of the present invention:
  • An electronic device comprising: a processor and a memory for storing a computer program executable on the processor, wherein the processor is operative to perform the steps of the method described above when the computer program is run.
  • the present invention provides the following technical solutions through an embodiment of the present invention:
  • a computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method described above.
  • a tracking target is determined in an initial frame image of the image data; a plurality of candidate targets are extracted in a subsequent frame image of the image data; the similarity between each candidate target and the tracking target is calculated; and the candidate target with the highest similarity is determined as the tracking target.
  • the candidate target of each frame image is compared with the tracking target in the initial frame image, and the candidate target with the highest similarity among the candidate targets is determined as the tracking target, thereby implementing tracking of the tracking target.
  • the tracking method in the embodiment of the present invention can reliably determine whether the tracking target is lost, and does not need to maintain a tracking template, thereby avoiding the continuous amplification of errors caused by continuously updating the template; this is beneficial to recovering a lost tracking target and improves the robustness of the tracking system.
  • FIG. 1 is a flowchart of a target tracking method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of an initial frame image in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an initial tracking target in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an image of a second frame in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of candidate objects determined in a second frame image according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a first deep neural network according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a second deep neural network according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • the embodiments of the present invention provide a target tracking method and device to solve the technical problems of prior-art online-learning visual tracking methods, namely that it is impossible to determine whether the tracking target is lost and that it is difficult to retrieve the tracking target after it is lost.
  • a target tracking method is applied to an electronic device, wherein the electronic device has an image acquisition unit, and the image acquisition unit is configured to acquire image data, the method comprising: determining a tracking target in the initial frame image of the image data; extracting a plurality of candidate targets in a subsequent frame image, the subsequent frame image being any frame image after the initial frame image; calculating the similarity between each candidate target and the tracking target; and determining the candidate target with the highest similarity to the tracking target as the tracking target.
  • the embodiment provides a target tracking method, which is applied to an electronic device, and the electronic device may be: a ground robot (for example, a balance vehicle), or a drone (for example, a multi-rotor drone or a fixed-wing drone); the specific form of the electronic device is not limited in this embodiment.
  • the electronic device has an image acquisition unit (for example, a camera), and the image acquisition unit is configured to collect image data.
  • the target tracking method includes:
  • Step S101: Determine a tracking target in the initial frame image of the image data.
  • step S101 includes:
  • an image acquired by the image acquisition unit may be obtained and output (for example, the initial frame image 300) through a display screen provided on the electronic device; a selection operation performed by the user is then acquired (for example, when the display is a touch screen, the selection operation is acquired through the touch screen), and the tracking target (ie, the initial tracking target 000) is determined in the initial frame image 300 based on the selection operation.
  • the feature information for describing the tracking target is acquired, and the tracking target (ie, the initial tracking target 000) is determined in the initial frame image 300 in conjunction with a saliency detection or an object detection algorithm.
  • the image 311 of the initial tracking target 000 can be extracted and saved for backup, and the image 311 is the image in the first bounding frame 310.
  • Step S102: Extract a plurality of candidate targets in the subsequent frame image of the image data, where the subsequent frame image is any frame image after the initial frame image.
  • step S102 includes:
  • the (i-1)th bounding frame of the tracking target is determined in the (i-1)th frame image (the (i-1)th frame image belongs to the image data, i is an integer greater than or equal to 2, and when i is equal to 2, the (i-1)th frame image is the initial frame image); based on the (i-1)th bounding box, the i-th image block is determined in the i-th frame image, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block is the same as the center of the (i-1)th bounding box, and the area of the i-th image block is larger than the area of the (i-1)th bounding box; and a plurality of candidate targets are determined within the i-th image block.
  • FIG. 2 is an initial frame image including a plurality of person targets, and the tracking target to be tracked is a person in the first bounding box 310.
  • FIG. 4 is a second frame image in which the position or posture of each character object is changed.
  • the bounding frame (ie, the first bounding box 310) of the tracking target (ie, the initial tracking target 000) is determined in the initial frame image 300; the bounding box is generally rectangular and just surrounds the tracking target (ie, the initial tracking target 000).
  • based on the position of the first bounding frame 310 (the position of the first bounding frame 310 in the initial frame image 300 is taken as the same position in the second frame image 400), a second image block 420 is determined in the second frame image 400; the center of the second image block 420 is the same as that of the first bounding frame 310, but the area of the second image block 420 is larger than that of the first bounding frame 310.
  • there may be a plurality of targets in the second image block 420, among which is the tracking target determined in the initial frame image 300 (ie, the initial tracking target 000); methods such as saliency analysis or target detection can be used to determine the plurality of targets in the second image block 420 and take them as candidate targets (ie, candidate target 401, candidate target 402, candidate target 403, and candidate target 404). Further, based on steps S103 to S104, the tracking target is determined from among the candidate targets, that is, the initial tracking target 000 is identified in the second frame image.
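The construction of an image block that shares the previous bounding frame's center but has a larger area can be sketched as follows; the expansion factor and the clamping to the image borders are assumptions for illustration, since the text does not fix them:

```python
def search_block(bbox, img_w, img_h, scale=2.0):
    """Given the previous frame's bounding box (cx, cy, w, h), return an
    image block (x0, y0, x1, y1) with the same center but `scale`-times
    larger sides, clamped to the image borders. `scale` is an assumed
    parameter, not a value taken from the patent."""
    cx, cy, w, h = bbox
    bw, bh = w * scale, h * scale
    x0 = max(0.0, cx - bw / 2)
    y0 = max(0.0, cy - bh / 2)
    x1 = min(float(img_w), cx + bw / 2)
    y1 = min(float(img_h), cy + bh / 2)
    return (x0, y0, x1, y1)
```

Candidate targets would then be extracted from within this block, e.g. via saliency analysis or target detection, as the text describes.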
  • specific embodiments of steps S103 to S104 will be described in detail later.
  • similarly, for the third frame image, the bounding frame of the tracking target in the second frame image 400 (ie, the second bounding frame) is determined; based on the second bounding frame, an image block (ie, a third image block) is determined in the third frame image, wherein the center of the third image block is the same as that of the second bounding frame, but the area of the third image block is larger than that of the second bounding frame.
  • there may be multiple targets in the third image block, among which is the tracking target determined in the initial frame image; saliency analysis or target detection may be used to determine a plurality of targets in the third image block and take them as candidate targets. Further, based on steps S103 to S104, the tracking target is determined from among the candidate targets, that is, the initial tracking target 000 is identified in the third frame image.
  • similarly, the fourth image block is determined in the fourth frame image, a plurality of candidate targets are determined within the fourth image block, and the tracking target (ie, the initial tracking target 000) is determined from among the candidate targets based on steps S103 to S104.
  • in this way, a plurality of candidate targets are determined in each frame image, and the tracking target (ie, the initial tracking target 000) is determined from among the candidate targets based on steps S103 to S104, thereby achieving identification and tracking of the tracking target.
  • images of each candidate target are extracted and saved for backup.
  • the image 421 of the candidate target 401, the image 422 of the candidate target 402, the image 423 of the candidate target 403, and the image of the candidate target 404 are extracted and saved.
  • Step S103: Calculate the similarity between each candidate target and the tracking target.
  • the similarity of each candidate target to the tracking target is calculated.
  • the tracking target is an initial tracking target 000 (shown in FIG. 3) determined in the initial frame image 300
  • the candidate target is from the i-th image block in the i-th frame image, and the i-th frame image is a subsequent frame image (ie, any frame image after the initial frame image).
  • the candidate target includes the candidate target 401, the candidate target 402, the candidate target 403, and the candidate target 404 determined in the second frame image 400.
  • the target re-identification algorithm can be used to calculate the similarity between each candidate target and the tracking target.
  • the following three embodiments are available for step S103.
  • Manner 1: Calculate the similarity between each candidate target and the tracking target by using a color-feature-based target re-identification algorithm.
  • step S103 includes:
  • first, the color feature vector of the initial tracking target 000 is calculated, wherein the initial tracking target 000 is the tracking target determined in the initial frame image 300 (as shown in FIG. 5); then, the color feature vector of the candidate target 401 is calculated; finally, the distance between the color feature vector of the initial tracking target 000 and the color feature vector of the candidate target 401 is calculated, which represents the similarity between the candidate target 401 and the initial tracking target 000.
  • the similarity between the candidate target 402, the candidate target 403, the candidate target 404, and the initial tracking target 000 is calculated separately.
  • the distance between the first color feature vector and the second color feature vector may be calculated based on the Euclidean distance formula.
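The Euclidean-distance similarity and the selection of the best candidate in steps S103 to S104 can be sketched as:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two color feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def best_candidate(target_vec, candidate_vecs):
    """Return the index of the candidate whose color feature vector is
    closest to the tracking target's, i.e. the highest-similarity one."""
    return min(range(len(candidate_vecs)),
               key=lambda i: euclidean(target_vec, candidate_vecs[i]))
```

Smaller distance means higher similarity, so the candidate at minimum distance is determined as the tracking target.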
  • the calculating the first color feature vector of the first candidate target and calculating the second color feature vector of the tracking target comprises:
  • principal component segmentation is performed on the image of the first candidate target to obtain a first mask image, and the image of the tracking target is subjected to saliency segmentation to obtain a second mask image; the first mask image and the second mask image are scaled to the same size; the first mask image is equally divided into M regions, and the second mask image is equally divided into M regions, M being a positive integer; the color feature vector of each region in the first mask image is calculated, and the color feature vector of each region in the second mask image is calculated; the color feature vectors of the regions in the first mask image are sequentially connected to obtain the first color feature vector; and the color feature vectors of the regions in the second mask image are sequentially connected to obtain the second color feature vector.
  • specifically, the image 311 of the initial tracking target 000 may first be subjected to principal component segmentation to obtain a second mask image; in the mask image, only the principal component area keeps pixel values consistent with the original image, and all other regions have a pixel value of 0.
  • the image 311 of the initial tracking target 000 is a rectangle that just surrounds the initial tracking target 000.
  • the second mask image is scaled to a preset size, and then the second mask image is equally divided into four regions (upper and lower halved, left and right halved), and then the color eigenvectors of each of the four regions are respectively calculated.
  • the color feature vectors of the four regions are sequentially connected (if the color feature vector of each region is a 10-dimensional vector, the sequential connection yields a 40-dimensional vector), and the color feature vector of the tracking target (ie, the initial tracking target 000), that is, the second color feature vector, is obtained after normalization.
  • similarly, the image 421 of the candidate target 401 may be subjected to principal component segmentation to obtain a first mask image, wherein the image 421 is rectangular and surrounds the candidate target 401; the first mask image is then also scaled to the preset size, the same size as the second mask image, and equally divided into four regions (halved top and bottom, halved left and right); the color feature vector of each of the four regions is calculated separately; finally, the color feature vectors of the four regions are sequentially connected (if the color feature vector of each region is a 10-dimensional vector, the sequential connection yields a 40-dimensional vector), and the color feature vector of the candidate target 401 (ie, the first color feature vector) is obtained.
  • the color feature vector of the candidate target 402, the color feature vector of the candidate target 403, and the color feature vector of the candidate target 404 are respectively calculated.
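The region-splitting and concatenation steps above can be sketched in pure Python as follows; the per-region color descriptor is passed in as a function, since the main-color projection it would compute is detailed separately in this document:

```python
def quarter_regions(img):
    """Split an image (a list of rows of pixels) into four equal regions:
    top-left, top-right, bottom-left, bottom-right."""
    h, w = len(img), len(img[0])
    hh, hw = h // 2, w // 2
    crop = lambda r0, r1, c0, c1: [row[c0:c1] for row in img[r0:r1]]
    return [crop(0, hh, 0, hw), crop(0, hh, hw, w),
            crop(hh, h, 0, hw), crop(hh, h, hw, w)]

def image_color_vector(img, region_descriptor):
    """Sequentially connect the color feature vectors of the four regions,
    e.g. four 10-dimensional vectors become one 40-dimensional vector."""
    out = []
    for region in quarter_regions(img):
        out.extend(region_descriptor(region))
    return out
```

A final normalization of the concatenated vector, as the text describes, would follow this step.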
  • the calculating a color feature vector of each region in the first mask image; and calculating a color feature vector of each region in the second mask image includes:
  • determining W main colors, W being a positive integer; calculating a projection weight of each pixel in the first region of the first mask image on each of the main colors, the first region being any one of the M regions in the first mask image; calculating a projection weight of each pixel in the second region of the second mask image on each of the main colors, the second region being any one of the M regions in the second mask image; obtaining a W-dimensional color feature vector corresponding to each pixel in the first region based on the projection weight of each pixel in the first region on each of the main colors; obtaining a W-dimensional color feature vector corresponding to each pixel in the second region based on the projection weight of each pixel in the second region on each of the main colors; normalizing the W-dimensional color feature vector corresponding to each pixel in the first region to obtain the color feature vector of each pixel in the first region; normalizing the W-dimensional color feature vector corresponding to each pixel in the second region to obtain the color feature vector of each pixel in the second region; and adding the color feature vectors of all pixels in a region to obtain the color feature vector of that region.
  • similarly, the first mask image is equally divided into four regions (halved top and bottom, halved left and right), any one of the four regions (ie, the first region) is selected, and its color feature vector is calculated in the same manner.
  • the second mask image is equally divided into four regions (halved top and bottom, halved left and right).
  • to calculate the color feature vector of each region in the second mask image, first, any one of the four regions (ie, the second region) is selected, and the projection weight of each pixel in the second region on each of the main colors is calculated.
  • each pixel thus yields a 10-dimensional color feature vector, which is then normalized and used as the color feature vector of that pixel; after the color feature vectors of all the pixels in the second region are obtained, they are added together to obtain the color feature vector of the second region. Based on this method, the color feature vector of each of the four regions in the second mask image can be calculated.
  • the projection weight of the first pixel on each of the W main colors can be calculated based on the following equation:
  • the first pixel is any one pixel of the first region or the second region; the nth main color is any one of the W main colors; w_n is the projection weight of the first pixel on the nth main color; I_r, I_g, and I_b are the RGB values of the first pixel; and R_n, G_n, and B_n are the RGB values of the nth main color.
  • for example, suppose n indexes the above 10 main colors and the 2nd main color is yellow; then w_2 is the projection weight of the pixel on yellow, R_2, G_2, and B_2 are the RGB values of yellow, and I_r, I_g, and I_b are the RGB values of the pixel.
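The per-pixel projection onto W main colors and the per-region summation described above can be sketched as follows. Note that the patent's actual projection equation did not survive into this text; the inverse-RGB-distance weighting below is only an assumed stand-in, normalized per pixel as the text requires:

```python
import math

def projection_weights(pixel, main_colors):
    """W-dimensional color feature vector of one pixel: one weight per main
    color. ASSUMPTION: the patent's exact equation is unavailable here, so
    we use inverse RGB distance and normalize the weights to sum to 1."""
    raw = []
    for (Rn, Gn, Bn) in main_colors:
        d = math.sqrt((pixel[0] - Rn) ** 2
                      + (pixel[1] - Gn) ** 2
                      + (pixel[2] - Bn) ** 2)
        raw.append(1.0 / (1.0 + d))
    s = sum(raw)
    return [w / s for w in raw]

def region_color_vector(region_pixels, main_colors):
    """Add the normalized per-pixel vectors to get the region's vector."""
    total = [0.0] * len(main_colors)
    for px in region_pixels:
        for n, w in enumerate(projection_weights(px, main_colors)):
            total[n] += w
    return total
```

Whatever the true equation, the structure is the same: each pixel gets a W-dimensional weight vector, the vector is normalized, and the region's vector is the sum over its pixels.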
  • Manner 2: Calculate the similarity between each candidate target and the tracking target by using a deep-neural-network-based target re-identification algorithm. In this case, step S103 includes:
  • a first candidate target is selected from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets; the image of the first candidate target and the image of the tracking target are normalized to the same size; the image of the tracking target is input through the first input terminal 611 into the first convolution network 601 of the first deep neural network for feature calculation to obtain the feature vector of the tracking target, wherein the first deep neural network is based on the Siamese structure; the image of the first candidate target is input through the second input terminal 612 into the second convolution network 602 of the first deep neural network for feature calculation to obtain the feature vector of the first candidate target, wherein the second convolution network 602 and the first convolution network 601 share convolution layer parameters (that is, the convolution layer parameters are the same); and the feature vector of the tracking target and the feature vector of the first candidate target are input into the first fully connected network 603 of the first deep neural network to obtain the similarity between the first candidate target and the tracking target, wherein the outputs of the first convolution network 601 and the second convolution network 602 automatically serve as inputs to the first fully connected network 603.
  • before use, the first deep neural network needs to be trained offline (as shown in FIG. 6). The first deep neural network includes a first convolution network 601, a second convolution network 602, a first fully connected network 603, a first input terminal 611, a second input terminal 612, and a first output terminal 621, wherein the first convolution network 601 and the second convolution network 602 form a two-branch deep neural network adopting the Siamese structure, and each branch adopts the network structure of AlexNet before FC6. Both the first convolution network 601 and the second convolution network 602 contain a plurality of convolution layers, and the convolution layers of the two networks are mutually shared, with the same parameters.
  • the images input into the first convolutional network 601 and the second convolutional network 602 need to be normalized to the same size. The normalized image of the tracking target is input into the first convolutional network 601, and the feature vector of the tracking target can be obtained; the normalized image of the first candidate target is input into the second convolutional network 602, and the feature vector of the first candidate target can be obtained.
  • the first convolutional network 601 and the second convolutional network 602 are both connected to the first fully connected network 603.
  • the first fully connected network 603 includes a plurality of fully connected layers and is configured to calculate the distance between the feature vectors input from the two branches, i.e., the similarity between the first candidate target and the tracking target.
  • the parameters of the first deep neural network are obtained through offline learning, and the method of training the first deep neural network is consistent with the training method of a general convolutional neural network. After offline training is finished, the first deep neural network can be used in the tracking system.
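The two-branch computation described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the AlexNet-before-FC6 backbone of networks 601/602 is reduced to a single shared linear feature extractor with random placeholder weights, the fully connected network 603 is a two-layer head, and all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared backbone: both branches use the SAME weights, mirroring the
# shared convolutional-layer parameters of networks 601 and 602.
W_feat = rng.standard_normal((64, 32 * 32 * 3)) * 0.01

def extract_features(image):
    """Stand-in for the shared backbone (601/602): flatten, project, ReLU."""
    return np.maximum(W_feat @ image.reshape(-1), 0.0)

# Fully connected similarity head (stand-in for network 603).
W_fc1 = rng.standard_normal((16, 128)) * 0.01
W_fc2 = rng.standard_normal((1, 16)) * 0.01

def similarity(img_a, img_b):
    """Concatenate both feature vectors and squash the score to (0, 1)."""
    fa, fb = extract_features(img_a), extract_features(img_b)
    h = np.maximum(W_fc1 @ np.concatenate([fa, fb]), 0.0)
    return float(1.0 / (1.0 + np.exp(-(W_fc2 @ h)[0])))

# Both inputs must first be normalized to the same size (here 32x32x3).
target = rng.random((32, 32, 3))
candidate = rng.random((32, 32, 3))
s = similarity(target, candidate)
assert 0.0 < s < 1.0
```

Because the backbone weights are shared, identical inputs always map to identical feature vectors, which is the property the Siamese structure relies on.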
  • the image 421 of the candidate target 401 and the image 311 of the initial tracking target 000 may be first normalized to the same size;
  • the image 311 of the initial tracking target 000 is input into the first convolutional network 601 to obtain the feature vector of the initial tracking target 000, and the image 421 of the candidate target 401 is input into the second convolutional network 602 to obtain the feature vector of the candidate target 401; the feature vector of the initial tracking target 000 and the feature vector of the candidate target 401 are input into the first fully connected network 603 to obtain the similarity between the candidate target 401 and the initial tracking target 000.
  • similarly, the image 422 of the candidate target 402 is normalized to the same size as the image 311 corresponding to the initial tracking target 000; the image 311 of the initial tracking target 000 is input into the first convolutional network 601, and the image 422 of the candidate target 402 is input into the second convolutional network 602, to obtain the similarity between the candidate target 402 and the initial tracking target 000.
  • in the same way, the similarity between the candidate target 403 and the initial tracking target 000, and the similarity between the candidate target 404 and the initial tracking target 000, can be obtained.
  • Manner 3: using a deep neural network to simultaneously generate the candidate targets and calculate the similarity between each candidate target and the tracking target.
  • in Manner 3, a second deep neural network, as shown in FIG. 7, may be utilized.
  • the second deep neural network may be trained offline; the second deep neural network is based on the Siamese structure, and includes a third convolutional network 604, a fourth convolutional network 605, an RPN (Region Proposal Network) 607, a second fully connected network 606, a third input terminal 613, a fourth input terminal 614, and a second output terminal 622.
  • the output of the third convolutional network 604 is input to the RPN network 607, and the fourth convolutional network 605 and the RPN network 607 are simultaneously connected to the second fully connected network 606.
  • the third convolutional network 604 includes a plurality of convolutional layers for performing feature calculation on the i-th image block; the third convolutional network 604 is used to obtain the feature map of the i-th image block, and the RPN network 607 is configured to extract, based on the feature map of the i-th image block, a plurality of candidate targets from the i-th image block and calculate the feature vector of each candidate target.
  • the second deep neural network shown in FIG. 7 differs from the first deep neural network shown in FIG. 6 mainly in its lower half: the third convolutional network 604 in FIG. 7 takes the i-th image block as its input, and an RPN network 607 is additionally added, which extracts the candidate targets on the feature map obtained after the i-th image block is processed by the third convolutional network 604.
  • the RPN network 607 performs its calculation directly on the feature map produced by the third convolutional network 604; after the calculation, it finds the position corresponding to each candidate target on the feature map and acquires the feature vector of each candidate target directly from the feature map; the feature vector of each candidate target, together with the feature vector corresponding to the initial tracking target 000, is then input into the second fully connected network 606 to calculate the similarity.
  • the i-th image block may be input into the third convolutional network 604 of the second deep neural network through the fourth input terminal 614 for feature calculation, to obtain the feature map of the i-th image block; the feature map of the i-th image block is then input into the RPN network 607 of the second deep neural network for feature calculation, a plurality of candidate targets are extracted, and the feature vector of each candidate target is calculated.
  • for example, the second image block 420 can be input into the third convolutional network 604 of the second deep neural network to obtain the feature map of the second image block 420, and the feature map of the second image block 420 is input into the RPN network 607, from which a plurality of candidate targets (i.e., candidate target 401, candidate target 402, candidate target 403, and candidate target 404) are extracted; the feature vector of each candidate target can also be obtained.
  • in Manner 3, step S103 includes: the convolutional layers in the fourth convolutional network 605 and the third convolutional network 604 share the convolutional layer parameters, i.e., their convolutional layer parameters are the same.
  • the second deep neural network includes, in addition to the third convolutional network 604 and the RPN network 607, a fourth convolutional network 605 and a second fully connected network 606.
  • the RPN network 607 is configured to extract a plurality of candidate targets based on the feature map output by the third convolutional network 604, calculate the feature vector of each candidate target, and input the feature vector of each candidate target into the second fully connected network 606 in sequence; the fourth convolutional network 605 is configured to calculate the feature vector of the tracking target and output it to the second fully connected network 606; and the second fully connected network 606 is configured to calculate the similarity between the first candidate target and the tracking target based on the feature vector of the first candidate target and the feature vector of the tracking target.
  • for example, the feature vector of each candidate target can be obtained through the calculation of the third convolutional network 604 and the RPN network 607; the image 311 corresponding to the initial tracking target 000 is input into the fourth convolutional network 605 of the second deep neural network, and the similarity between the candidate target 401 and the initial tracking target 000 can then be calculated by the second fully connected network 606.
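The efficiency point above — one backbone pass over the i-th image block yields the feature vectors of all candidate targets — can be illustrated with a toy sketch. The feature map values, the proposal boxes, and the average-pooling readout below are all placeholder assumptions; the patent's RPN 607 itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature map of the i-th image block, as a stand-in for the output of
# network 604: (channels, height, width); values are placeholders.
feature_map = rng.random((8, 16, 16))

def roi_feature(fmap, box):
    """Read a candidate's feature vector directly off the shared feature
    map (as RPN 607 does), here by average-pooling the box region."""
    x0, y0, x1, y1 = box
    return fmap[:, y0:y1, x0:x1].mean(axis=(1, 2))

# Hypothetical proposals, given in feature-map coordinates.
proposals = [(0, 0, 8, 8), (4, 4, 12, 12), (8, 8, 16, 16)]
candidate_vectors = [roi_feature(feature_map, b) for b in proposals]

# One backbone pass produced all candidate vectors: no per-candidate
# re-computation of the convolutional features is needed.
assert len(candidate_vectors) == 3
assert all(v.shape == (8,) for v in candidate_vectors)
```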
  • Step S104: determine the candidate target having the highest similarity with the tracking target among the plurality of candidate targets as the tracking target.
  • the candidate target with the highest similarity can be used as the tracking target; for example, if the candidate target 402 has the highest similarity, the candidate target 402 continues to be tracked as the tracking target.
  • the above mainly takes the second frame image 400 as an example: for each candidate target in the second image block 420 of the second frame image 400, the similarity between that candidate target and the initial tracking target 000 is calculated separately, and the candidate target with the highest similarity is used as the tracking target in the second frame image. The subsequent frame images (for example, the third frame image, the fourth frame image, the fifth frame image, and so on) are handled in the same way: the similarity between each candidate target in each frame image and the initial tracking target 000 is calculated, and the candidate target with the highest similarity is used as the tracking target in that frame image.
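The per-frame selection of steps S103 and S104 reduces to an argmax over candidate similarities. A minimal pure-Python sketch follows; the candidate labels and similarity scores are hypothetical.

```python
def select_tracking_target(candidates, similarity_to_target):
    """Step S104: among a frame's candidates, keep the one whose
    similarity to the initial tracking target is highest."""
    return max(candidates, key=similarity_to_target)

# Hypothetical similarities for candidates 401-404 in one frame.
scores = {401: 0.31, 402: 0.87, 403: 0.12, 404: 0.55}
best = select_tracking_target(scores.keys(), scores.get)
assert best == 402  # candidate 402 continues to be tracked
```

Because every frame is compared against the initial tracking target rather than against a continuously updated template, a low maximum score can also be read as a signal that the target may be lost in that frame.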
  • in the target tracking method of the embodiment of the present invention, the processing of each frame can be regarded as determining whether the target is lost, which is reliable; it does not need to maintain a tracking template, which avoids the continuous amplification of errors caused by continuously updating a tracking template, and is beneficial to recovering the tracking target, thereby improving the robustness of the tracking system.
  • the embodiment provides an electronic device, which has an image acquisition unit, and the image acquisition unit is configured to collect image data.
  • the electronic device includes:
  • the first determining unit 801 is configured to determine a tracking target in the initial frame image of the image data
  • the extracting unit 802 is configured to extract a plurality of candidate targets in the subsequent frame image of the image data, where the subsequent frame image is any frame image subsequent to the initial frame image;
  • the calculating unit 803 is configured to calculate a similarity between the candidate target and the tracking target;
  • the second determining unit 804 is configured to determine a candidate target that has the highest similarity with the tracking target among the plurality of candidate targets as the tracking target.
  • the first determining unit 801 includes:
  • a first determining subunit configured to acquire a user's selection operation after outputting the initial frame image through the display screen; determining a tracking target in the initial frame image based on the user's selection operation;
  • a second determining subunit configured to acquire feature information for describing the tracking target; and determining a tracking target in the initial frame image based on the feature information.
  • the extracting unit 802 includes:
  • a first determining subunit configured to determine an (i-1)-th bounding frame of the tracking target in the (i-1)-th frame image, wherein the (i-1)-th frame image belongs to the image data, and i is an integer greater than or equal to 2; when i is equal to 2, the (i-1)-th frame image is the initial frame image;
  • a second determining subunit configured to determine an i-th image block in the i-th frame image based on the (i-1)-th bounding frame, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block is the same as the center of the (i-1)-th bounding frame, and the area of the i-th image block is larger than the area of the (i-1)-th bounding frame;
  • a third determining subunit configured to determine a plurality of candidate targets within the ith image block.
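The construction of the i-th image block from the (i-1)-th bounding frame (same center, larger area) can be sketched as follows. The enlargement factor and the corner-based box format are assumptions for illustration, since the patent only requires the block's area to exceed the frame's area.

```python
def search_region(prev_box, img_w, img_h, scale=2.0):
    """Build the i-th image block from the (i-1)-th bounding frame:
    same center, side lengths enlarged by `scale` (an assumed factor),
    clipped to the frame boundaries."""
    x, y, w, h = prev_box  # top-left corner plus width and height
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = max(0, int(round(cx - new_w / 2.0)))
    y0 = max(0, int(round(cy - new_h / 2.0)))
    x1 = min(img_w, int(round(cx + new_w / 2.0)))
    y1 = min(img_h, int(round(cy + new_h / 2.0)))
    return x0, y0, x1, y1

# A 40x60 bounding frame at (100, 80) in a 640x480 frame.
block = search_region((100, 80, 40, 60), img_w=640, img_h=480)
assert block == (80, 50, 160, 170)
# The block's area exceeds the previous bounding frame's area.
x0, y0, x1, y1 = block
assert (x1 - x0) * (y1 - y0) > 40 * 60
```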
  • the calculating unit 803 includes:
  • a first selection sub-unit configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a first calculation subunit configured to calculate a first color feature vector of the first candidate target and calculate a second color feature vector of the tracking target
  • a second calculating subunit configured to calculate a distance between the first color feature vector and the second color feature vector, wherein the distance is the similarity between the first candidate target and the tracking target.
  • the first calculating subunit is further configured to: perform principal component segmentation on the image of the first candidate target to obtain a first mask image, and perform principal component segmentation on the image of the tracking target to obtain a second mask image; scale the first mask image and the second mask image to the same size; divide the first mask image evenly into M regions, and divide the second mask image evenly into M regions, M being a positive integer; calculate the color feature vector of each region in the first mask image, and calculate the color feature vector of each region in the second mask image; and connect the color feature vectors of the regions in the first mask image in sequence to obtain the first color feature vector, and connect the color feature vectors of the regions in the second mask image in sequence to obtain the second color feature vector.
  • the first calculating subunit is further configured to: determine W kinds of main colors, W being a positive integer; calculate the projection weight of each pixel in a first region of the first mask image on each main color, the first region being any one of the M regions in the first mask image, and calculate the projection weight of each pixel in a second region of the second mask image on each main color, the second region being any one of the M regions in the second mask image; based on the projection weight of each pixel in the first region on each main color, obtain the W-dimensional color feature vector corresponding to each pixel in the first region, and, based on the projection weight of each pixel in the second region on each main color, obtain the W-dimensional color feature vector corresponding to each pixel in the second region; normalize the W-dimensional color feature vector corresponding to each pixel in the first region to obtain the color feature vector of each pixel in the first region, and normalize the W-dimensional color feature vector corresponding to each pixel in the second region to obtain the color feature vector of each pixel in the second region; and add the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and add the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
  • the first calculating subunit is further configured to calculate the projection weight of a first pixel on the n-th main color based on the following equation (rendered as an image in the original publication), wherein the first pixel is any pixel in the first region or the second region; the n-th main color is any one of the W kinds of main colors; I_r, I_g and I_b are the RGB values of the first pixel; and R_n, G_n and B_n are the RGB values of the n-th main color.
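A sketch of the projection-weight and region-vector computation described above. The patent's exact projection equation is not reproduced in this text, so a Gaussian weight on RGB distance is assumed here purely for illustration; the main colors, the sigma parameter, and the pixel values are likewise hypothetical.

```python
import math

def projection_weights(pixel_rgb, main_colors, sigma=50.0):
    """Project one pixel (I_r, I_g, I_b) onto the W main colors
    (R_n, G_n, B_n). The weight formula is an assumed Gaussian of the
    RGB distance, not the patent's (unreproduced) equation."""
    Ir, Ig, Ib = pixel_rgb
    w = [math.exp(-((Ir - Rn) ** 2 + (Ig - Gn) ** 2 + (Ib - Bn) ** 2)
                  / (2.0 * sigma ** 2))
         for (Rn, Gn, Bn) in main_colors]
    # Normalize to obtain the pixel's W-dimensional color feature vector.
    total = sum(w) or 1.0
    return [wi / total for wi in w]

main_colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # W = 3
vec = projection_weights((250, 10, 5), main_colors)
assert abs(sum(vec) - 1.0) < 1e-9
assert vec.index(max(vec)) == 0  # a near-red pixel projects mostly onto red

# A region's color feature vector is the sum of its pixels' vectors.
region_pixels = [(250, 10, 5), (5, 240, 10)]
region_vec = [sum(p) for p in zip(*[projection_weights(px, main_colors)
                                    for px in region_pixels])]
assert len(region_vec) == 3
```

Concatenating the M region vectors of a mask image then yields the first (or second) color feature vector described in the text.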
  • the calculating unit 803 includes:
  • a second selection subunit configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a normalizing subunit configured to normalize the image of the first candidate target and the image of the tracking target to the same size;
  • a first input subunit configured to input the image of the tracking target into the first convolutional network of the first deep neural network for feature calculation, to obtain the feature vector of the tracking target, wherein the first deep neural network is based on the Siamese structure;
  • a second input subunit configured to input the image of the first candidate target into the second convolutional network of the first deep neural network for feature calculation, to obtain the feature vector of the first candidate target;
  • a third input subunit configured to input the feature vector of the tracking target and the feature vector of the first candidate target into the first fully connected network of the first deep neural network for similarity calculation, to obtain the similarity between the first candidate target and the tracking target.
  • the third determining subunit is further configured to:
  • the i-th image block is input into the third convolutional network of the second deep neural network for feature calculation to obtain the feature map of the i-th image block, wherein the second deep neural network is based on the Siamese structure; the feature map of the i-th image block is input into the RPN network of the second deep neural network, a plurality of candidate targets are extracted, and the feature vectors of the plurality of candidate targets are obtained.
  • the calculating unit 803 includes:
  • an extracting subunit configured to extract the feature vector of the first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
  • a fourth input subunit configured to input the image of the tracking target into the fourth convolutional network of the second deep neural network for feature calculation, to obtain the feature vector of the tracking target;
  • a fifth input subunit configured to input the feature vector of the tracking target and the feature vector of the first candidate target into the second fully connected network of the second deep neural network for similarity calculation, to obtain the similarity between the first candidate target and the tracking target.
  • the electronic device introduced in this embodiment is an electronic device used for implementing the target tracking method in the embodiments of the present invention. Therefore, based on the target tracking method introduced in the embodiments of the present invention, those skilled in the art can understand the specific implementation of the electronic device of this embodiment and its various variations, so how the electronic device implements the method in the embodiments of the present invention is not described in detail here. Any electronic device used by those skilled in the art to implement the target tracking method in the embodiments of the present invention falls within the scope of the present invention.
  • the first determining unit 801, the extracting unit 802, the calculating unit 803, and the second determining unit 804 may all run on the electronic device, and may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA) located on the electronic device.
  • the candidate targets of each subsequent frame image are compared with the tracking target in the initial frame image, and the candidate target with the highest similarity among the candidate targets is determined as the tracking target, thereby implementing tracking of the tracking target.
  • the processing of each frame after the initial frame can be regarded as determining whether the target is lost, which has the advantage of reliably judging whether the tracking target is lost; and no tracking template needs to be maintained, which avoids the continuous amplification of errors caused by continuously updating a tracking template, is beneficial to recovering the tracking target, and thereby improves the robustness of the tracking system.
  • the electronic device includes: a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to perform the steps of the foregoing method when executing the computer program.
  • the memory may be implemented by any type of volatile or non-volatile storage device, or a combination thereof.
  • the non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk storage or a tape storage.
  • the volatile memory can be a random access memory (RAM) that acts as an external cache.
  • RAM: Random Access Memory
  • SRAM: Static Random Access Memory
  • SSRAM: Synchronous Static Random Access Memory
  • DRAM: Dynamic Random Access Memory
  • SDRAM: Synchronous Dynamic Random Access Memory
  • DDRSDRAM: Double Data Rate Synchronous Dynamic Random Access Memory
  • ESDRAM: Enhanced Synchronous Dynamic Random Access Memory
  • SLDRAM: SyncLink Dynamic Random Access Memory
  • DRRAM: Direct Rambus Random Access Memory
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software.
  • the above processor may be a general purpose processor, a digital signal processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like.
  • the processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention.
  • a general purpose processor can be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present invention may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor.
  • the software module can be located in a storage medium, the storage medium being located in the memory, the processor reading the information in the memory, and completing the steps of the foregoing methods in combination with the hardware thereof.
  • Embodiments of the present invention also provide a computer readable storage medium, for example, a memory storing a computer program, where the computer program may be executed by a processor of the above electronic device to perform the steps of the foregoing methods.
  • the computer readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM; or may include one or any combination of the above memories.
  • Various equipment may be used to store data into a computer readable storage medium.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the embodiments of the present invention have the advantages of reliably determining whether the tracking target is lost, and of not needing to maintain a tracking template, which avoids the continuous amplification of errors caused by continuously updating a tracking template, is beneficial to recovering a lost tracking target, and thereby improves the robustness of the tracking system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A target tracking method, an electronic device, and a storage medium. The electronic device comprises an image collection unit. The image collection unit is configured to collect image data. The method is applied to the electronic device, comprising: determining, in an initial frame of image of the image data, a target to be tracked (S101); extracting a plurality of candidate targets from a subsequent frame of image of the image data, wherein the subsequent frame of image is any frame of image following the initial frame of image (S102); calculating the similarities between the candidate targets and the target to be tracked (S103); and determining a candidate target in the plurality of candidate targets that has a highest similarity with the target to be tracked as the target to be tracked (S104). The method resolves the technical problems in the prior art that it cannot be determined whether a target to be tracked is lost and it is difficult to find a target to be tracked after the target to be tracked is lost in a visual tracking method of online learning.

Description

一种目标跟踪方法及电子设备、存储介质Target tracking method, electronic device and storage medium
相关申请的交叉引用Cross-reference to related applications
本申请基于申请号为201611041675.6、申请日为2016年11月11日的中国专利申请提出，并要求该中国专利申请的优先权，该中国专利申请的全部内容在此引入本申请作为参考。This application is based on, and claims priority to, Chinese Patent Application No. 201611041675.6, filed on November 11, 2016, the entire contents of which are incorporated herein by reference.
技术领域Technical field
本发明涉及电子技术领域,尤其涉及一种目标跟踪方法及电子设备、存储介质。The present invention relates to the field of electronic technologies, and in particular, to a target tracking method, an electronic device, and a storage medium.
背景技术Background technique
基于在线学习的视觉跟踪技术在近年来兴起之后,成为视觉跟踪的一个热点。此类方法在没有任何离线学习的先验经验的前提下,根据初始帧画面中指定的跟踪目标提取特征模板,训练模型用于后续视频中对于该目标的跟踪,在跟踪过程中,根据跟踪状态更新模型,以适应目标的姿态变化。该类方法不需要任何的离线训练,可以对用户指定的任何物体进行跟踪,具有较高的通用性。The visual tracking technology based on online learning has become a hot spot of visual tracking after its rise in recent years. Such a method extracts feature templates according to the specified tracking targets in the initial frame picture without any prior experience of offline learning. The training model is used for tracking the target in subsequent videos. In the tracking process, according to the tracking status Update the model to accommodate changes in the target's posture. This type of method does not require any offline training, and can track any object specified by the user, which has high versatility.
但是,由于跟踪目标的特征及模板单一,在目标的跟踪过程中,很难判断目标是否跟丢;并且在目标跟丢之后,跟踪模板的持续更新会使误差被持续放大,导致目标难以找回,难以形成稳定的跟踪系统。However, due to the single feature of the tracking target and the single template, it is difficult to judge whether the target is lost or not during the tracking process of the target; and after the target is lost, the continuous update of the tracking template will continue to enlarge the error, making the target difficult to retrieve. It is difficult to form a stable tracking system.
发明内容Summary of the invention
本发明实施例通过提供一种目标跟踪方法及电子设备、存储介质，解决了现有技术中的在线学习的视觉跟踪方法，存在无法判断跟踪目标是否 跟丢，以及跟丢后难以找回跟踪目标的技术问题。By providing a target tracking method, an electronic device, and a storage medium, the embodiments of the present invention solve the technical problems of the prior-art online-learning visual tracking methods: it cannot be determined whether the tracking target is lost, and the tracking target is difficult to recover after being lost.
一方面,本发明通过本发明的一实施例提供如下技术方案:In one aspect, the present invention provides the following technical solutions through an embodiment of the present invention:
一种目标跟踪方法,应用于电子设备中,所述电子设备具有图像采集单元,所述图像采集单元用于采集图像数据,所述方法包括:A target tracking method is applied to an electronic device, wherein the electronic device has an image capturing unit, and the image collecting unit is configured to collect image data, and the method includes:
在所述图像数据的初始帧图像中确定一跟踪目标;Determining a tracking target in an initial frame image of the image data;
在所述图像数据的后续帧图像中提取多个候选目标,所述后续帧图像是所述初始帧图像之后的任一帧图像;Extracting a plurality of candidate targets in a subsequent frame image of the image data, the subsequent frame images being any frame image subsequent to the initial frame image;
计算出每个候选目标与所述跟踪目标的相似度;Calculating the similarity between each candidate target and the tracking target;
将所述多个候选目标中的与所述跟踪目标的相似度最高的候选目标确定为所述跟踪目标。A candidate target having the highest similarity with the tracking target among the plurality of candidate targets is determined as the tracking target.
优选地,所述在图像数据的初始帧图像中确定一跟踪目标,包括:Preferably, the determining a tracking target in the initial frame image of the image data comprises:
在通过显示屏输出所述初始帧图像后,获取用户的选择操作;基于用户的选择操作,在所述初始帧图像中确定所述跟踪目标;或者After outputting the initial frame image through the display screen, acquiring a user's selection operation; determining the tracking target in the initial frame image based on a user's selection operation; or
获取用于描述所述跟踪目标的特征信息;基于所述特征信息,在所述初始帧图像中确定所述跟踪目标。Obtaining feature information for describing the tracking target; determining the tracking target in the initial frame image based on the feature information.
优选地,所述在图像数据的后续帧图像中提取多个候选目标,包括:Preferably, the extracting a plurality of candidate targets in the subsequent frame image of the image data comprises:
确定所述跟踪目标在第i-1帧图像中的第i-1包围框,其中,所述第i-1帧图像属于所述图像数据,i为大于等于2的整数;在i等于2时,所述第i-1帧图像即为所述初始帧图像;Determining an i-th bounding frame of the tracking target in the i-1th frame image, wherein the i-1th frame image belongs to the image data, and i is an integer greater than or equal to 2; when i is equal to 2 The image of the i-1th frame is the initial frame image;
基于所述第i-1包围框,在第i帧图像中确定第i图像块,其中,所述第i帧图像即为所述后续帧图像,所述第i图像块的中心与所述第i-1包围框的中心位置相同,所述第i图像块的面积大于所述第i-1包围框的面积;Determining, in the ith frame image, an ith image block, wherein the ith frame image is the subsequent frame image, a center of the ith image block, and the first The center position of the i-1 enclosing frame is the same, and the area of the i-th image block is larger than the area of the i-th enclosing frame;
在所述第i图像块内确定所述多个候选目标。The plurality of candidate targets are determined within the ith image block.
优选地,所述计算出每个候选目标与所述跟踪目标的相似度,包括:Preferably, the calculating the similarity between each candidate target and the tracking target comprises:
从所述多个候选目标中选出第一候选目标,其中,所述第一候选目标 是所述多个候选目标中的任一候选目标;Selecting a first candidate target from the plurality of candidate targets, wherein the first candidate target Is any one of the plurality of candidate targets;
计算所述第一候选目标的第一颜色特征向量,以及计算所述跟踪目标的第二颜色特征向量;Calculating a first color feature vector of the first candidate target, and calculating a second color feature vector of the tracking target;
计算所述第一颜色特征向量和所述第二颜色特征向量的距离,其中,所述距离即为所述第一候选目标与所述跟踪目标的相似度。Calculating a distance between the first color feature vector and the second color feature vector, wherein the distance is a similarity between the first candidate target and the tracking target.
优选地,所述计算所述第一候选目标的第一颜色特征向量,以及计算所述跟踪目标的第二颜色特征向量,包括:Preferably, the calculating the first color feature vector of the first candidate target and calculating the second color feature vector of the tracking target comprises:
将所述第一候选目标的图像进行主成分分割,获得第一mask图像;以及,将所述跟踪目标的图像进行主成分分割,获得第二mask图像;Performing main component segmentation on the image of the first candidate target to obtain a first mask image; and performing principal component segmentation on the image of the tracking target to obtain a second mask image;
将所述第一mask图像和所述第二mask图像缩放至相同大小;Scaling the first mask image and the second mask image to the same size;
将所述第一mask图像平均分成M个区域;以及,将所述第二mask图像平均分成M个区域,M为正整数;And dividing the first mask image into M regions; and dividing the second mask image into M regions, where M is a positive integer;
计算所述第一mask图像中每个区域的颜色特征向量;以及,计算所述第二mask图像中每个区域的颜色特征向量;Calculating a color feature vector of each region in the first mask image; and calculating a color feature vector of each region in the second mask image;
将所述第一mask图像中每个区域的颜色特征向量顺序连接,获得所述第一颜色特征向量;以及,将所述第二mask图像中每个区域的颜色特征向量顺序连接,获得所述第二颜色特征向量。And sequentially connecting color feature vectors of each region in the first mask image to obtain the first color feature vector; and sequentially connecting color feature vectors of each region in the second mask image to obtain the The second color feature vector.
Preferably, the calculating the color feature vector of each region in the first mask image and the calculating the color feature vector of each region in the second mask image comprise:
determining W main colors, where W is a positive integer;
calculating a projection weight of each pixel in a first region of the first mask image on each main color, the first region being any one of the M regions in the first mask image, and calculating a projection weight of each pixel in a second region of the second mask image on each main color, the second region being any one of the M regions in the second mask image;
obtaining, based on the projection weight of each pixel in the first region on each main color, a W-dimensional color feature vector corresponding to each pixel in the first region, and obtaining, based on the projection weight of each pixel in the second region on each main color, a W-dimensional color feature vector corresponding to each pixel in the second region;
normalizing the W-dimensional color feature vector corresponding to each pixel in the first region to obtain a color feature vector of each pixel in the first region, and normalizing the W-dimensional color feature vector corresponding to each pixel in the second region to obtain a color feature vector of each pixel in the second region; and
summing the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and summing the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
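The per-region computation above (project each pixel onto the W main colors, normalize each pixel's W-dimensional vector, then sum over the region) can be sketched as follows. The actual projection-weight equation appears only as an image in the filing, so an inverse RGB-distance weight is substituted here purely as an assumption:

```python
import numpy as np

def region_feature(pixels, main_colors):
    """pixels: N x 3 RGB array for one region; main_colors: W x 3 array.

    Returns the W-dimensional color feature vector of the region.
    The projection weight used here (inverse of RGB distance) is an
    illustrative stand-in for the equation in the original filing.
    """
    # N x W matrix: distance from each pixel to each main color.
    d = np.linalg.norm(pixels[:, None, :] - main_colors[None, :, :], axis=2)
    w = 1.0 / (1.0 + d)                   # assumed projection weight
    w = w / w.sum(axis=1, keepdims=True)  # normalize each pixel's W-dim vector
    return w.sum(axis=0)                  # sum over all pixels in the region

main = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=float)
region = np.array([[250, 5, 5], [5, 250, 5]], dtype=float)  # near-red, near-green
feat = region_feature(region, main)
```

Because each pixel contributes a normalized W-dimensional vector, the region feature sums to the pixel count, and the dominant colors of the region dominate the vector.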
Preferably, the projection weight of a first pixel on the n-th main color is calculated based on the following equation:
(Equation reproduced in the original filing as image PCTCN2017110577-appb-000001)
where the first pixel is any pixel in the first region or the second region, the n-th main color is any one of the W main colors, wn is the projection weight of the first pixel on the n-th main color, Ir, Ig, and Ib are the RGB values of the first pixel, and Rn, Gn, and Bn are the RGB values of the n-th main color.
Preferably, the calculating the similarity between each candidate target and the tracking target comprises:
selecting a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
normalizing the image of the first candidate target and the image of the tracking target to the same size;
inputting the image of the tracking target into a first convolutional network of a first deep neural network for feature calculation to obtain a feature vector of the tracking target, wherein the first deep neural network is based on a Siamese structure;
inputting the image of the first candidate target into a second convolutional network of the first deep neural network for feature calculation to obtain a feature vector of the first candidate target; and
inputting the feature vector of the tracking target and the feature vector of the first candidate target into a first fully connected network of the first deep neural network for similarity calculation to obtain the similarity between the first candidate target and the tracking target.
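A structural sketch of the Siamese comparison described above, illustrating only the weight-sharing arrangement: two size-normalized images pass through branches with shared parameters, and the two feature vectors feed a similarity head. The convolutional branches are replaced by one shared linear projection and the fully connected head by a fixed cosine-style score, both purely illustrative assumptions about the actual (trained) network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "feature extractor": a random linear projection standing in for
# the convolutional branch of the Siamese network (illustrative only).
W_feat = rng.standard_normal((16, 64))

def extract(image_vec):
    """Map a flattened, size-normalized image to a feature vector.
    Both branches call this same function, so the branch parameters are
    shared, as in a Siamese structure."""
    return np.tanh(W_feat @ image_vec)

def similarity(feat_a, feat_b):
    """Stand-in for the fully connected similarity head: cosine
    similarity mapped to [0, 1]."""
    cos = feat_a @ feat_b / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return 0.5 * (cos + 1.0)

target = rng.standard_normal(64)        # flattened tracking-target image
candidate_same = target.copy()          # candidate identical to the target
candidate_other = rng.standard_normal(64)

s_same = similarity(extract(target), extract(candidate_same))
s_other = similarity(extract(target), extract(candidate_other))
```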
Preferably, the determining the plurality of candidate targets in the i-th image block comprises:
inputting the i-th image block into a third convolutional network of a second deep neural network for feature calculation to obtain a feature map of the i-th image block, wherein the second deep neural network is based on a Siamese structure; and
inputting the feature map of the i-th image block into an RPN (Region Proposal Network) of the second deep neural network to obtain the plurality of candidate targets and feature vectors of the plurality of candidate targets.
Preferably, the calculating the similarity between each candidate target and the tracking target comprises:
extracting a feature vector of a first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
inputting the image of the tracking target into a fourth convolutional network of the second deep neural network for feature calculation to obtain a feature vector of the tracking target, wherein the fourth convolutional network and the third convolutional network share convolutional layer parameters; and
inputting the feature vector of the tracking target and the feature vector of the first candidate target into a second fully connected network of the second deep neural network for similarity calculation to obtain the similarity between the first candidate target and the tracking target.
In another aspect, an embodiment of the present invention provides the following technical solution:
An electronic device having an image acquisition unit configured to acquire image data, the electronic device comprising:
a first determining unit, configured to determine a tracking target in an initial frame image of the image data;
an extracting unit, configured to extract a plurality of candidate targets from a subsequent frame image of the image data, the subsequent frame image being any frame image after the initial frame image;
a calculating unit, configured to calculate a similarity between each candidate target and the tracking target; and
a second determining unit, configured to determine, as the tracking target, a candidate target having the highest similarity with the tracking target among the plurality of candidate targets.
Preferably, the first determining unit comprises:
a first determining subunit, configured to acquire a selection operation of a user after the initial frame image is output through a display screen, and determine the tracking target in the initial frame image based on the selection operation of the user; or
a second determining subunit, configured to acquire feature information for describing the tracking target, and determine the tracking target in the initial frame image based on the feature information.
Preferably, the extracting unit comprises:
a first determining subunit, configured to determine an (i-1)-th bounding box of the tracking target in an (i-1)-th frame image, wherein the (i-1)-th frame image belongs to the image data, i is an integer greater than or equal to 2, and when i is equal to 2, the (i-1)-th frame image is the initial frame image;
a second determining subunit, configured to determine an i-th image block in an i-th frame image based on the (i-1)-th bounding box, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block coincides with the center of the (i-1)-th bounding box, and the area of the i-th image block is larger than the area of the (i-1)-th bounding box; and
a third determining subunit, configured to determine the plurality of candidate targets within the i-th image block.
Preferably, the calculating unit comprises:
a first selecting subunit, configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
a first calculating subunit, configured to calculate a first color feature vector of the first candidate target and calculate a second color feature vector of the tracking target; and
a second calculating subunit, configured to calculate a distance between the first color feature vector and the second color feature vector, wherein the distance is the similarity between the first candidate target and the tracking target.
Preferably, the first calculating subunit is further configured to:
perform principal component segmentation on the image of the first candidate target to obtain a first mask image, and perform principal component segmentation on the image of the tracking target to obtain a second mask image; scale the first mask image and the second mask image to the same size; divide the first mask image evenly into M regions, and divide the second mask image evenly into M regions, where M is a positive integer; calculate a color feature vector of each region in the first mask image, and calculate a color feature vector of each region in the second mask image; and concatenate the color feature vectors of the regions in the first mask image in order to obtain the first color feature vector, and concatenate the color feature vectors of the regions in the second mask image in order to obtain the second color feature vector.
Preferably, the first calculating subunit is further configured to:
determine W main colors, where W is a positive integer; calculate a projection weight of each pixel in a first region of the first mask image on each main color, the first region being any one of the M regions in the first mask image, and calculate a projection weight of each pixel in a second region of the second mask image on each main color, the second region being any one of the M regions in the second mask image; obtain, based on the projection weight of each pixel in the first region on each main color, a W-dimensional color feature vector corresponding to each pixel in the first region, and obtain, based on the projection weight of each pixel in the second region on each main color, a W-dimensional color feature vector corresponding to each pixel in the second region; normalize the W-dimensional color feature vector corresponding to each pixel in the first region to obtain a color feature vector of each pixel in the first region, and normalize the W-dimensional color feature vector corresponding to each pixel in the second region to obtain a color feature vector of each pixel in the second region; and sum the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and sum the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
Preferably, the first calculating subunit is further configured to calculate the projection weight of a first pixel on the n-th main color based on the following equation:
(Equation reproduced in the original filing as image PCTCN2017110577-appb-000002)
where the first pixel is any pixel in the first region or the second region, the n-th main color is any one of the W main colors, wn is the projection weight of the first pixel on the n-th main color, Ir, Ig, and Ib are the RGB values of the first pixel, and Rn, Gn, and Bn are the RGB values of the n-th main color.
Preferably, the calculating unit comprises:
a second selecting subunit, configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
a normalizing subunit, configured to normalize the image of the first candidate target and the image of the tracking target to the same size;
a first input subunit, configured to input the image of the tracking target into a first convolutional network of a first deep neural network for feature calculation to obtain a feature vector of the tracking target, wherein the first deep neural network is based on a Siamese structure;
a second input subunit, configured to input the image of the first candidate target into a second convolutional network of the first deep neural network for feature calculation to obtain a feature vector of the first candidate target, wherein the second convolutional network and the first convolutional network share convolutional layer parameters; and
a third input subunit, configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a first fully connected network of the first deep neural network for similarity calculation to obtain the similarity between the first candidate target and the tracking target.
Preferably, the third determining subunit is further configured to:
input the i-th image block into a third convolutional network of a second deep neural network for feature calculation to obtain a feature map of the i-th image block, wherein the second deep neural network is based on a Siamese structure; and input the feature map of the i-th image block into an RPN of the second deep neural network to obtain the plurality of candidate targets and feature vectors of the plurality of candidate targets.
Preferably, the calculating unit comprises:
an extracting subunit, configured to extract a feature vector of a first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
a fourth input subunit, configured to input the image of the tracking target into a fourth convolutional network of the second deep neural network for feature calculation to obtain a feature vector of the tracking target, wherein the fourth convolutional network and the third convolutional network share convolutional layer parameters; and
a fifth input subunit, configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a second fully connected network of the second deep neural network for similarity calculation to obtain the similarity between the first candidate target and the tracking target.
In still another aspect, an embodiment of the present invention provides the following technical solution:
An electronic device, comprising a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to perform the steps of the method described above when running the computer program.
In still another aspect, an embodiment of the present invention provides the following technical solution:
A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method described above.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
In the embodiments of the present invention, a tracking target is determined in an initial frame image of the image data; a plurality of candidate targets are extracted from a subsequent frame image of the image data; the similarity between each candidate target and the tracking target is calculated; and the candidate target with the highest similarity is determined as the tracking target. Since the candidate targets in each subsequent frame image are compared with the tracking target in the initial frame image, and the candidate target with the highest similarity is determined as the tracking target, tracking of the tracking target is achieved. Compared with prior-art visual tracking methods based on online learning, the processing of each frame after the initial frame in the tracking method of the embodiments of the present invention can be regarded as judging whether the target has been lost, so the method can reliably judge whether the tracking target is lost. Moreover, no tracking template needs to be maintained, which avoids the continuous amplification of errors caused by continual template updates and facilitates recovering a lost tracking target, thereby improving the robustness of the tracking system.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Apparently, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an initial frame image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an initial tracking target according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second frame image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of candidate targets determined in the second frame image according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a first deep neural network according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a second deep neural network according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
DETAILED DESCRIPTION
By providing a target tracking method and apparatus, the embodiments of the present invention solve the technical problems of prior-art visual tracking methods based on online learning, namely that it is impossible to judge whether the tracking target has been lost and that it is difficult to recover the tracking target after it is lost.
To solve the above technical problems, the general idea of the technical solutions of the embodiments of the present invention is as follows:
A target tracking method is applied to an electronic device having an image acquisition unit configured to acquire image data. The method comprises: determining a tracking target in an initial frame image of the image data; extracting a plurality of candidate targets from a subsequent frame image of the image data, the subsequent frame image being any frame image after the initial frame image; calculating the similarity between each candidate target and the tracking target; and determining, as the tracking target, the candidate target having the highest similarity with the tracking target among the plurality of candidate targets.
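The general idea above reduces to a simple per-frame loop: compare every candidate in the current frame against the tracking target fixed in the initial frame and keep the best match, with no template updated between frames. A minimal sketch with pluggable candidate-extraction and similarity functions (the function names and the toy similarity below are assumptions for illustration):

```python
def track(initial_target, frames, extract_candidates, similarity):
    """For each subsequent frame, pick the candidate most similar to the
    tracking target from the initial frame.

    Note that the reference is always the initial target: no tracking
    template is maintained or updated between frames.
    """
    results = []
    for frame in frames:
        candidates = extract_candidates(frame)
        best = max(candidates, key=lambda c: similarity(c, initial_target))
        results.append(best)
    return results

# Toy example: "images" are numbers; similarity is negative absolute difference.
frames = [[9, 4.8, 1], [5.2, 7, 0]]
chosen = track(5, frames, lambda f: f, lambda c, t: -abs(c - t))
```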
For a better understanding of the above technical solutions, the technical solutions are described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
This embodiment provides a target tracking method applied to an electronic device. The electronic device may be a ground robot (for example, a self-balancing vehicle), an unmanned aerial vehicle (for example, a multi-rotor drone or a fixed-wing drone), an electric vehicle, or the like; this embodiment does not specifically limit what the electronic device is. The electronic device has an image acquisition unit (for example, a camera) configured to acquire image data.
As shown in FIG. 1, the target tracking method comprises:
Step S101: determining a tracking target in an initial frame image of the image data.
As an optional embodiment, step S101 comprises:
after outputting the initial frame image through a display screen, acquiring a selection operation of a user, and determining the tracking target in the initial frame image based on the selection operation of the user; or
acquiring feature information for describing the tracking target, and determining the tracking target in the initial frame image based on the feature information.
In a specific implementation, as shown in FIG. 2, the image acquired by the image acquisition unit may be obtained and output through a display screen provided on the electronic device (for example, as the initial frame image 300), and a selection operation performed by the user may be acquired (for example, when the display screen is a touch screen, the selection operation of the user is acquired through the touch screen); the tracking target (namely, the initial tracking target 000) is then determined from the initial frame image 300 based on the selection operation. Alternatively, feature information for describing the tracking target is acquired, and the tracking target (namely, the initial tracking target 000) is determined in the initial frame image 300 in combination with a saliency detection or object detection algorithm. Here, as shown in FIG. 3, the image 311 of the initial tracking target 000 may be extracted and saved for later use; the image 311 is the image within the first bounding box 310.
Step S102: extracting a plurality of candidate targets from a subsequent frame image of the image data, the subsequent frame image being any frame image after the initial frame image.
As an optional embodiment, step S102 comprises:
determining an (i-1)-th bounding box of the tracking target in an (i-1)-th frame image (wherein the (i-1)-th frame image belongs to the image data, i is an integer greater than or equal to 2, and when i is equal to 2, the (i-1)-th frame image is the initial frame image); determining, based on the (i-1)-th bounding box, an i-th image block in an i-th frame image, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block coincides with the center of the (i-1)-th bounding box, and the area of the i-th image block is larger than the area of the (i-1)-th bounding box; and determining the plurality of candidate targets within the i-th image block.
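The search-region step of this optional embodiment (an image block with the same center as the previous bounding box but a larger area) can be sketched as follows; the enlargement factor and the clipping to the image boundary are assumptions not fixed by the claim:

```python
def search_block(prev_box, img_w, img_h, scale=2.0):
    """prev_box: (cx, cy, w, h) of the (i-1)-th bounding box.

    Returns the i-th image block as (x0, y0, x1, y1): same center,
    `scale` times the width and height (hence a larger area), clipped
    to the image boundary. The factor 2.0 is an illustrative assumption.
    """
    cx, cy, w, h = prev_box
    bw, bh = w * scale, h * scale
    x0 = max(0, int(round(cx - bw / 2)))
    y0 = max(0, int(round(cy - bh / 2)))
    x1 = min(img_w, int(round(cx + bw / 2)))
    y1 = min(img_h, int(round(cy + bh / 2)))
    return x0, y0, x1, y1

block = search_block((100, 80, 40, 60), img_w=640, img_h=480)
```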
For example, FIG. 2 shows the initial frame image, which contains a plurality of person targets; the tracking target to be tracked is the person within the first bounding box 310. FIG. 4 shows the second frame image, in which the position or posture of each person target has changed.
When i is equal to 2, as shown in FIG. 3, the bounding box of the tracking target (namely, the initial tracking target 000) in the initial frame image 300 is determined (namely, the first bounding box 310). The bounding box is generally rectangular and just encloses the tracking target (namely, the initial tracking target 000). As shown in FIG. 4, based on the position of the first bounding box 310 (the position of the first bounding box 310 in the initial frame image 300 is the same as its position in the second frame image 400), an image block (namely, the second image block 420) is determined in the second frame image 400. The second image block 420 has the same center as the first bounding box 310, but its area is larger than that of the first bounding box 310, and there may be multiple targets within it; the tracking target determined in the initial frame image 300 (namely, the initial tracking target 000) lies within the second image block 420. Here, the plurality of targets may be determined in the second image block 420 by methods such as saliency analysis or object detection, and these targets are determined as the candidate targets (namely, candidate target 401, candidate target 402, candidate target 403, and candidate target 404). Further, based on steps S103 to S104, the tracking target is determined from these candidate targets; that is, the initial tracking target 000 is identified from the second frame image.
The specific implementations of steps S103 and S104 are described in detail below.
Similarly, when i is equal to 3, after the tracking target has been identified from the second frame image 400, the bounding box of the tracking target in the second frame image 400 (namely, the second bounding box) is determined. Based on the second bounding box, an image block (namely, the third image block) is determined in the third frame image; the third image block has the same center as the second bounding box, but its area is larger than that of the second bounding box, and there may be multiple targets within it, among which is the tracking target determined in the initial frame image. Here, the plurality of targets may be determined in the third image block by methods such as saliency analysis or object detection, and the plurality of targets are determined as candidate targets. Further, based on steps S103 to S104, the tracking target is determined from these candidate targets; that is, the initial tracking target 000 is identified from the third frame image.
Similarly, when i is equal to 4, the fourth image block is determined in the fourth frame image, a plurality of candidate targets are determined within the fourth image block, and, based on steps S103 to S104, the tracking target (namely, the initial tracking target 000) is determined from these candidate targets. By analogy, when i is equal to 5, 6, 7, 8, and so on, a plurality of candidate targets are determined in each frame image, and the tracking target (namely, the initial tracking target 000) is determined from these candidate targets based on steps S103 to S104, thereby achieving identification and tracking of the tracking target.
In a specific implementation, after the plurality of candidate targets are determined within the i-th image block, the image of each candidate target is extracted and saved for later use. As shown in FIG. 5, the image 421 of candidate target 401, the image 422 of candidate target 402, the image 423 of candidate target 403, and the image 424 of candidate target 404 are extracted and saved.
Step S103: calculating the similarity between each candidate target and the tracking target.
For example, in a specific example, the similarity between each candidate target and the tracking target is calculated.
In a specific implementation, the similarity between each candidate target and the tracking target first needs to be calculated. Here, the tracking target is the initial tracking target 000 determined in the initial frame image 300 (shown in FIG. 3), and the candidate targets come from the i-th image block in the i-th frame image, where the i-th frame image is a subsequent frame image (i.e., any frame image after the initial frame image). For example, as shown in FIG. 4, the candidate targets include candidate target 401, candidate target 402, candidate target 403, and candidate target 404 determined in the second frame image 400.
In a specific implementation, a target re-identification algorithm may be used to calculate the similarity between each candidate target and the tracking target. Here, step S103 may be implemented in the following three manners.
Manner 1: calculating the similarity between each candidate target and the tracking target by using a target re-identification algorithm based on color features.
As an optional embodiment, step S103 includes:
selecting a first candidate target from the multiple candidate targets, where the first candidate target is any one of the multiple candidate targets; calculating a first color feature vector of the first candidate target and a second color feature vector of the tracking target; and calculating the distance between the first color feature vector and the second color feature vector, where the distance is the similarity between the first candidate target and the tracking target.
For example, as shown in FIG. 3, the color feature vector of the initial tracking target 000 is calculated, where the initial tracking target 000 is the tracking target determined in the initial frame image 300. As shown in FIG. 5, the color feature vector of candidate target 401 is then calculated. Finally, the distance between the color feature vector of the initial tracking target 000 and the color feature vector of candidate target 401 is calculated, and this distance value represents the similarity between candidate target 401 and the initial tracking target 000. Similarly, the similarities between candidate target 402, candidate target 403, candidate target 404 and the initial tracking target 000 are calculated respectively.
In a specific implementation, the distance between the first color feature vector and the second color feature vector may be calculated based on the Euclidean distance formula.
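As a minimal illustration of this step, the sketch below computes the Euclidean distance between two color feature vectors and maps it to a similarity score. The vectors and the distance-to-similarity mapping are illustrative assumptions; the embodiment only states that the distance itself represents the similarity.

```python
import math

def euclidean_distance(v1, v2):
    """Euclidean distance between two equal-length feature vectors."""
    assert len(v1) == len(v2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def similarity(v1, v2):
    """Illustrative mapping of distance to a score in (0, 1]:
    identical vectors give 1.0, larger distances give smaller scores."""
    return 1.0 / (1.0 + euclidean_distance(v1, v2))

# hypothetical second (target) and first (candidate) color feature vectors
target_vec = [0.5, 0.3, 0.2, 0.0]
cand_vec = [0.4, 0.4, 0.2, 0.0]
print(similarity(target_vec, cand_vec))
```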
As an optional embodiment, in more detail, the calculating a first color feature vector of the first candidate target and a second color feature vector of the tracking target includes:
performing principal component segmentation (saliency segmentation) on the image of the first candidate target to obtain a first mask image, and performing principal component segmentation on the image of the tracking target to obtain a second mask image; scaling the first mask image and the second mask image to the same size; dividing the first mask image evenly into M regions, and dividing the second mask image evenly into M regions, where M is a positive integer; calculating the color feature vector of each region in the first mask image, and calculating the color feature vector of each region in the second mask image; and concatenating the color feature vectors of the regions in the first mask image in order to obtain the first color feature vector, and concatenating the color feature vectors of the regions in the second mask image in order to obtain the second color feature vector.
For example, when calculating the color feature vector of the tracking target (i.e., the initial tracking target 000), that is, the second color feature vector, principal component segmentation may first be performed on the image 311 of the initial tracking target 000 to obtain the second mask image (in a mask image, only the principal-component region keeps the pixel values of the original image; the pixel values of all other regions are 0), where the image 311 of the initial tracking target 000 is rectangular and exactly encloses the initial tracking target 000. The second mask image is then scaled to a preset size and divided evenly into four regions (halved vertically and halved horizontally), and the color feature vector of each of the four regions is calculated. Finally, the color feature vectors of the four regions are concatenated in order (if the color feature vector of each region is a 10-dimensional vector, concatenation yields a 40-dimensional vector), and after normalization the color feature vector of the tracking target (i.e., the second color feature vector) is obtained.
Similarly, when calculating the color feature vector of candidate target 401, principal component segmentation may first be performed on the image 421 of candidate target 401 to obtain the first mask image, where the image 421 of candidate target 401 is rectangular and exactly encloses candidate target 401. The first mask image is then also scaled to a preset size, the same size as the second mask image, and divided evenly into four regions (halved vertically and halved horizontally), and the color feature vector of each of the four regions is calculated. Finally, the color feature vectors of the four regions are concatenated in order (again, if the color feature vector of each region is a 10-dimensional vector, concatenation yields a 40-dimensional vector), and after normalization the color feature vector of candidate target 401 is obtained. In the same way, the color feature vectors of candidate target 402, candidate target 403, and candidate target 404 are calculated respectively.
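The region-splitting and concatenation steps above can be sketched as follows. The 2×2 split and the L1 normalization after concatenation are assumptions consistent with the description (halved vertically and horizontally, normalized after concatenation), and the helper names are hypothetical.

```python
def split_into_quadrants(img, h, w):
    """Split an h*w image (row-major list of pixels) into 4 equal regions:
    top-left, top-right, bottom-left, bottom-right."""
    regions = [[], [], [], []]
    for y in range(h):
        for x in range(w):
            idx = (y >= h // 2) * 2 + (x >= w // 2)
            regions[idx].append(img[y * w + x])
    return regions

def concat_and_normalize(region_vectors):
    """Concatenate per-region color vectors (e.g. 4 x 10-dim -> 40-dim)
    and L1-normalize the result (the normalization scheme is an assumption)."""
    v = [c for vec in region_vectors for c in vec]
    s = sum(v) or 1.0
    return [c / s for c in v]
```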
As an optional embodiment, in more detail, the calculating the color feature vector of each region in the first mask image, and calculating the color feature vector of each region in the second mask image, includes:
determining W main colors, where W is a positive integer; calculating the projection weight of each pixel in a first region of the first mask image on each main color, where the first region is any one of the M regions in the first mask image, and calculating the projection weight of each pixel in a second region of the second mask image on each main color, where the second region is any one of the M regions in the second mask image; obtaining, based on the projection weights of each pixel in the first region on each main color, a W-dimensional color feature vector for each pixel in the first region, and obtaining, based on the projection weights of each pixel in the second region on each main color, a W-dimensional color feature vector for each pixel in the second region; normalizing the W-dimensional color feature vector of each pixel in the first region to obtain the color feature vector of each pixel in the first region, and normalizing the W-dimensional color feature vector of each pixel in the second region to obtain the color feature vector of each pixel in the second region; and adding up the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and adding up the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
For example, 10 main colors may be defined, namely red, yellow, blue, green, cyan, purple, orange, white, black, and gray, numbered from 1 to 10 (i.e., red is No. 1, yellow is No. 2, blue is No. 3, ..., gray is No. 10). The corresponding RGB values of each color are then recorded, denoted Rn, Gn, Bn, where n is the number of one of the 10 main colors (for example, R1 is the R value of red, G2 is the G value of yellow, and B10 is the B value of gray).
After the first mask image is divided evenly into four regions (halved vertically and halved horizontally), when calculating the color feature vector of each region in the first mask image, one of the four regions (i.e., the first region) is selected first, and the projection weight of each pixel in the first region on each main color is calculated, yielding the projection weights of each pixel in the first region on the 10 main colors, so that each pixel obtains a 10-dimensional color feature vector. This 10-dimensional color feature vector is then normalized and taken as the color feature vector of that pixel. After the color feature vectors of all pixels in the first region are obtained, the color feature vectors of all the pixels are added up to finally obtain the color feature vector of the first region. Based on this method, the color feature vector of each of the four regions in the first mask image can be calculated.
Similarly, after the second mask image is divided evenly into four regions (halved vertically and halved horizontally), when calculating the color feature vector of each region in the second mask image, one of the four regions (i.e., the second region) is selected first, and the projection weight of each pixel in the second region on each main color is calculated, yielding the projection weights of each pixel in the second region on the 10 main colors, so that each pixel obtains a 10-dimensional color feature vector. This 10-dimensional color feature vector is then normalized and taken as the color feature vector of that pixel. After the color feature vectors of all pixels in the second region are obtained, the color feature vectors of all the pixels are added up to finally obtain the color feature vector of the second region. Based on this method, the color feature vector of each of the four regions in the second mask image can be calculated.
As an optional embodiment, in more detail, the projection weight of a first pixel on the n-th main color may be calculated based on the following equation:
Figure PCTCN2017110577-appb-000003
where the first pixel is any pixel in the first region or the second region, the n-th main color is any one of the W main colors, wn is the projection weight of the first pixel on the n-th main color, Ir, Ig, Ib are the RGB values of the first pixel, and Rn, Gn, Bn are the RGB values of the n-th main color.
For example, n is the number of one of the 10 main colors described above. When calculating the projection weight of a pixel in the first region or the second region on yellow (numbered 2), the calculation may be based on the following equation:
Figure PCTCN2017110577-appb-000004
where w2 is the projection weight of the pixel on yellow, R2, G2, B2 are the RGB values of yellow, and Ir, Ig, Ib are the RGB values of the pixel.
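The projection-weight equations appear only as images in this publication, so the exact formula is not recoverable from the text. The sketch below therefore assumes a weight inversely proportional to the RGB-space distance between the pixel and each main color, combined with the per-pixel normalization and per-region summation that the text does describe; the specific main-color RGB values are likewise assumptions.

```python
import math

# RGB values for the 10 main colors; the exact values are an assumption,
# the embodiment only says they are "recorded".
MAIN_COLORS = {
    1: (255, 0, 0),      # red
    2: (255, 255, 0),    # yellow
    3: (0, 0, 255),      # blue
    4: (0, 255, 0),      # green
    5: (0, 255, 255),    # cyan
    6: (128, 0, 128),    # purple
    7: (255, 165, 0),    # orange
    8: (255, 255, 255),  # white
    9: (0, 0, 0),        # black
    10: (128, 128, 128), # gray
}

def projection_weights(pixel):
    """Normalized 10-dimensional color feature vector for one pixel.

    Assumed weight form: inversely proportional to the Euclidean RGB
    distance between the pixel (Ir, Ig, Ib) and each main color
    (Rn, Gn, Bn), then normalized as the text describes."""
    ir, ig, ib = pixel
    raw = []
    for n in sorted(MAIN_COLORS):
        rn, gn, bn = MAIN_COLORS[n]
        d = math.sqrt((ir - rn) ** 2 + (ig - gn) ** 2 + (ib - bn) ** 2)
        raw.append(1.0 / (1.0 + d))
    s = sum(raw)
    return [w / s for w in raw]

def region_color_vector(pixels):
    """Sum the normalized per-pixel vectors over a region (per the text)."""
    acc = [0.0] * len(MAIN_COLORS)
    for p in pixels:
        for i, w in enumerate(projection_weights(p)):
            acc[i] += w
    return acc
```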
Manner 2: calculating the similarity between each candidate target and the tracking target by using a target re-identification algorithm based on a deep neural network.
As an optional embodiment, step S103 includes:
As shown in FIG. 6, a first candidate target is selected from the multiple candidate targets, where the first candidate target is any one of the multiple candidate targets; the image of the first candidate target and the image of the tracking target are normalized to the same size; the image of the tracking target is input through a first input 611 into a first convolutional network 601 of a first deep neural network for feature calculation to obtain the feature vector of the tracking target, where the first deep neural network is based on a Siamese structure; the image of the first candidate target is input through a second input 612 into a second convolutional network 602 of the first deep neural network for feature calculation to obtain the feature vector of the first candidate target, where the second convolutional network 602 and the first convolutional network 601 share convolutional-layer parameters, i.e., their convolutional-layer parameters are identical; the feature vector of the tracking target and the feature vector of the first candidate target are input into a first fully connected network 603 of the first deep neural network for similarity calculation, and the similarity between the first candidate target and the tracking target is finally obtained at a first output 621, where the outputs of the first convolutional network 601 and the second convolutional network 602 automatically serve as the inputs of the first fully connected network 603.
In a specific implementation, the first deep neural network (shown in FIG. 6) needs to be trained offline. The first deep neural network includes the first convolutional network 601, the second convolutional network 602, the first fully connected network 603, the first input 611, the second input 612, and the first output 621. The first convolutional network 601 and the second convolutional network 602 form a two-branch deep neural network with a Siamese structure, where each branch adopts the network structure of AlexNet before FC6. Both the first convolutional network 601 and the second convolutional network 602 contain multiple convolutional layers, and the convolutional layers of the two networks are shared with each other, i.e., their parameters are identical. The images input to the first convolutional network 601 and the second convolutional network 602 need to be normalized to the same size. Here, the normalized image of the tracking target is input into the first convolutional network 601 to obtain the feature vector of the tracking target, and the normalized image of the first candidate target is input into the second convolutional network 602 to obtain the feature vector of the first candidate target. The first convolutional network 601 and the second convolutional network 602 are both connected to the first fully connected network 603, which contains multiple fully connected layers and is used to calculate the distance between the two input feature vectors, thereby obtaining the similarity between the first candidate target and the tracking target. The parameters of the first deep neural network are obtained through offline learning; the first deep neural network is trained in the same way as a general convolutional neural network, and after offline training is completed, the first deep neural network can be applied in the tracking system.
For example, when the first deep neural network is used to calculate the similarity between candidate target 401 and the initial tracking target 000, the image 421 of candidate target 401 and the image 311 of the initial tracking target 000 may first be normalized to the same size; then the image 311 of the initial tracking target 000 is input into the first convolutional network 601 to obtain the feature vector of the initial tracking target 000, and the image 421 of candidate target 401 is input into the second convolutional network 602 to obtain the feature vector of candidate target 401; finally, the feature vector of the initial tracking target 000 and the feature vector of candidate target 401 are input into the first fully connected network 603, thereby obtaining the similarity between candidate target 401 and the initial tracking target 000.

Similarly, after the image 422 of candidate target 402 and the image 311 corresponding to the initial tracking target 000 are normalized, the image 311 of the initial tracking target 000 is input into the first convolutional network 601 while the image 422 of candidate target 402 is input into the second convolutional network 602, so that the similarity between candidate target 402 and the initial tracking target 000 is obtained. By analogy, the similarity between candidate target 403 and the initial tracking target 000, and the similarity between candidate target 404 and the initial tracking target 000, are obtained.
Manner 3: using a deep neural network to simultaneously generate the candidate targets and calculate the similarity between each candidate target and the tracking target.
As an optional embodiment, when performing the determining multiple candidate targets within the i-th image block, in addition to methods such as saliency analysis or object detection, a second deep neural network as shown in FIG. 7 may be used.
Specifically, as shown in FIG. 7, a second deep neural network may be trained offline. The second deep neural network is based on a Siamese structure and includes a third convolutional network 604, a fourth convolutional network 605, an RPN (Region Proposal Network) 607, a second fully connected network 606, a third input 613, a fourth input 614, and a second output 622. The output of the third convolutional network 604 serves as the input of the RPN 607, and the fourth convolutional network 605 and the RPN 607 are both connected to the second fully connected network 606. The third convolutional network 604 contains multiple convolutional layers for performing feature calculation on the i-th image block, so that a feature map of the i-th image block can be obtained via the third convolutional network 604; the RPN 607 is used to extract multiple candidate targets from the i-th image block according to the feature map of the i-th image block and to calculate the feature vector of each candidate target.
The main difference between the second deep neural network shown in FIG. 7 and the first deep neural network shown in FIG. 6 lies in the lower half of FIG. 7. The third convolutional network 604 in FIG. 7 takes the i-th image block as input, and an RPN 607 is additionally added. The RPN 607 extracts the candidate targets on the feature map obtained after the i-th image block is processed by the third convolutional network 604: it performs its calculation directly on the feature map produced by the third convolutional network 604, directly locates the position corresponding to each candidate target on the feature map, and directly obtains the feature vector of each candidate target from the feature map. These feature vectors are then input, pair by pair with the feature vector corresponding to the initial tracking target 000, into the second fully connected network 606 to calculate the similarities.
In a specific implementation, the i-th image block may be input through the fourth input 614 into the third convolutional network 604 of the second deep neural network for feature calculation to obtain the feature map of the i-th image block; the feature map of the i-th image block is input into the RPN 607 of the second deep neural network for feature calculation, multiple candidate targets are extracted, and the feature vector of each candidate target is calculated.
For example, the second image block 420 may be input into the third convolutional network 604 of the second deep neural network to obtain the feature map of the second image block 420; the feature map of the second image block 420 is input into the RPN 607 of the second deep neural network, multiple candidate targets (i.e., candidate target 401, candidate target 402, candidate target 403, and candidate target 404) are extracted, and the feature vector of each candidate target can also be obtained.
As an optional embodiment, step S103 includes:
extracting the feature vector of a first candidate target from the feature vectors of the multiple candidate targets, where the first candidate target is any one of the multiple candidate targets; inputting the image of the tracking target through the third input 613 into the fourth convolutional network 605 of the second deep neural network for feature calculation to obtain the feature vector of the tracking target, where both the fourth convolutional network 605 and the third convolutional network 604 contain multiple convolutional layers, and the convolutional layers of the fourth convolutional network 605 share parameters with the third convolutional network 604, i.e., their convolutional-layer parameters are identical; and inputting the feature vector of the tracking target and the feature vector of the first candidate target into the second fully connected network 606 of the second deep neural network for similarity calculation, finally obtaining the similarity between the first candidate target and the tracking target at the second output 622.
In a specific implementation, as shown in FIG. 7, in addition to the third convolutional network 604 and the RPN 607, the second deep neural network further includes the fourth convolutional network 605 and the second fully connected network 606. The RPN 607 is used to extract the multiple candidate targets based on the feature map output by the third convolutional network 604 and to calculate the feature vector of each candidate target, and the feature vector of each candidate target is input in turn into the second fully connected network 606; the fourth convolutional network 605 is used to calculate the feature vector of the tracking target and output it to the second fully connected network 606; and the second fully connected network 606 is used to calculate the similarity between the first candidate target and the tracking target based on the feature vector of the first candidate target and the feature vector of the tracking target.
For example, as described above, after the second image block 420 is input into the third convolutional network 604 of the second deep neural network, the feature vectors of candidate target 401, candidate target 402, candidate target 403, and candidate target 404 can be obtained through the calculation of the third convolutional network 604 and the RPN 607. At the same time, the image 311 corresponding to the initial tracking target 000 is input into the fourth convolutional network 605 of the second deep neural network, so that the similarity between candidate target 401 and the initial tracking target 000, the similarity between candidate target 402 and the initial tracking target 000, the similarity between candidate target 403 and the initial tracking target 000, and the similarity between candidate target 404 and the initial tracking target 000 can be calculated by the second fully connected network 606.
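The design point of Manner 3 is that the image block passes through the convolutional network once, and every candidate's feature vector is then read directly out of that shared feature map rather than requiring a separate forward pass per candidate. This deliberately simplified sketch illustrates the idea, with a one-value-per-position "feature map" and mean pooling standing in for the real convolutional features and RoI-style readout; all names and shapes are hypothetical.

```python
def feature_map(image, h, w):
    """Toy stand-in for the third convolutional network: one 'feature'
    per pixel (here just the mean of the RGB channels).  Crucially, this
    is computed ONCE per image block."""
    return [[sum(image[y][x]) / 3.0 for x in range(w)] for y in range(h)]

def candidate_vector(fmap, box):
    """RPN-style readout: the candidate's feature vector is taken directly
    from its region (x0, y0, x1, y1) of the shared feature map, here
    mean-pooled to a 1-dimensional vector as a simplification."""
    x0, y0, x1, y1 = box
    vals = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return [sum(vals) / len(vals)]

# one shared feature map, many candidate readouts
image = [[(3, 3, 3), (6, 6, 6)], [(9, 9, 9), (12, 12, 12)]]
fmap = feature_map(image, 2, 2)
boxes = [(0, 0, 1, 1), (0, 0, 2, 2)]  # hypothetical candidate boxes
vectors = [candidate_vector(fmap, b) for b in boxes]
```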
Step S104: determining, as the tracking target, the candidate target with the highest similarity to the tracking target among the multiple candidate targets.
In a specific implementation, after the similarity between each candidate target and the tracking target is calculated, the candidate target with the highest similarity is taken as the tracking target.
For example, if candidate target 402 has the highest similarity to the initial tracking target 000, candidate target 402 is taken as the tracking target and tracking continues.
The above mainly takes the second frame image 400 as an example: for each candidate target in the second image block 420 of the second frame image 400, the similarity between that candidate target and the initial tracking target 000 is calculated, and the candidate target with the highest similarity is taken as the tracking target in the second frame image. The same applies to each subsequent frame image (for example, the third frame image, the fourth frame image, the fifth frame image, ...): the similarity between each candidate target in that frame image and the initial tracking target 000 is calculated, and the candidate target with the highest similarity is taken as the tracking target in that frame image.
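Step S104 thus reduces to an argmax over the per-frame similarities. A minimal sketch with hypothetical candidate IDs, toy feature vectors, and an illustrative inverse-distance similarity function:

```python
def similarity_fn(v1, v2):
    """Illustrative inverse-Euclidean-distance similarity."""
    d = sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5
    return 1.0 / (1.0 + d)

def select_tracking_target(candidates, target_vec):
    """Step S104: the candidate with the highest similarity to the
    initial-frame tracking target becomes the tracking target."""
    return max(candidates, key=lambda c: similarity_fn(c["vec"], target_vec))

# hypothetical per-frame candidates 401..404 with toy feature vectors
frame_candidates = [
    {"id": 401, "vec": [0.9, 0.1]},
    {"id": 402, "vec": [0.5, 0.5]},
    {"id": 403, "vec": [0.2, 0.8]},
    {"id": 404, "vec": [0.0, 1.0]},
]
target_vec = [0.55, 0.45]  # toy feature vector of the initial tracking target
print(select_tracking_target(frame_candidates, target_vec)["id"])  # → 402
```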
The technical solutions in the above embodiments of the present invention have at least the following technical effects or advantages:
Since the candidate targets of each subsequent frame image are compared with the tracking target in the initial frame image, and the candidate target with the highest similarity among them is determined as the tracking target, tracking of the tracking target is achieved. Compared with the prior-art visual tracking methods based on online learning, the target tracking method in the embodiments of the present invention treats the processing of each frame after the initial frame as a judgment of whether the target has been lost, and therefore has the advantage of reliably judging whether the tracking target is lost. Moreover, it does not need to maintain a tracking template, which avoids the continuous amplification of errors caused by continuous updating of a tracking template and facilitates recovering a lost tracking target, thereby improving the robustness of the tracking system.
实施例二Embodiment 2
本实施例提供了一种电子设备,该电子设备具有图像采集单元,图像采集单元用于采集图像数据,如图8所示,该电子设备,包括:The embodiment provides an electronic device, which has an image acquisition unit, and the image acquisition unit is configured to collect image data. As shown in FIG. 8 , the electronic device includes:
第一确定单元801,配置为在图像数据的初始帧图像中确定一跟踪目标;The first determining unit 801 is configured to determine a tracking target in the initial frame image of the image data;
提取单元802,配置为在图像数据的后续帧图像中提取多个候选目标,后续帧图像是初始帧图像之后的任一帧图像;The extracting unit 802 is configured to extract a plurality of candidate targets in the subsequent frame image of the image data, where the subsequent frame image is any frame image subsequent to the initial frame image;
计算单元803,配置为计算出候选目标与跟踪目标的相似度;The calculating unit 803 is configured to calculate a similarity between the candidate target and the tracking target;
第二确定单元804,配置为将多个候选目标中的与跟踪目标的相似度最高的候选目标确定为跟踪目标。The second determining unit 804 is configured to determine a candidate target that has the highest similarity with the tracking target among the plurality of candidate targets as the tracking target.
作为一种可选的实施例,第一确定单元801,包括:As an optional embodiment, the first determining unit 801 includes:
第一确定子单元,配置为在通过显示屏输出初始帧图像后,获取用户的选择操作;基于用户的选择操作,在初始帧图像中确定跟踪目标;或者a first determining subunit configured to acquire a user's selection operation after outputting the initial frame image through the display screen; determining a tracking target in the initial frame image based on the user's selection operation; or
第二确定子单元,配置为获取用于描述跟踪目标的特征信息;基于特征信息,在初始帧图像中确定跟踪目标。 a second determining subunit configured to acquire feature information for describing the tracking target; and determining a tracking target in the initial frame image based on the feature information.
作为一种可选的实施例,提取单元802,包括:As an optional embodiment, the extracting unit 802 includes:
第一确定子单元，配置为确定跟踪目标在第i-1帧图像中的第i-1包围框，其中，第i-1帧图像属于图像数据，i为大于等于2的整数；在i等于2时，第i-1帧图像即为初始帧图像；a first determining subunit configured to determine the (i-1)-th bounding box of the tracking target in the (i-1)-th frame image, where the (i-1)-th frame image belongs to the image data and i is an integer greater than or equal to 2; when i equals 2, the (i-1)-th frame image is the initial frame image;
第二确定子单元，配置为基于第i-1包围框，在第i帧图像中确定第i图像块，其中，第i帧图像即为后续帧图像，第i图像块的中心与第i-1包围框的中心位置相同，第i图像块的面积大于第i-1包围框的面积；a second determining subunit configured to determine the i-th image block in the i-th frame image based on the (i-1)-th bounding box, where the i-th frame image is the subsequent frame image, the center of the i-th image block coincides with the center of the (i-1)-th bounding box, and the area of the i-th image block is larger than the area of the (i-1)-th bounding box;
第三确定子单元,配置为在第i图像块内确定多个候选目标。A third determining subunit configured to determine a plurality of candidate targets within the ith image block.
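The search-region construction performed by the first and second determining subunits — same center as the previous bounding box, larger area — might be sketched like this. The enlargement factor `scale` is an assumed parameter; the text only requires the block to be larger than the previous box:

```python
def search_region(prev_box, scale=2.0):
    """Return the i-th image block: same center as the (i-1)-th bounding box,
    each side enlarged by `scale` (an assumption; the patent only requires the
    block to be larger than the previous box).

    prev_box: (x, y, w, h) with (x, y) the top-left corner.
    """
    x, y, w, h = prev_box
    cx, cy = x + w / 2.0, y + h / 2.0   # center of the previous bounding box
    nw, nh = w * scale, h * scale       # enlarged width and height
    # keep the same center, so the tracked object is likely inside the block
    return (cx - nw / 2.0, cy - nh / 2.0, nw, nh)
```

In practice the returned rectangle would also be clipped to the image boundaries before cropping.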
作为一种可选的实施例,计算单元803,包括:As an optional embodiment, the calculating unit 803 includes:
第一选择子单元,配置为从多个候选目标中选出第一候选目标,其中,第一候选目标是多个候选目标中的任一候选目标;a first selection sub-unit configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
第一计算子单元,配置为计算第一候选目标的第一颜色特征向量,以及计算跟踪目标的第二颜色特征向量;a first calculation subunit configured to calculate a first color feature vector of the first candidate target and calculate a second color feature vector of the tracking target;
第二计算子单元,配置为计算第一颜色特征向量和第二颜色特征向量的距离,其中,距离即为第一候选目标与跟踪目标的相似度。And a second calculating subunit configured to calculate a distance between the first color feature vector and the second color feature vector, wherein the distance is the similarity between the first candidate target and the tracking target.
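The distance-as-similarity step of the second calculating subunit can be illustrated with a minimal sketch. The patent does not fix the metric, so Euclidean distance (negated, so that a higher value means more similar) is an assumption here:

```python
import math

def color_similarity(f1, f2):
    """Similarity of two color feature vectors as negative Euclidean distance:
    the smaller the distance, the higher the similarity. Euclidean distance is
    an assumption of this sketch; the patent only speaks of 'the distance'."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
```

Identical vectors score 0 (the maximum), and the score decreases as the vectors move apart.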
作为一种可选的实施例,第一计算子单元,还配置为:As an optional embodiment, the first computing subunit is further configured to:
将第一候选目标图像进行主成分分割，获得第一mask图像；以及，将跟踪目标的图像进行主成分分割，获得第二mask图像；将第一mask图像和第二mask图像缩放至相同大小；将第一mask图像平均分成M个区域；以及，将第二mask图像平均分成M个区域，M为正整数；计算第一mask图像中每个区域的颜色特征向量；以及，计算第二mask图像中每个区域的颜色特征向量；将第一mask图像中每个区域的颜色特征向量顺序连接，获得第一颜色特征向量；以及，将第二mask图像中每个区域的颜色特征向量顺序连接，获得第二颜色特征向量。performing principal component segmentation on the image of the first candidate target to obtain a first mask image, and performing principal component segmentation on the image of the tracking target to obtain a second mask image; scaling the first mask image and the second mask image to the same size; evenly dividing the first mask image into M regions, and evenly dividing the second mask image into M regions, M being a positive integer; calculating a color feature vector of each region in the first mask image, and calculating a color feature vector of each region in the second mask image; sequentially concatenating the color feature vectors of the regions in the first mask image to obtain the first color feature vector, and sequentially concatenating the color feature vectors of the regions in the second mask image to obtain the second color feature vector.
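A rough sketch of the per-region scheme above: split the scaled mask image into M equal regions, compute a color feature per region, and concatenate them in order. The layout of the M regions (horizontal strips here) and the per-region feature callable are assumptions for illustration; the text does not specify them:

```python
import numpy as np

def region_color_feature(mask_img, M, region_feature):
    """Concatenated per-region color feature of a (scaled) mask image.

    mask_img:       H x W x 3 array.
    M:              number of equal regions (horizontal strips is an assumption).
    region_feature: callable mapping a region (h x W x 3) to a 1-D feature.
    """
    h = mask_img.shape[0]
    # M horizontal strips of (roughly) equal height
    strips = [mask_img[i * h // M:(i + 1) * h // M] for i in range(M)]
    # concatenate the strip features in order -> one long feature vector
    return np.concatenate([region_feature(s) for s in strips])
```

Because the two mask images are first scaled to the same size and split the same way, the two concatenated vectors have equal length and can be compared directly by a distance.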
作为一种可选的实施例,第一计算子单元,还配置为: As an optional embodiment, the first computing subunit is further configured to:
确定W种主颜色，W为正整数；计算第一mask图像中第一区域中每个像素在每种主颜色上的投影权重，第一区域是第一mask图像中的M个区域中的任一区域；以及，计算第二mask图像中第二区域中每个像素在每种主颜色上的投影权重，第二区域是第二mask图像中的M个区域中的任一区域；基于第一区域中每个像素在每种主颜色上的投影权重，获得第一区域中每个像素对应的W维颜色特征向量；以及，基于第二区域中每个像素在每种主颜色上的投影权重，获得第二区域中每个像素对应的W维颜色特征向量；对第一区域中每个像素对应的W维颜色特征向量进行归一化，获得第一区域中每个像素的颜色特征向量；以及，对第二区域中每个像素对应的W维颜色特征向量进行归一化，获得第二区域中每个像素的颜色特征向量；将第一区域中每个像素的颜色特征向量相加，获得第一区域的颜色特征向量；以及，将第二区域中每个像素的颜色特征向量相加，获得第二区域的颜色特征向量。determining W main colors, W being a positive integer; calculating the projection weight of each pixel in a first region of the first mask image on each main color, the first region being any one of the M regions in the first mask image, and calculating the projection weight of each pixel in a second region of the second mask image on each main color, the second region being any one of the M regions in the second mask image; obtaining, based on the projection weights of each pixel in the first region on the main colors, a W-dimensional color feature vector for each pixel in the first region, and obtaining, based on the projection weights of each pixel in the second region on the main colors, a W-dimensional color feature vector for each pixel in the second region; normalizing the W-dimensional color feature vector of each pixel in the first region to obtain the color feature vector of each pixel in the first region, and normalizing the W-dimensional color feature vector of each pixel in the second region to obtain the color feature vector of each pixel in the second region; summing the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and summing the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
作为一种可选的实施例，第一计算子单元，还配置为基于如下等式，计算第一像素在第n种主颜色上的投影权重：As an optional embodiment, the first calculating subunit is further configured to calculate the projection weight of a first pixel on the n-th main color based on the following equation:
Figure PCTCN2017110577-appb-000005
其中，第一像素为第一区域或第二区域中的任一像素，第n种主颜色是W种主颜色中的任意一种主颜色，wn为第一像素在第n种主颜色上的投影权重，Ir、Ig、Ib为所述第一像素的RGB值；Rn、Gn、Bn为所述第n种主颜色的RGB值。Here the first pixel is any pixel in the first region or the second region, the n-th main color is any one of the W main colors, wn is the projection weight of the first pixel on the n-th main color, Ir, Ig, Ib are the RGB values of the first pixel, and Rn, Gn, Bn are the RGB values of the n-th main color.
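The projection-weight equation itself is reproduced above only as an image reference, so the sketch below substitutes an assumed weight — inverse squared RGB distance to each main color — merely to illustrate the "W-dimensional feature per pixel, then normalize" pipeline. It is not the patent's actual formula:

```python
import numpy as np

def projection_weights(pixel_rgb, main_colors, eps=1e-6):
    """W-dimensional color feature of one pixel: one projection weight per
    main color. The patent's exact weight formula is not reproduced in the
    text, so this sketch uses inverse squared RGB distance (closer main
    color -> larger weight), then applies the normalization step from the
    text so the W weights sum to 1.
    """
    p = np.asarray(pixel_rgb, dtype=float)
    d2 = np.sum((np.asarray(main_colors, dtype=float) - p) ** 2, axis=1)
    w = 1.0 / (d2 + eps)   # assumed stand-in for the patent's weight wn
    return w / w.sum()     # normalization to a unit-sum W-dimensional vector
```

Summing these per-pixel vectors over a region then yields the region's color feature vector, as the preceding paragraph describes.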
作为一种可选的实施例,计算单元803,包括:As an optional embodiment, the calculating unit 803 includes:
第二选择子单元,配置为从多个候选目标中选出第一候选目标,其中,第一候选目标是多个候选目标中的任一候选目标;a second selection subunit, configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
归一化子单元，配置为将第一候选目标的图像与跟踪目标的图像归一化至相同大小；a normalization subunit configured to normalize the image of the first candidate target and the image of the tracking target to the same size;
第一输入子单元,配置为将跟踪目标的图像输入至第一深度神经网络的第一卷积网络中进行特征计算,获得跟踪目标的特征向量,其中,第一深度神经网络基于Siamese结构;a first input subunit, configured to input an image of the tracking target into a first convolution network of the first depth neural network for feature calculation, to obtain a feature vector of the tracking target, wherein the first depth neural network is based on the Siamese structure;
第二输入子单元,配置为将第一候选目标的图像输入至第一深度神经网络的第二卷积网络中进行特征计算,获得第一候选目标的特征向量;a second input subunit, configured to input an image of the first candidate target into a second convolution network of the first depth neural network to perform feature calculation, to obtain a feature vector of the first candidate target;
第三输入子单元，配置为将跟踪目标的特征向量和第一候选目标的特征向量输入至第一深度神经网络的第一全连接网络中进行相似度计算，获得第一候选目标与跟踪目标的相似度。a third input subunit configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a first fully connected network of the first deep neural network for similarity calculation, obtaining the similarity between the first candidate target and the tracking target.
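A toy numpy stand-in for the Siamese arrangement just described: both branches apply the same shared weights, and the two branch features are concatenated and scored by a fully connected head. Real embodiments would use trained convolutional layers; the random linear maps here only illustrate the weight-sharing structure:

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.standard_normal((8, 12))  # shared branch weights (stand-in for conv layers)
W_fc = rng.standard_normal(16)           # fully connected similarity head

def features(img_vec):
    """Branch feature extractor; BOTH branches call this same function,
    i.e. they share parameters (the Siamese property)."""
    return np.maximum(W_shared @ img_vec, 0.0)  # linear map + ReLU

def siamese_similarity(target_vec, candidate_vec):
    """Concatenate the two branch features and score them with a (here,
    linear) fully connected head, squashed to (0, 1) by a sigmoid."""
    f = np.concatenate([features(target_vec), features(candidate_vec)])
    return 1.0 / (1.0 + np.exp(-float(W_fc @ f)))
```

The key point the structure conveys: because the two branches share weights, swapping in a new candidate never changes how the tracking target itself is embedded.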
作为一种可选的实施例,第三确定子单元,还配置为:As an optional embodiment, the third determining subunit is further configured to:
将第i图像块输入至第二深度神经网络的第三卷积网络中进行特征计算，获得第i图像块的特征图，其中，第二深度神经网络基于Siamese结构；将第i图像块的特征图输入至第二深度神经网络的RPN网络中，提取出多个候选目标，并获得多个候选目标的特征向量。inputting the i-th image block into a third convolutional network of a second deep neural network for feature calculation to obtain a feature map of the i-th image block, where the second deep neural network is based on the Siamese structure; and inputting the feature map of the i-th image block into the RPN network of the second deep neural network to extract the plurality of candidate targets and obtain the feature vectors of the plurality of candidate targets.
作为一种可选的实施例,计算单元803,包括:As an optional embodiment, the calculating unit 803 includes:
提取子单元,配置为从多个候选目标的特征向量中提取第一候选目标的特征向量,其中,第一候选目标为多个候选目标中的任一候选目标;Extracting a subunit, configured to extract a feature vector of the first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
第四输入子单元,配置为将跟踪目标的图像输入至第二深度神经网络的第四卷积网络中进行特征计算,获得跟踪目标的特征向量;a fourth input subunit configured to input an image of the tracking target into a fourth convolution network of the second depth neural network for feature calculation, to obtain a feature vector of the tracking target;
第五输入子单元，配置为将跟踪目标的特征向量和第一候选目标的特征向量输入至第二深度神经网络的第二全连接网络中进行相似度计算，获得第一候选目标与跟踪目标的相似度。a fifth input subunit configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a second fully connected network of the second deep neural network for similarity calculation, obtaining the similarity between the first candidate target and the tracking target.
由于本实施例所介绍的电子设备为实施本发明实施例中目标跟踪方法的方法所采用的电子设备，故而基于本发明实施例中所介绍的目标跟踪方法的方法，本领域所属技术人员能够了解本实施例的电子设备的具体实施方式以及其各种变化形式，所以在此对于该电子设备如何实现本发明实施例中的方法不再详细介绍。只要本领域所属技术人员实施本发明实施例中目标跟踪方法的方法所采用的电子设备，都属于本发明所欲保护的范围。Since the electronic device introduced in this embodiment is the electronic device used to implement the target tracking method in the embodiments of the present invention, a person skilled in the art can, based on the target tracking method introduced in the embodiments of the present invention, understand the specific implementations of the electronic device of this embodiment and its various variations; therefore, how the electronic device implements the method in the embodiments of the present invention is not described in detail here. Any electronic device used by a person skilled in the art to implement the target tracking method in the embodiments of the present invention falls within the intended scope of protection of the present invention.
实际应用中，所述第一确定单元801、提取单元802、计算单元803以及第二确定单元804均可以运行于电子设备上，可由位于电子设备上的中央处理器（CPU）、或微处理器（MPU）、或数字信号处理器（DSP）、或可编程门阵列（FPGA）实现。In practical applications, the first determining unit 801, the extracting unit 802, the calculating unit 803, and the second determining unit 804 may all run on an electronic device, and may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) located on the electronic device.
上述本发明实施例中的技术方案,至少具有如下的技术效果或优点:The technical solutions in the foregoing embodiments of the present invention have at least the following technical effects or advantages:
由于将后续每一帧图像的候选目标与初始帧图像中的跟踪目标进行比较，将候选目标中相似度最高的候选目标确定为跟踪目标，从而实现了对跟踪目标的跟踪。本发明实施例中的电子设备与现有技术中的利用在线学习的视觉跟踪方法的电子设备相比，对于初始帧之后的每一帧的处理，都可以看作是在判断目标是否跟丢，具有可靠地判断跟踪目标是否跟丢的优点；并且不需要维持跟踪模板，避免了跟踪模板的持续更新导致误差被持续放大，有利于找回跟丢的跟踪目标，从而提高了跟踪系统的鲁棒性。Since the candidate targets of each subsequent frame image are compared with the tracking target in the initial frame image, and the candidate target with the highest similarity among them is determined as the tracking target, tracking of the target is achieved. Compared with prior-art electronic devices that use visual tracking methods based on online learning, the electronic device in the embodiments of the present invention treats the processing of each frame after the initial frame as a check of whether the target has been lost, and can therefore reliably determine whether the tracking target is lost. Moreover, no tracking template needs to be maintained, which avoids the continuous amplification of errors caused by continual template updates and makes it easier to recover a lost tracking target, thereby improving the robustness of the tracking system.
在一具体实施例中，所述电子设备，包括：处理器和用于存储能够在处理器上运行的计算机程序的存储器，其中，所述处理器用于运行所述计算机程序时，执行以上所述方法的步骤。In a specific embodiment, the electronic device includes a processor and a memory for storing a computer program executable on the processor, where the processor is configured, when running the computer program, to perform the steps of the method described above.
这里，实际应用中，存储器可以由任何类型的易失性或非易失性存储设备、或者它们的组合来实现。其中，非易失性存储器可以是只读存储器（ROM，Read Only Memory）、可编程只读存储器（PROM，Programmable Read-Only Memory）、可擦除可编程只读存储器（EPROM，Erasable Programmable Read-Only Memory）、电可擦除可编程只读存储器（EEPROM，Electrically Erasable Programmable Read-Only Memory）、磁性随机存取存储器（FRAM，Ferromagnetic Random Access Memory）、快闪存储器（Flash Memory）、磁表面存储器、光盘、或只读光盘（CD-ROM，Compact Disc Read-Only Memory）；磁表面存储器可以是磁盘存储器或磁带存储器。易失性存储器可以是随机存取存储器（RAM，Random Access Memory），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（SRAM，Static Random Access Memory）、同步静态随机存取存储器（SSRAM，Synchronous Static Random Access Memory）、动态随机存取存储器（DRAM，Dynamic Random Access Memory）、同步动态随机存取存储器（SDRAM，Synchronous Dynamic Random Access Memory）、双倍数据速率同步动态随机存取存储器（DDRSDRAM，Double Data Rate Synchronous Dynamic Random Access Memory）、增强型同步动态随机存取存储器（ESDRAM，Enhanced Synchronous Dynamic Random Access Memory）、同步连接动态随机存取存储器（SLDRAM，SyncLink Dynamic Random Access Memory）、直接内存总线随机存取存储器（DRRAM，Direct Rambus Random Access Memory）。本发明实施例描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。Here, in practical applications, the memory may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present invention are intended to include, but are not limited to, these and any other suitable types of memory.
所述处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。处理器可以实现或者执行本发明实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本发明实施例所公开的方法的步骤,可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于存储介质中,该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成前述方法的步骤。The processor may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software. The above processor may be a general purpose processor, a digital signal processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor can be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the present invention may be directly implemented as a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software module can be located in a storage medium, the storage medium being located in the memory, the processor reading the information in the memory, and completing the steps of the foregoing methods in combination with the hardware thereof.
本发明的实施例还提供了一种计算机可读存储介质，例如包括计算机程序的存储器，上述计算机程序可由以上所述电子设备的处理器执行，以完成前述方法所述步骤。计算机可读存储介质可以是FRAM、ROM、可编程只读存储器PROM、EPROM、EEPROM、Flash Memory、磁表面存储器、光盘、或CD-ROM等存储器；也可以是包括上述存储器之一或任意组合的各种设备。Embodiments of the present invention also provide a computer-readable storage medium, for example a memory including a computer program, where the computer program can be executed by the processor of the electronic device described above to perform the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc, or a CD-ROM; it may also be any of various devices including one or any combination of the above memories.
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
本发明是参照根据本发明实施例的方法、设备（系统）、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device. The apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本发明的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例作出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
显然，本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样，倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内，则本发明也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass such changes and variations.
工业实用性Industrial applicability
本发明实施例具有可靠地判断跟踪目标是否跟丢的优点，并且不需要维持跟踪模板，避免了跟踪模板的持续更新导致误差被持续放大，有利于找回跟丢的跟踪目标，从而提高了跟踪系统的鲁棒性。The embodiments of the present invention have the advantage of reliably determining whether the tracking target has been lost, and do not need to maintain a tracking template, which avoids the continuous amplification of errors caused by continual template updates and makes it easier to recover a lost tracking target, thereby improving the robustness of the tracking system.

Claims (22)

  1. 一种目标跟踪方法，应用于电子设备中，所述电子设备具有图像采集单元，所述图像采集单元用于采集图像数据，所述方法包括：A target tracking method, applied to an electronic device, wherein the electronic device has an image acquisition unit configured to collect image data, the method comprising:
    在所述图像数据的初始帧图像中确定一跟踪目标;Determining a tracking target in an initial frame image of the image data;
    在所述图像数据的后续帧图像中提取多个候选目标,所述后续帧图像是所述初始帧图像之后的任一帧图像;Extracting a plurality of candidate targets in a subsequent frame image of the image data, the subsequent frame images being any frame image subsequent to the initial frame image;
    计算出候选目标与所述跟踪目标的相似度;Calculating the similarity between the candidate target and the tracking target;
    将所述多个候选目标中的与所述跟踪目标的相似度最高的候选目标确定为所述跟踪目标。A candidate target having the highest similarity with the tracking target among the plurality of candidate targets is determined as the tracking target.
  2. 如权利要求1所述的目标跟踪方法,其中,所述在图像数据的初始帧图像中确定一跟踪目标,包括:The target tracking method according to claim 1, wherein said determining a tracking target in the initial frame image of the image data comprises:
    在通过显示屏输出所述初始帧图像后,获取用户的选择操作;基于用户的选择操作,在所述初始帧图像中确定所述跟踪目标;或者,After outputting the initial frame image through the display screen, acquiring a user's selection operation; determining the tracking target in the initial frame image based on a user's selection operation; or
    获取用于描述所述跟踪目标的特征信息;基于所述特征信息,在所述初始帧图像中确定所述跟踪目标。Obtaining feature information for describing the tracking target; determining the tracking target in the initial frame image based on the feature information.
  3. 如权利要求1所述的目标跟踪方法,其中,所述在图像数据的后续帧图像中提取多个候选目标,包括:The target tracking method according to claim 1, wherein the extracting a plurality of candidate targets in a subsequent frame image of the image data comprises:
    确定所述跟踪目标在第i-1帧图像中的第i-1包围框，其中，所述第i-1帧图像属于所述图像数据，i为大于等于2的整数；在i等于2时，所述第i-1帧图像即为所述初始帧图像；determining the (i-1)-th bounding box of the tracking target in the (i-1)-th frame image, wherein the (i-1)-th frame image belongs to the image data and i is an integer greater than or equal to 2; when i equals 2, the (i-1)-th frame image is the initial frame image;
    基于所述第i-1包围框，在第i帧图像中确定第i图像块，其中，所述第i帧图像即为所述后续帧图像，所述第i图像块的中心与所述第i-1包围框的中心位置相同，所述第i图像块的面积大于所述第i-1包围框的面积；determining, based on the (i-1)-th bounding box, an i-th image block in the i-th frame image, wherein the i-th frame image is the subsequent frame image, the center of the i-th image block coincides with the center of the (i-1)-th bounding box, and the area of the i-th image block is larger than the area of the (i-1)-th bounding box;
    在所述第i图像块内确定所述多个候选目标。The plurality of candidate targets are determined within the ith image block.
  4. 如权利要求1所述的目标跟踪方法，其中，所述计算出候选目标与所述跟踪目标的相似度，包括：The target tracking method according to claim 1, wherein calculating the similarity between a candidate target and the tracking target comprises:
    从所述多个候选目标中选出第一候选目标,其中,所述第一候选目标是所述多个候选目标中的任一候选目标;Selecting a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    计算所述第一候选目标的第一颜色特征向量,以及计算所述跟踪目标的第二颜色特征向量;Calculating a first color feature vector of the first candidate target, and calculating a second color feature vector of the tracking target;
    计算所述第一颜色特征向量和所述第二颜色特征向量的距离,其中,所述距离即为所述第一候选目标与所述跟踪目标的相似度。Calculating a distance between the first color feature vector and the second color feature vector, wherein the distance is a similarity between the first candidate target and the tracking target.
  5. 如权利要求4所述的目标跟踪方法,其中,所述计算所述第一候选目标的第一颜色特征向量,以及计算所述跟踪目标的第二颜色特征向量,包括:The target tracking method according to claim 4, wherein the calculating the first color feature vector of the first candidate target and calculating the second color feature vector of the tracking target comprises:
    将所述第一候选目标的图像进行主成分分割，获得第一mask图像；以及，将所述跟踪目标的图像进行主成分分割，获得第二mask图像；performing principal component segmentation on the image of the first candidate target to obtain a first mask image; and performing principal component segmentation on the image of the tracking target to obtain a second mask image;
    将所述第一mask图像和所述第二mask图像缩放至相同大小;Scaling the first mask image and the second mask image to the same size;
    将所述第一mask图像平均分成M个区域；以及，将所述第二mask图像平均分成M个区域，M为正整数；evenly dividing the first mask image into M regions; and evenly dividing the second mask image into M regions, M being a positive integer;
    计算所述第一mask图像中每个区域的颜色特征向量;以及,计算所述第二mask图像中每个区域的颜色特征向量;Calculating a color feature vector of each region in the first mask image; and calculating a color feature vector of each region in the second mask image;
    将所述第一mask图像中每个区域的颜色特征向量顺序连接，获得所述第一颜色特征向量；以及，将所述第二mask图像中每个区域的颜色特征向量顺序连接，获得所述第二颜色特征向量。sequentially concatenating the color feature vectors of the regions in the first mask image to obtain the first color feature vector; and sequentially concatenating the color feature vectors of the regions in the second mask image to obtain the second color feature vector.
  6. 如权利要求5所述的目标跟踪方法，其中，所述计算所述第一mask图像中每个区域的颜色特征向量；以及，计算所述第二mask图像中每个区域的颜色特征向量，包括：The target tracking method according to claim 5, wherein calculating the color feature vector of each region in the first mask image and calculating the color feature vector of each region in the second mask image comprises:
    确定W种主颜色,W为正整数;Determine the W main color, W is a positive integer;
    计算所述第一mask图像中第一区域中每个像素在每种主颜色上的投影权重，所述第一区域是所述第一mask图像中的M个区域中的任一区域；以及，计算所述第二mask图像中第二区域中每个像素在每种主颜色上的投影权重，所述第二区域是所述第二mask图像中的M个区域中的任一区域；calculating the projection weight, on each main color, of each pixel in a first region of the first mask image, the first region being any one of the M regions in the first mask image; and calculating the projection weight, on each main color, of each pixel in a second region of the second mask image, the second region being any one of the M regions in the second mask image;
    对所述第一区域中每个像素对应的W维颜色特征向量进行归一化，获得所述第一区域中每个像素的颜色特征向量；以及，对所述第二区域中每个像素对应的W维颜色特征向量进行归一化，获得所述第二区域中每个像素的颜色特征向量；normalizing the W-dimensional color feature vector corresponding to each pixel in the first region to obtain the color feature vector of each pixel in the first region; and normalizing the W-dimensional color feature vector corresponding to each pixel in the second region to obtain the color feature vector of each pixel in the second region;
    将所述第一区域中每个像素的颜色特征向量相加，获得所述第一区域的颜色特征向量；以及，将所述第二区域中每个像素的颜色特征向量相加，获得所述第二区域的颜色特征向量。adding the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region; and adding the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
  7. 如权利要求6所述的目标跟踪方法，其中，基于如下等式，计算第一像素在第n种主颜色上的投影权重：The target tracking method according to claim 6, wherein the projection weight of a first pixel on the n-th main color is calculated based on the following equation:
    Figure PCTCN2017110577-appb-100001
    其中，所述第一像素为所述第一区域或所述第二区域中的任一像素，所述第n种主颜色是所述W种主颜色中的任意一种主颜色，wn为所述第一像素在所述第n种主颜色上的投影权重，Ir、Ig、Ib为所述第一像素的RGB值；Rn、Gn、Bn为所述第n种主颜色的RGB值。wherein the first pixel is any pixel in the first region or the second region, the n-th main color is any one of the W main colors, wn is the projection weight of the first pixel on the n-th main color, Ir, Ig, Ib are the RGB values of the first pixel, and Rn, Gn, Bn are the RGB values of the n-th main color.
  8. 如权利要求1所述的目标跟踪方法,其中,所述计算出候选目标与所述跟踪目标的相似度,包括:The target tracking method according to claim 1, wherein the calculating the similarity between the candidate target and the tracking target comprises:
    从所述多个候选目标中选出第一候选目标,其中,所述第一候选目标是所述多个候选目标中的任一候选目标; Selecting a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    将所述第一候选目标的图像与所述跟踪目标的图像归一化至相同大小；normalizing the image of the first candidate target and the image of the tracking target to the same size;
    将所述跟踪目标的图像输入至第一深度神经网络的第一卷积网络中进行特征计算,获得所述跟踪目标的特征向量,其中,所述第一深度神经网络基于Siamese结构;Entering an image of the tracking target into a first convolutional network of a first depth neural network for feature calculation to obtain a feature vector of the tracking target, wherein the first depth neural network is based on a Siamese structure;
    将所述第一候选目标的图像输入至所述第一深度神经网络的第二卷积网络中进行特征计算，获得所述第一候选目标的特征向量，所述第二卷积网络和所述第一卷积网络共享卷积层参数；inputting the image of the first candidate target into a second convolutional network of the first deep neural network for feature calculation to obtain the feature vector of the first candidate target, the second convolutional network sharing convolutional layer parameters with the first convolutional network;
    将所述跟踪目标的特征向量和所述第一候选目标的特征向量输入至所述第一深度神经网络的第一全连接网络中进行相似度计算，获得所述第一候选目标与所述跟踪目标的相似度。inputting the feature vector of the tracking target and the feature vector of the first candidate target into a first fully connected network of the first deep neural network for similarity calculation, obtaining the similarity between the first candidate target and the tracking target.
  9. The target tracking method according to claim 3, wherein determining the plurality of candidate targets within the ith image block comprises:
    inputting the ith image block into a third convolutional network of a second deep neural network for feature computation, to obtain a feature map of the ith image block, wherein the second deep neural network is based on a Siamese structure; and
    inputting the feature map of the ith image block into a region proposal network (RPN) of the second deep neural network, to obtain the plurality of candidate targets and feature vectors of the plurality of candidate targets.
  10. The target tracking method according to claim 9, wherein calculating the similarity between each candidate target and the tracking target comprises:
    extracting a feature vector of a first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    inputting the image of the tracking target into a fourth convolutional network of the second deep neural network for feature computation, to obtain a feature vector of the tracking target, the fourth convolutional network and the third convolutional network sharing convolutional-layer parameters; and
    inputting the feature vector of the tracking target and the feature vector of the first candidate target into a second fully connected network of the second deep neural network for similarity computation, to obtain the similarity between the first candidate target and the tracking target.
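The RPN stage of claims 9 and 10 can be caricatured as scoring locations on the feature map and keeping the best ones as candidates, each carrying its own feature vector. The sketch below is a deliberately simplified stand-in — a single linear objectness head over a random feature map, with no anchor shapes or box regression, both of which a real RPN includes:

```python
import numpy as np

rng = np.random.default_rng(1)

feature_map = rng.standard_normal((6, 6, 8))   # H x W x C features of the ith image block
w_obj = rng.standard_normal(8) * 0.1           # objectness scoring weights (illustrative)

def propose(fmap, w, k=3):
    """Score every feature-map location and return the top-k candidates
    together with their per-candidate feature vectors."""
    h, w_dim, _ = fmap.shape
    scores = fmap @ w                            # objectness score per location
    top = np.argsort(scores.ravel())[::-1][:k]   # indices of the k best locations
    coords = [(i // w_dim, i % w_dim) for i in top]
    feats = [fmap[r, c] for r, c in coords]      # feature vector of each candidate
    return coords, feats

candidate_coords, candidate_feats = propose(feature_map, w_obj)
```

The returned `candidate_feats` play the role of the per-candidate feature vectors that claim 10 then feeds, alongside the tracking target's feature vector, into the fully connected similarity head.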
  11. An electronic device, the electronic device having an image acquisition unit configured to acquire image data, the electronic device comprising:
    a first determining unit, configured to determine a tracking target in an initial frame image of the image data;
    an extracting unit, configured to extract a plurality of candidate targets from a subsequent frame image of the image data, the subsequent frame image being any frame image after the initial frame image;
    a calculating unit, configured to calculate the similarity between a candidate target and the tracking target; and
    a second determining unit, configured to determine, among the plurality of candidate targets, the candidate target having the highest similarity with the tracking target as the tracking target.
  12. The electronic device according to claim 11, wherein the first determining unit comprises:
    a first determining subunit, configured to acquire a selection operation of a user after the initial frame image is output through a display screen, and determine the tracking target in the initial frame image based on the selection operation of the user; or
    a second determining subunit, configured to acquire feature information describing the tracking target, and determine the tracking target in the initial frame image based on the feature information.
  13. The electronic device according to claim 11, wherein the extracting unit comprises:
    a first determining subunit, configured to determine an (i-1)th bounding box of the tracking target in an (i-1)th frame image, wherein the (i-1)th frame image belongs to the image data, i is an integer greater than or equal to 2, and when i equals 2, the (i-1)th frame image is the initial frame image;
    a second determining subunit, configured to determine an ith image block in an ith frame image based on the (i-1)th bounding box, wherein the ith frame image is the subsequent frame image, the center of the ith image block coincides with the center of the (i-1)th bounding box, and the area of the ith image block is larger than the area of the (i-1)th bounding box; and
    a third determining subunit, configured to determine the plurality of candidate targets within the ith image block.
  14. The electronic device according to claim 11, wherein the calculating unit comprises:
    a first selecting subunit, configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    a first calculating subunit, configured to calculate a first color feature vector of the first candidate target and a second color feature vector of the tracking target; and
    a second calculating subunit, configured to calculate the distance between the first color feature vector and the second color feature vector, the distance being the similarity between the first candidate target and the tracking target.
  15. The electronic device according to claim 14, wherein the first calculating subunit is further configured to:
    perform principal component segmentation on the image of the first candidate target to obtain a first mask image, and perform principal component segmentation on the image of the tracking target to obtain a second mask image; scale the first mask image and the second mask image to the same size; divide the first mask image evenly into M regions, and divide the second mask image evenly into M regions, M being a positive integer; calculate a color feature vector of each region in the first mask image, and calculate a color feature vector of each region in the second mask image; and concatenate the color feature vectors of the regions in the first mask image in order to obtain the first color feature vector, and concatenate the color feature vectors of the regions in the second mask image in order to obtain the second color feature vector.
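The scale-split-concatenate pipeline of claim 15 can be sketched as below. The per-region feature here is a placeholder mean-RGB vector and the horizontal-strip split is an assumed layout; the patent's actual per-region feature is the main-color projection vector of claim 16:

```python
import numpy as np

def compute_region_feature(region):
    # Placeholder per-region color feature: mean RGB over the region.
    return region.reshape(-1, 3).mean(axis=0)

def image_color_vector(mask_img, M=4):
    """Split a mask image into M equal regions and concatenate the
    per-region color features in order, as in claim 15."""
    strips = np.array_split(mask_img, M, axis=0)   # M equal regions (assumed: horizontal strips)
    return np.concatenate([compute_region_feature(s) for s in strips])

# Both mask images would be scaled to the same size first, so that the two
# concatenated vectors are directly comparable by a distance measure.
vec = image_color_vector(np.zeros((32, 16, 3)), M=4)
```

With both mask images processed this way, the distance between the two concatenated vectors serves as the similarity of claim 14.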
  16. The electronic device according to claim 15, wherein the first calculating subunit is further configured to:
    determine W main colors, W being a positive integer; calculate the projection weight of each pixel in a first region of the first mask image on each main color, the first region being any one of the M regions in the first mask image, and calculate the projection weight of each pixel in a second region of the second mask image on each main color, the second region being any one of the M regions in the second mask image; obtain, based on the projection weights of each pixel in the first region on the main colors, a W-dimensional color feature vector for each pixel in the first region, and obtain, based on the projection weights of each pixel in the second region on the main colors, a W-dimensional color feature vector for each pixel in the second region; normalize the W-dimensional color feature vector of each pixel in the first region to obtain a color feature vector of each pixel in the first region, and normalize the W-dimensional color feature vector of each pixel in the second region to obtain a color feature vector of each pixel in the second region; and sum the color feature vectors of the pixels in the first region to obtain the color feature vector of the first region, and sum the color feature vectors of the pixels in the second region to obtain the color feature vector of the second region.
  17. The electronic device according to claim 16, wherein the first calculating subunit is further configured to calculate the projection weight of a first pixel on the nth main color based on the following equation:
    [Equation shown as image PCTCN2017110577-appb-100002 in the original filing]
    wherein the first pixel is any pixel in the first region or the second region, the nth main color is any one of the W main colors, wn is the projection weight of the first pixel on the nth main color, Ir, Ig, and Ib are the RGB values of the first pixel, and Rn, Gn, and Bn are the RGB values of the nth main color.
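Since the weight equation itself is only available as an image in this filing, the sketch below substitutes an assumed inverse-RGB-distance weighting for it; the palette is likewise an illustrative assumption. The surrounding steps — per-pixel W-dimensional vector, normalization, summation over a region — follow claim 16:

```python
import numpy as np

MAIN_COLORS = np.array(
    [[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]], dtype=float
)  # W = 4 main colors; this palette is an illustrative assumption

def pixel_weights(rgb, palette=MAIN_COLORS):
    """Normalized W-dimensional color feature vector for one pixel."""
    d = np.linalg.norm(palette - np.asarray(rgb, dtype=float), axis=1)
    w = 1.0 / (1.0 + d)   # assumed inverse-distance weight, NOT the patented equation
    return w / w.sum()    # normalization step of claim 16

def region_color_vector(pixels):
    """Sum the per-pixel vectors over a region, as recited in claim 16."""
    return np.sum([pixel_weights(p) for p in pixels], axis=0)

vec = region_color_vector([(250, 5, 5), (10, 240, 10)])
```

Any weighting that is largest for the nearest main color would slot into `pixel_weights` the same way, which is all the downstream region vector and distance comparison require.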
  18. The electronic device according to claim 11, wherein the calculating unit comprises:
    a second selecting subunit, configured to select a first candidate target from the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    a normalizing subunit, configured to normalize the image of the first candidate target and the image of the tracking target to the same size;
    a first input subunit, configured to input the image of the tracking target into a first convolutional network of a first deep neural network for feature computation, to obtain a feature vector of the tracking target, wherein the first deep neural network is based on a Siamese structure;
    a second input subunit, configured to input the image of the first candidate target into a second convolutional network of the first deep neural network for feature computation, to obtain a feature vector of the first candidate target; and
    a third input subunit, configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a first fully connected network of the first deep neural network for similarity computation, to obtain the similarity between the first candidate target and the tracking target.
  19. The electronic device according to claim 13, wherein the third determining subunit is further configured to:
    input the ith image block into a third convolutional network of a second deep neural network for feature computation, to obtain a feature map of the ith image block, wherein the second deep neural network is based on a Siamese structure; and input the feature map of the ith image block into a region proposal network (RPN) of the second deep neural network, to obtain the plurality of candidate targets and feature vectors of the plurality of candidate targets.
  20. The electronic device according to claim 19, wherein the calculating unit comprises:
    an extracting subunit, configured to extract a feature vector of a first candidate target from the feature vectors of the plurality of candidate targets, wherein the first candidate target is any one of the plurality of candidate targets;
    a fourth input subunit, configured to input the image of the tracking target into a fourth convolutional network of the second deep neural network for feature computation, to obtain a feature vector of the tracking target, the fourth convolutional network and the third convolutional network sharing convolutional-layer parameters; and
    a fifth input subunit, configured to input the feature vector of the tracking target and the feature vector of the first candidate target into a second fully connected network of the second deep neural network for similarity computation, to obtain the similarity between the first candidate target and the tracking target.
  21. An electronic device, comprising a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to perform the steps of the method of any one of claims 1 to 10 when running the computer program.
  22. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.
PCT/CN2017/110577 2016-11-11 2017-11-10 Target tracking method, electronic device, and storage medium WO2018086607A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611041675.6A CN106650630B (en) 2016-11-11 2016-11-11 A kind of method for tracking target and electronic equipment
CN201611041675.6 2016-11-11

Publications (1)

Publication Number Publication Date
WO2018086607A1 true WO2018086607A1 (en) 2018-05-17

Family

ID=58811573

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/110577 WO2018086607A1 (en) 2016-11-11 2017-11-10 Target tracking method, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN106650630B (en)
WO (1) WO2018086607A1 (en)


Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650630B (en) * 2016-11-11 2019-08-23 纳恩博(北京)科技有限公司 A kind of method for tracking target and electronic equipment
CN107346413A (en) * 2017-05-16 2017-11-14 北京建筑大学 Traffic sign recognition method and system in a kind of streetscape image
CN109214238B (en) * 2017-06-30 2022-06-28 阿波罗智能技术(北京)有限公司 Multi-target tracking method, device, equipment and storage medium
CN107168343B (en) * 2017-07-14 2020-09-15 灵动科技(北京)有限公司 Control method of luggage case and luggage case
CN107292284B (en) * 2017-07-14 2020-02-28 成都通甲优博科技有限责任公司 Target re-detection method and device and unmanned aerial vehicle
US10592786B2 (en) 2017-08-14 2020-03-17 Huawei Technologies Co., Ltd. Generating labeled data for deep object tracking
CN107481265B (en) * 2017-08-17 2020-05-19 成都通甲优博科技有限责任公司 Target relocation method and device
CN108230359B (en) * 2017-11-12 2021-01-26 北京市商汤科技开发有限公司 Object detection method and apparatus, training method, electronic device, program, and medium
CN108229456B (en) * 2017-11-22 2021-05-18 深圳市商汤科技有限公司 Target tracking method and device, electronic equipment and computer storage medium
CN108171112B (en) * 2017-12-01 2021-06-01 西安电子科技大学 Vehicle identification and tracking method based on convolutional neural network
CN108133197B (en) * 2018-01-05 2021-02-05 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110163029B (en) * 2018-02-11 2021-03-30 中兴飞流信息科技有限公司 Image recognition method, electronic equipment and computer readable storage medium
CN108416780B (en) * 2018-03-27 2021-08-31 福州大学 Object detection and matching method based on twin-region-of-interest pooling model
CN108491816A (en) * 2018-03-30 2018-09-04 百度在线网络技术(北京)有限公司 The method and apparatus for carrying out target following in video
CN108665485B (en) * 2018-04-16 2021-07-02 华中科技大学 Target tracking method based on relevant filtering and twin convolution network fusion
CN108596957B (en) * 2018-04-26 2022-07-22 北京小米移动软件有限公司 Object tracking method and device
CN108898620B (en) * 2018-06-14 2021-06-18 厦门大学 Target tracking method based on multiple twin neural networks and regional neural network
CN109118519A (en) * 2018-07-26 2019-01-01 北京纵目安驰智能科技有限公司 Target Re-ID method, system, terminal and the storage medium of Case-based Reasoning segmentation
CN109614907B (en) * 2018-11-28 2022-04-19 安徽大学 Pedestrian re-identification method and device based on feature-enhanced guided convolutional neural network
CN109685805B (en) * 2019-01-09 2021-01-26 银河水滴科技(北京)有限公司 Image segmentation method and device
CN111428535A (en) * 2019-01-09 2020-07-17 佳能株式会社 Image processing apparatus and method, and image processing system
CN111524159A (en) * 2019-02-01 2020-08-11 北京京东尚科信息技术有限公司 Image processing method and apparatus, storage medium, and processor
CN110147768B (en) * 2019-05-22 2021-05-28 云南大学 Target tracking method and device
CN112347817B (en) * 2019-08-08 2022-05-17 魔门塔(苏州)科技有限公司 Video target detection and tracking method and device
CN112800811B (en) * 2019-11-13 2023-10-13 深圳市优必选科技股份有限公司 Color block tracking method and device and terminal equipment
CN111178284A (en) * 2019-12-31 2020-05-19 珠海大横琴科技发展有限公司 Pedestrian re-identification method and system based on spatio-temporal union model of map data
CN111524162B (en) * 2020-04-15 2022-04-01 上海摩象网络科技有限公司 Method and device for retrieving tracking target and handheld camera
WO2022061615A1 (en) * 2020-09-23 2022-03-31 深圳市大疆创新科技有限公司 Method and apparatus for determining target to be followed, system, device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090019149A1 (en) * 2005-08-02 2009-01-15 Mobixell Networks Content distribution and tracking
AU2011265494A1 (en) * 2011-12-22 2013-07-11 Canon Kabushiki Kaisha Kernalized contextual feature
CN103218798A (en) * 2012-01-19 2013-07-24 索尼公司 Device and method of image processing
CN103339655A (en) * 2011-02-03 2013-10-02 株式会社理光 Image capturing apparatus, image capturing method, and computer program product
CN103679743A (en) * 2012-09-06 2014-03-26 索尼公司 Target tracking device and method as well as camera
CN105184778A (en) * 2015-08-25 2015-12-23 广州视源电子科技股份有限公司 Detection method and apparatus
CN106650630A (en) * 2016-11-11 2017-05-10 纳恩博(北京)科技有限公司 Target tracking method and electronic equipment


Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105436A (en) * 2018-10-26 2020-05-05 曜科智能科技(上海)有限公司 Target tracking method, computer device, and storage medium
CN111105436B (en) * 2018-10-26 2023-05-09 曜科智能科技(上海)有限公司 Target tracking method, computer device and storage medium
CN111428539A (en) * 2019-01-09 2020-07-17 成都通甲优博科技有限责任公司 Target tracking method and device
US20210271892A1 (en) * 2019-04-26 2021-09-02 Tencent Technology (Shenzhen) Company Limited Action recognition method and apparatus, and human-machine interaction method and apparatus
US11710351B2 (en) * 2019-04-26 2023-07-25 Tencent Technology (Shenzhen) Company Limited Action recognition method and apparatus, and human-machine interaction method and apparatus
CN110335289A (en) * 2019-06-13 2019-10-15 河海大学 A kind of method for tracking target based on on-line study
CN110335289B (en) * 2019-06-13 2022-08-05 河海大学 Target tracking method based on online learning
CN110544268A (en) * 2019-07-29 2019-12-06 燕山大学 Multi-target tracking method based on structured light and SiamMask network
CN110544268B (en) * 2019-07-29 2023-03-24 燕山大学 Multi-target tracking method based on structured light and SiamMask network
CN110570460A (en) * 2019-09-06 2019-12-13 腾讯云计算(北京)有限责任公司 Target tracking method and device, computer equipment and computer readable storage medium
CN110570460B (en) * 2019-09-06 2024-02-13 腾讯云计算(北京)有限责任公司 Target tracking method, device, computer equipment and computer readable storage medium
CN110766720A (en) * 2019-09-23 2020-02-07 盐城吉大智能终端产业研究院有限公司 Multi-camera vehicle tracking system based on deep learning
CN110889718A (en) * 2019-11-15 2020-03-17 腾讯科技(深圳)有限公司 Method and apparatus for screening program, medium, and electronic device
CN110889718B (en) * 2019-11-15 2024-05-14 腾讯科技(深圳)有限公司 Scheme screening method, scheme screening device, medium and electronic equipment
CN113538507B (en) * 2020-04-15 2023-11-17 南京大学 Single-target tracking method based on full convolution network online training
CN113538507A (en) * 2020-04-15 2021-10-22 南京大学 Single-target tracking method based on full convolution network online training
CN111598928B (en) * 2020-05-22 2023-03-10 郑州轻工业大学 Abrupt motion target tracking method based on semantic evaluation and region suggestion
CN111598928A (en) * 2020-05-22 2020-08-28 郑州轻工业大学 Abrupt change moving target tracking method based on semantic evaluation and region suggestion
CN111914890B (en) * 2020-06-23 2024-05-14 北京迈格威科技有限公司 Image block matching method between images, image registration method and product
CN111914890A (en) * 2020-06-23 2020-11-10 北京迈格威科技有限公司 Image block matching method between images, image registration method and product
CN111783878A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and readable storage medium
CN111783878B (en) * 2020-06-29 2023-08-04 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN111814905A (en) * 2020-07-23 2020-10-23 上海眼控科技股份有限公司 Target detection method, target detection device, computer equipment and storage medium
CN112037256A (en) * 2020-08-17 2020-12-04 中电科新型智慧城市研究院有限公司 Target tracking method and device, terminal equipment and computer readable storage medium
CN114491131B (en) * 2022-01-24 2023-04-18 北京至简墨奇科技有限公司 Method and device for reordering candidate images and electronic equipment
CN114491131A (en) * 2022-01-24 2022-05-13 北京至简墨奇科技有限公司 Method and device for reordering candidate images and electronic equipment

Also Published As

Publication number Publication date
CN106650630B (en) 2019-08-23
CN106650630A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
WO2018086607A1 (en) Target tracking method, electronic device, and storage medium
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
Liu et al. Joint face alignment and 3d face reconstruction
US10776936B2 (en) Point cloud matching method
WO2022134337A1 (en) Face occlusion detection method and system, device, and storage medium
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
US11189020B2 (en) Systems and methods for keypoint detection
CN106203242B (en) Similar image identification method and equipment
US8805018B2 (en) Method of detecting facial attributes
EP4099221A1 (en) Face recognition method and apparatus
EP3147827A1 (en) Face recognition method and apparatus
CN109960742B (en) Local information searching method and device
US20160275339A1 (en) System and Method for Detecting and Tracking Facial Features In Images
Ishikura et al. Saliency detection based on multiscale extrema of local perceptual color differences
CN107316029B (en) A kind of living body verification method and equipment
US10489636B2 (en) Lip movement capturing method and device, and storage medium
CN109271930B (en) Micro-expression recognition method, device and storage medium
US9129152B2 (en) Exemplar-based feature weighting
JP2005327076A (en) Parameter estimation method, parameter estimation device and collation method
WO2021137946A1 (en) Forgery detection of face image
CN114155365B (en) Model training method, image processing method and related device
CN111091075A (en) Face recognition method and device, electronic equipment and storage medium
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN107862680A (en) A kind of target following optimization method based on correlation filter
Ibragimov et al. Accurate landmark-based segmentation by incorporating landmark misdetections

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17870266

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17870266

Country of ref document: EP

Kind code of ref document: A1