WO2021253686A1 - Feature point tracking training and tracking method, apparatus, electronic device, and storage medium - Google Patents

Feature point tracking training and tracking method, apparatus, electronic device, and storage medium

Info

Publication number
WO2021253686A1
WO2021253686A1 (PCT/CN2020/119545)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
feature point
coordinates
frame
score map
Application number
PCT/CN2020/119545
Other languages
English (en)
French (fr)
Inventor
罗孙锋
王光甫
陈远鹏
刘帅成
Original Assignee
北京迈格威科技有限公司
成都旷视金智科技有限公司
Application filed by 北京迈格威科技有限公司 and 成都旷视金智科技有限公司
Publication of WO2021253686A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to a method, device, electronic equipment, and storage medium for feature point tracking training and tracking.
  • the feature point tracking algorithm is widely used in the fields of image alignment, SLAM (simultaneous localization and mapping, real-time positioning and map construction), and autonomous driving.
  • the point tracking algorithm predicts the position of a point by calculating the offset of the corresponding point in two adjacent frames of the video.
  • the embodiments of the present disclosure are proposed to provide a feature point tracking training and tracking method, device, electronic device, and storage medium that overcome the above problems or at least partially solve the above problems.
  • a feature point tracking training method including:
  • the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted, and the above steps are cyclically executed until the first loss value converges.
  • a feature point tracking method including:
  • the first tracking coordinates of the feature point coordinates in the target frame are determined.
  • a feature point tracking training device including:
  • the frame to be tracked acquisition module is used to acquire two adjacent frames in the sample video, and use one frame as the initial frame and the other frame as the target frame;
  • the feature point detection module is configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame
  • the feature extraction module is configured to perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
  • the local matching module is used to determine the feature vector corresponding to the feature point coordinates from the feature tensor corresponding to the initial frame, and to perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map;
  • the feature point tracking module is used to input the matching score map into the feature point tracking neural network to obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map;
  • a first loss calculation module configured to determine a loss value of the predicted coordinate and the coordinate corresponding to the highest score in the matching score map as the first loss value
  • the training control module is configured to adjust the network parameters of the twin feature extraction neural network and the feature point tracking neural network according to the first loss value, and perform the above steps in a loop until the first loss value converges.
  • a feature point tracking device including:
  • the frame to be tracked acquisition module is used to acquire two adjacent frames in the video in which feature points are to be tracked, and use one of the frames as the initial frame and the other frame as the target frame;
  • the feature point detection module is configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame
  • the first feature extraction module is configured to perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
  • the first local matching module is used to determine the feature vector corresponding to the feature point coordinates from the feature tensor corresponding to the initial frame, and to perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain the first A matching score map;
  • the first feature point tracking module is configured to input the first matching score map into a feature point tracking neural network to obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map;
  • the first tracking coordinate determination module is configured to determine the first tracking coordinates of the feature point coordinates in the target frame according to the first predicted coordinates and the feature point coordinates.
  • an electronic device including:
  • a memory in which computer-readable codes are stored
  • one or more processors, wherein when the computer-readable code is executed by the one or more processors, the electronic device executes the feature point tracking training method as described in the first aspect, or the feature point tracking method as described in the second aspect.
  • a computer program including computer-readable code which, when run on a computing processing device, causes the computing processing device to execute the feature point tracking training method as described in the first aspect or the feature point tracking method as described in the second aspect.
  • according to a seventh aspect of the embodiments of the present disclosure, there is provided a computer-readable medium in which the computer program according to the sixth aspect is stored.
  • the feature point tracking training and tracking method, device, electronic device, and storage medium provided by the embodiments of the present disclosure perform feature point detection on the initial frame of two adjacent frames to obtain the feature point coordinates of the initial frame, and perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame. According to the feature point coordinates, local matching is performed on the feature tensor corresponding to the target frame to obtain the matching score map, which is input into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map. The loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map is then calculated, and the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted according to this loss value. As a result, there is no need to label the sample video, which reduces the dependence on data labeling; training can be performed directly on real-scene data sets, which simplifies the training process, avoids training an optical flow model on a virtual data set, and can improve the generalization ability of the model.
  • FIG. 1 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure;
  • FIG. 2 is a structural diagram of a sub-network of the twin feature extraction neural network in an embodiment of the present disclosure;
  • FIG. 3 is a structural diagram of the feature point tracking neural network in an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure;
  • FIG. 5 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure;
  • FIG. 6 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure;
  • FIG. 7 is a structural block diagram of a feature point tracking training device provided by an embodiment of the present disclosure;
  • FIG. 8 is a structural block diagram of a feature point tracking device provided by an embodiment of the present disclosure;
  • FIG. 9 schematically shows a block diagram of an electronic device for executing the method according to the present disclosure; and
  • FIG. 10 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present disclosure.
  • FIG. 1 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method may include:
  • Step 101 Obtain two adjacent frames in a sample video, use one of them as an initial frame, and use the other frame as a target frame.
  • the frame rate of the sample video is greater than the preset frame rate, which can ensure that the brightness of two adjacent frames is consistent, and the offset of the moving point can also be kept within a small range.
  • the initial frame can be the first frame of the two adjacent frames and the target frame the second frame; alternatively, the initial frame can be the second frame of the two adjacent frames and the target frame the first frame.
  • Step 102 Perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame.
  • Feature point detection is performed on the initial frame through the feature point detection algorithm, and the feature point coordinates of the initial frame are obtained.
  • when the initial frame contains multiple feature points, feature point detection obtains the feature point coordinates of each of these feature points.
  • the feature point detection algorithm may, for example, use the FAST (Features From Accelerated Segment Test, features of accelerated segment test) algorithm, or other traditional feature point detection algorithms.
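  • As a concrete illustration (not part of the patent text), a FAST detector is available in OpenCV; a minimal sketch, with an arbitrary threshold and a hypothetical input file name:

```python
import cv2

# Load the initial frame ("frame0.png" is a placeholder name) and detect
# FAST corners on its grayscale version; threshold=20 is arbitrary.
initial_frame = cv2.imread("frame0.png")
gray = cv2.cvtColor(initial_frame, cv2.COLOR_BGR2GRAY)
fast = cv2.FastFeatureDetector_create(threshold=20)
keypoints = fast.detect(gray, None)
feature_coords = [(int(kp.pt[0]), int(kp.pt[1])) for kp in keypoints]
```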
  • Step 103 Perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
  • the twin feature extraction neural network has two sub-networks with the same structure and shared weights. There are two inputs (Input1 and Input2), each sub-network corresponding to one input: the two inputs are fed into the two sub-networks (Network1 and Network2) respectively, and each sub-network maps its input into a new space, forming a representation of the input in that space.
  • the twin feature extraction neural network is used to extract features for each pixel in the initial frame and the target frame, and outputs two 128-dimensional tensors with the same width and height as the input images; that is, each pixel of the original image corresponds to a 128-dimensional vector.
  • the twin feature extraction neural network uses two weight-sharing twin convolutional neural networks; each convolutional neural network is a point-matching model and serves as one sub-network of the twin neural network.
  • FIG. 2 is a structural diagram of a sub-network of the twin feature extraction neural network in an embodiment of the present disclosure. As shown in FIG. 2, a sub-network applies 9 convolutional layers to the input and then performs L2 normalization to obtain the 128-dimensional tensor corresponding to the input:
  • the first convolutional layer uses a 3×3 kernel, takes a 3-channel image as input, and outputs a 32-channel feature map, using dilated convolution with a dilation rate of 1;
  • the second convolutional layer uses a 3×3 kernel, 32-channel input, 32-channel output, dilated convolution with a dilation rate of 1;
  • the third convolutional layer uses a 3×3 kernel, 32-channel input, 64-channel output, dilated convolution with a dilation rate of 2;
  • the fourth convolutional layer uses a 3×3 kernel, 64-channel input, 64-channel output, dilated convolution with a dilation rate of 1;
  • the fifth convolutional layer uses a 3×3 kernel, 64-channel input, 128-channel output, dilated convolution with a dilation rate of 2;
  • the sixth convolutional layer uses a 3×3 kernel, 128-channel input, 128-channel output, dilated convolution with a dilation rate of 1;
  • the seventh convolutional layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilated convolution with a dilation rate of 2;
  • the eighth convolutional layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilated convolution with a dilation rate of 2;
  • the ninth convolutional layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilated convolution with a dilation rate of 2.
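  • For concreteness, the nine-layer sub-network described above can be sketched in PyTorch as follows. This is a minimal sketch, not the patent's reference implementation: the text specifies neither activations nor padding, so ReLU activations and "same"-style padding (chosen so the output keeps the input's width and height, as the text requires) are assumptions. In the twin arrangement, the same module instance is applied to both frames, which realizes the weight sharing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dilated_conv(in_ch, out_ch, k, dilation):
    # "same" padding: the effective kernel size is 1 + (k - 1) * dilation
    pad = ((k - 1) * dilation) // 2
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=pad, dilation=dilation),
        nn.ReLU(inplace=True),  # assumption: no activation is named in the text
    )

class FeatureSubNet(nn.Module):
    """One sub-network of the twin feature extractor: 9 dilated convolution
    layers followed by per-pixel L2 normalization of 128-d feature vectors."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            dilated_conv(3, 32, 3, 1),     # layer 1
            dilated_conv(32, 32, 3, 1),    # layer 2
            dilated_conv(32, 64, 3, 2),    # layer 3
            dilated_conv(64, 64, 3, 1),    # layer 4
            dilated_conv(64, 128, 3, 2),   # layer 5
            dilated_conv(128, 128, 3, 1),  # layer 6
            dilated_conv(128, 128, 2, 2),  # layer 7
            dilated_conv(128, 128, 2, 2),  # layer 8
            dilated_conv(128, 128, 2, 2),  # layer 9
        )

    def forward(self, x):                   # x: (B, 3, H, W)
        f = self.layers(x)                  # (B, 128, H, W)
        return F.normalize(f, p=2, dim=1)   # L2-normalize each pixel's vector
```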
  • Step 104 Determine the feature vector corresponding to the feature point coordinates from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map.
  • the local matching of the feature vector with the feature tensor corresponding to the target frame to obtain a matching score map may optionally include: taking the feature point coordinates as the center, extracting a tensor of a preset size from the feature tensor corresponding to the target frame as a matching tensor; and calculating the similarity between the feature vector and the matching tensor to obtain the matching score map.
  • the preset size may be 31 ⁇ 31, for example, which can be specifically set as required.
  • the similarity can be cosine similarity or other similarities.
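  • A minimal sketch of this local matching step, reusing the L2-normalized features from the sketch above; integer coordinates and a window fully inside the frame are assumed (border handling is omitted), and window=31 is the example size given in the text:

```python
import torch

def local_match(feat_vec, target_feat, cx, cy, window=31):
    """Match one 128-d feature vector from the initial frame against a
    window x window region of the target frame's feature tensor, centered
    on the feature point coordinates (cx, cy); returns the matching score map.
    """
    r = window // 2
    patch = target_feat[:, cy - r:cy + r + 1, cx - r:cx + r + 1]  # (128, w, w)
    # for L2-normalized features, cosine similarity reduces to a dot product
    return torch.einsum("c,chw->hw", feat_vec, patch)             # (w, w)
```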
  • Step 105 Input the matching score map into the feature point tracking neural network, and obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map.
  • FIG. 3 is a structural diagram of the feature point tracking neural network in an embodiment of the present disclosure. As shown in FIG. 3, the feature point tracking neural network is a small convolutional neural network comprising two convolutional layers, a fully connected layer, and a tanh normalization layer. The first convolutional layer uses a 3×3 kernel, takes the 1-channel matching score map as input, and outputs a 64-channel feature map, with a convolution stride of 2 and padding of 1. The second convolutional layer uses a 3×3 kernel, takes the 64-channel feature map as input, and outputs a 64-channel feature map, with a stride of 1 and padding of 1. The input of the fully connected layer is a 64-channel feature map of size 31×31, and the output is the two coordinate values x and y; after processing by the tanh normalization layer, the tracking coordinates are obtained.
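  • A sketch of this tracking network follows, with one caveat: with a stride-2 first layer, the spatial size reaching the fully connected layer is smaller than the 31×31 stated above, so the flattened input size is inferred lazily here rather than hard-coded; the activations are again an assumed detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrackingHead(nn.Module):
    """Feature point tracking network: two convolution layers, a fully
    connected layer, and tanh normalization, regressing (x, y) from a
    matching score map."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.fc = nn.LazyLinear(2)  # flattened size inferred on first forward

    def forward(self, score_map):             # (B, 1, 31, 31)
        h = F.relu(self.conv1(score_map))     # assumption: activation unnamed
        h = F.relu(self.conv2(h))
        return torch.tanh(self.fc(h.flatten(1)))  # (B, 2), values in (-1, 1)
```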
  • the method further includes: performing softmax normalization processing on the matching score map to obtain a normalized score map;
  • inputting the matching score map into the feature point tracking neural network includes: inputting the normalized score map into the feature point tracking neural network.
  • the normalized score map is input into the feature point tracking neural network to obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map.
  • by performing softmax normalization on the matching score map, the problem that the matching scores have no upper and lower bounds is avoided: softmax normalization limits the matching scores to between 0 and 1, which makes the scores more reasonable and is conducive to network training.
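  • Written out explicitly (notation ours; the patent states the effect but not the formula), the softmax normalization over the 31×31 matching score map $s$ is:

$$\tilde{s}_{ij} = \frac{\exp(s_{ij})}{\sum_{u=1}^{31}\sum_{v=1}^{31}\exp(s_{uv})}, \qquad \tilde{s}_{ij} \in (0,1), \quad \sum_{i,j}\tilde{s}_{ij} = 1.$$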
  • Step 106 Determine a loss value of the predicted coordinate and the coordinate corresponding to the highest score in the matching score map as a first loss value.
  • the loss function for calculating the first loss value adopts the L1 loss function.
  • the L1 loss function is also called the minimum absolute value deviation or the minimum absolute value error, which minimizes the sum of the absolute difference between the target value and the estimated value.
  • for each feature point coordinate, a corresponding predicted coordinate is obtained.
  • the coordinate corresponding to the highest score in the matching score map of a feature point is regarded as the tracking coordinate of that feature point. The loss value between each predicted coordinate and the coordinate corresponding to the highest score in the corresponding matching score map is calculated as the first loss value, and the network parameters of the twin feature extraction neural network and the feature point tracking neural network can be adjusted according to the first loss value.
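  • In symbols (notation ours; the patent describes the L1 objective only in words, and averaging over points is an assumption), with $\hat{p}_k$ the predicted coordinates of the $k$-th feature point and $p_k^{\star}$ the coordinates of the highest score in its matching score map, the first loss over $N$ feature points is:

$$\mathcal{L}_1 = \frac{1}{N}\sum_{k=1}^{N}\left\lVert \hat{p}_k - p_k^{\star} \right\rVert_1.$$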
  • Step 107 Judge whether the first loss value converges. If it does not converge, perform step 108. If it converges, then end the training.
  • the first loss value obtained in this round of training can be compared with the first loss value obtained in the previous round to determine whether the first loss value has converged: if the two values are the same, it is determined that the first loss value has converged; if they differ, it is determined that the first loss value has not converged. Besides comparing two adjacent first loss values, other methods can be used to determine convergence, for example, judging whether the difference between the two values is less than a threshold.
  • Step 108 Adjust the network parameters of the twin feature extraction neural network and the feature point tracking neural network according to the first loss value, and then perform step 101.
  • according to the first loss value, back propagation is performed and the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted; steps 101 to 108 are then performed again to carry out feature point tracking training on a newly acquired pair of adjacent frames.
  • the feature point tracking training method performs feature point detection on the initial frame of two adjacent frames to obtain the feature point coordinates of the initial frame, performs feature extraction on the initial frame and the target frame through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame, performs local matching on the feature tensor corresponding to the target frame according to the feature point coordinates to obtain the matching score map, and inputs the matching score map into the feature point tracking neural network to obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map. The loss value between the predicted coordinates and the coordinate corresponding to the highest score in the matching score map is then calculated, and the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted according to this loss value, achieving self-supervised training. This eliminates the need for data labeling of sample videos, reduces the dependence on data labeling, allows training directly on real-scene data sets, simplifies the training process, and avoids the process of training an optical flow model on a virtual data set.
  • FIG. 4 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure. As shown in FIG. 4, the method may include:
  • Step 401 Obtain two adjacent frames in the sample video, use one of them as the initial frame, and use the other frame as the target frame.
  • Step 402 Perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame.
  • Step 403 Perform feature extraction on the initial frame and the target frame through the twin feature extraction neural network, respectively, to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame.
  • Step 404 Determine the feature vector corresponding to the feature point coordinate from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map.
  • Step 405 Input the matching score map into the feature point tracking neural network, and obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map.
  • Step 406 Determine the loss value of the predicted coordinate and the coordinate corresponding to the highest score in the matching score map as the first loss value.
  • Step 407 Determine the feature vector corresponding to the prediction coordinate from the feature tensor corresponding to the target frame, and perform a local match between the feature vector and the feature tensor corresponding to the initial frame to obtain a reverse matching score map.
  • in order to correct tracking errors, backward tracking can be performed according to the predicted coordinates.
  • a tensor of a preset size is extracted from the feature tensor corresponding to the initial frame as a reverse matching tensor, and the cosine similarity between the feature vector corresponding to the predicted coordinate and the reverse matching tensor is calculated to obtain the reverse matching score map.
  • step 407 and step 406 are not limited to the aforementioned order, and step 407 and step 406 can also be executed at the same time.
  • Step 408 Input the reverse matching score map into the feature point tracking neural network to obtain the backward tracking coordinates of the feature point coordinates in the initial frame.
  • Step 409 Calculate the loss value between the backward tracking coordinates and the feature point coordinates as a second loss value.
  • the loss function for calculating the second loss value is the same as the loss function for calculating the first loss value, and may also be an L1 loss function.
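  • Assuming the two L1 objectives are simply summed (the patent says both losses are used but gives no weighting), and writing $q_k$ for the backward tracking coordinates and $p_k$ for the original feature point coordinates of the $k$-th point, the overall objective of this embodiment is:

$$\mathcal{L} = \mathcal{L}_1 + \frac{1}{N}\sum_{k=1}^{N}\left\lVert q_k - p_k \right\rVert_1.$$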
  • Step 410 Judge whether the first loss value and the second loss value converge. If they do not converge, perform step 411; if they converge, the training ends.
  • if either loss value has not converged, step 411 is executed to adjust the network parameters and train again.
  • Step 411 Adjust the network parameters of the twin feature extraction neural network and the feature point tracking neural network according to the first loss value and the second loss value, and then perform step 401.
  • steps 401 to 411 are then executed again, acquiring another two adjacent frames in the sample video and continuing the training.
  • the feature point tracking training method provided in this embodiment builds on the foregoing embodiment: after the predicted coordinates of the feature point coordinates in the matching score map are obtained, the feature vector is extracted from the target frame, locally matched against the feature tensor of the initial frame, and input into the feature point tracking neural network to perform backward tracking, yielding the backward tracking coordinates. The loss value between the backward tracking coordinates and the feature point coordinates is calculated as the second loss value, which corrects the error of the coordinate obtained from the highest score in the first local matching and tracking, so that the network converges quickly, the training speed increases, and the accuracy of the feature point tracking results improves.
  • FIG. 5 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure. As shown in FIG. 5, the method may include:
  • Step 501 Obtain two adjacent frames in the video in which feature points are to be tracked, and use one frame as the initial frame and the other frame as the target frame.
  • obtain the video in which feature points are to be tracked, and acquire two adjacent frames from the video, using one frame as the initial frame and the other frame as the target frame. For example, the first frame can be used as the initial frame and the second frame as the target frame.
  • Step 502 Perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame.
  • the feature point detection algorithm is used to perform feature point detection on the initial frame, and the feature point coordinates of each feature point in the initial frame are obtained.
  • the feature point detection algorithm can be a FAST algorithm or other feature point detection algorithms.
  • Step 503 Perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
  • one feature extraction neural network in the twin feature extraction neural network performs feature extraction on each pixel in the initial frame to obtain the feature tensor corresponding to the initial frame, and the other feature extraction neural network performs feature extraction on each pixel in the target frame to obtain the feature tensor corresponding to the target frame.
  • the twin feature extraction neural network has been trained, and it can be obtained through training in the above-mentioned embodiment.
  • the structure of each feature extraction neural network is shown in Figure 2.
  • Step 504 Determine the feature vector corresponding to the feature point coordinate from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map.
  • the local matching of the feature vector with the feature tensor corresponding to the target frame to obtain a first matching score map may optionally include: taking the feature point coordinates as the center, extracting a tensor of a preset size from the feature tensor corresponding to the target frame as the first matching tensor; and calculating the similarity between the feature vector and the first matching tensor to obtain the first matching score map.
  • the preset size may be 31 ⁇ 31, for example, which can be specifically set as required.
  • the similarity can be cosine similarity or other similarities.
  • Step 505 Input the first matching score map into the feature point tracking neural network, and obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map.
  • the first matching score map is input to the feature point tracking neural network, the first matching score map is processed through the feature point tracking neural network, and the first predicted coordinates corresponding to the feature point coordinates in the first matching score map are output.
  • the method further includes: performing softmax normalization processing on the first matching score map to obtain a first normalized score map;
  • inputting the first matching score map into the feature point tracking neural network includes: inputting the first normalized score map into the feature point tracking neural network.
  • the first normalized score map is input into the feature point tracking neural network to obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map.
  • Step 506 Determine the first tracking coordinate of the feature point coordinate in the target frame according to the first predicted coordinate and the feature point coordinate.
  • the first predicted coordinates are relative to the feature point coordinates; therefore, according to the first predicted coordinates and the feature point coordinates, the first predicted coordinates can be converted into tracking coordinates in the target frame, that is, the first tracking coordinates of the feature point coordinates in the target frame are obtained.
  • the first predicted coordinate is an offset coordinate relative to the feature point coordinate;
  • determining the first tracking coordinates of the feature point coordinates in the target frame includes: adding the first predicted coordinates and the feature point coordinates to obtain the first tracking coordinates of the feature point coordinates in the target frame.
  • since the first predicted coordinates are offset coordinates relative to the feature point coordinates, the first predicted coordinates and the feature point coordinates are added to obtain the first tracking coordinates of the feature point coordinates in the target frame. For example, if the first predicted coordinates are (2, 5) and the feature point coordinates are (51, 52), the first tracking coordinates of the feature point coordinates in the target frame are (53, 57).
  • the feature point tracking method acquires two adjacent frames in the video in which feature points are to be tracked, uses one frame as the initial frame and the other frame as the target frame, and performs feature point detection on the initial frame to obtain the feature point coordinates of the initial frame. Feature extraction is performed on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame. The feature vector determined from the feature tensor corresponding to the initial frame is locally matched with the feature tensor corresponding to the target frame to obtain the first matching score map, which is input into the feature point tracking neural network to perform feature point tracking and obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map; the tracking coordinates of the feature point coordinates in the target frame are then determined according to the first predicted coordinates and the feature point coordinates. That is, a local matching score map is computed for the feature points through deep learning, and score map regression is then used to predict the tracking coordinates of the feature points in the target frame, which improves the accuracy of feature point tracking and solves the problem that traditional LK algorithms have difficulty tracking dense, similar corner points.
  • FIG. 6 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure. As shown in FIG. 6, the method may include:
  • Step 601 Obtain two adjacent frames in the video in which feature points are to be tracked, and use one frame as the initial frame and the other frame as the target frame.
  • Step 602 Perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame.
  • Step 603 Perform feature extraction on the initial frame and the target frame through the twin feature extraction neural network, respectively, to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame.
  • Step 604 Determine the feature vector corresponding to the feature point coordinate from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map.
  • Step 605 Input the first matching score map into the feature point tracking neural network to obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map.
  • Step 606 Determine the first tracking coordinate of the feature point coordinate in the target frame according to the first predicted coordinate and the feature point coordinate.
  • Step 607 Determine the feature vector corresponding to the first tracking coordinate from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a second matching score map.
  • in this way, feature points with larger displacements can be tracked more accurately. That is, after the first tracking coordinates of the feature point coordinates in the target frame are determined, cascaded tracking is performed again, taking the first tracking coordinates as the reference: the feature vector corresponding to the first tracking coordinates is determined from the feature tensor corresponding to the initial frame, and, with the first tracking coordinates as the center, a feature tensor of the preset size is determined from the feature tensor corresponding to the target frame as the second matching tensor; the similarity between the second matching tensor and the feature vector corresponding to the first tracking coordinates is calculated to obtain the second matching score map.
  • Step 608 Input the second matching score map into the feature point tracking neural network to obtain second predicted coordinates corresponding to the feature point coordinates in the second matching score map.
  • Step 609 Determine the second tracking coordinate of the feature point coordinate in the target frame according to the first tracking coordinate and the second predicted coordinate.
  • the second predicted coordinates are offset coordinates relative to the first tracking coordinates, so the first tracking coordinates and the second predicted coordinates are added to convert the second predicted coordinates into the second tracking coordinates in the target frame; that is, the second tracking coordinates of the feature point coordinates in the target frame are obtained and used as the tracking result of the feature point in the target frame.
  • a multi-stage cascading prediction method can also be adopted, that is, multi-stage cascading is performed on local matching and feature point tracking to improve the accuracy of tracking larger displacement points.
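  • A sketch of this cascaded inference, chaining the FeatureSubNet, local_match, and TrackingHead sketches above. The conversion from the tracker's tanh-normalized output to a pixel offset is not specified in the text, so scaling by half the window size is an assumption, as are integer rounding and in-bounds coordinates:

```python
import torch

def cascade_track(extractor, tracker, frame0, frame1, xy, stages=2, window=31):
    """Multi-stage cascaded feature point tracking (cf. steps 601-609).
    frame0/frame1: (1, 3, H, W) image tensors; xy: integer point coordinates."""
    with torch.no_grad():
        feat0 = extractor(frame0)[0]  # (128, H, W), initial frame
        feat1 = extractor(frame1)[0]  # (128, H, W), target frame
        x, y = xy                     # current estimate, starts at the point
        for _ in range(stages):
            # per steps 604/607: the reference vector is taken from the
            # initial frame's tensor at the current estimate
            ref = feat0[:, y, x]
            score = local_match(ref, feat1, x, y, window)
            pred = tracker(score.view(1, 1, window, window))[0]
            # assumed mapping from tanh output in (-1, 1) to a pixel offset
            x += int(round(pred[0].item() * (window // 2)))
            y += int(round(pred[1].item() * (window // 2)))
    return x, y                       # tracking coordinates in the target frame
```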
  • the feature point tracking method provided by this embodiment performs cascaded tracking after obtaining the first tracking coordinates of the feature point coordinates in the target frame, which makes it possible to track feature points with larger displacements and improves the tracking accuracy for such points.
  • FIG. 7 is a structural block diagram of a feature point tracking training device provided by an embodiment of the present disclosure. As shown in FIG. 7, the feature point tracking training device may include:
  • the to-be-tracked frame acquisition module 701 is used to acquire two adjacent frames in the sample video, using one frame as the initial frame and the other frame as the target frame;
  • the feature point detection module 702 is configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame;
  • the feature extraction module 703 is configured to perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
  • the local matching module 704 is configured to determine the feature vector corresponding to the feature point coordinates from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score picture;
  • the feature point tracking module 705 is configured to input the matching score map into the feature point tracking neural network to obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map;
  • the first loss calculation module 706 is configured to determine the loss value of the predicted coordinate and the coordinate corresponding to the highest score in the matching score map as the first loss value;
  • the training control module 707 is configured to adjust the network parameters of the twin feature extraction neural network and the feature point tracking neural network according to the first loss value, and perform the above steps cyclically until the first loss value converges.
  • the device further includes:
  • the reverse local matching module is used to determine the feature vector corresponding to the predicted coordinates from the feature tensor corresponding to the target frame, and to perform local matching of the feature vector with the feature tensor corresponding to the initial frame to obtain a reverse matching score map;
  • a backtracking module configured to input the reverse matching score map into the feature point tracking neural network to obtain the backtracking coordinates of the feature point coordinates in the initial frame;
  • the second loss calculation module is used to calculate the loss value of the backward tracking coordinates and the feature point coordinates as the second loss value
  • the training control module is specifically used for:
  • according to the first loss value and the second loss value, the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted, and the above steps are performed cyclically until the first loss value and the second loss value converge.
  • the local matching module includes:
  • a matching tensor determining unit configured to extract a tensor of a preset size from the feature tensor corresponding to the target frame with the feature point coordinates as a center, as a matching tensor;
  • the local matching unit is used to calculate the similarity between the feature vector and the matching tensor to obtain a matching score map.
  • the device further includes:
  • a softmax normalization module configured to perform softmax normalization processing on the matching score map to obtain a normalized score map
  • the feature point tracking module is specifically used for:
  • the normalized score map is input into the feature point tracking neural network, and the predicted coordinates corresponding to the feature point coordinates in the matching score map are obtained.
  • the feature point tracking training device obtains the feature point coordinates of the initial frame through feature point detection on the initial frame of two adjacent frames, performs feature extraction on the initial frame and the target frame through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame, performs local matching on the feature tensor corresponding to the target frame according to the feature point coordinates to obtain the matching score map, and inputs the matching score map into the feature point tracking neural network to obtain the predicted coordinates corresponding to the feature point coordinates in the matching score map. The loss value between the predicted coordinates and the coordinate corresponding to the highest score in the matching score map is calculated as the first loss value, and the network parameters of the twin feature extraction neural network and the feature point tracking neural network are adjusted according to the first loss value. As a result, there is no need to label the sample video, which reduces the dependence on data labeling; training can be performed directly on real-scene data sets, which simplifies the training process.
  • FIG. 8 is a structural block diagram of a feature point tracking device provided by an embodiment of the present disclosure. As shown in FIG. 8, the feature point tracking device may include:
  • the frame to be tracked acquisition module 801 is used to acquire two adjacent frames in the video in which feature points are to be tracked, and use one of the frames as the initial frame and the other frame as the target frame;
  • the feature point detection module 802 is configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame;
  • the first feature extraction module 803 is configured to perform feature extraction on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
  • the first local matching module 804 is configured to determine the feature vector corresponding to the feature point coordinates from the feature tensor corresponding to the initial frame, and perform a local match between the feature vector and the feature tensor corresponding to the target frame to obtain The first matching score map;
  • the first feature point tracking module 805 is configured to input the first matching score map into the feature point tracking neural network to obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map;
  • the first tracking coordinate determining module 806 is configured to determine the first tracking coordinates of the feature point coordinates in the target frame according to the first predicted coordinates and the feature point coordinates.
  • the first local matching module includes:
  • the first matching tensor determining unit is configured to extract a tensor of a preset size from the feature tensor corresponding to the target frame with the feature point coordinates as the center, as the first matching tensor;
  • the first local matching unit is configured to calculate the similarity between the feature vector and the first matching tensor to obtain a first matching score map.
  • the first predicted coordinate is an offset coordinate relative to the feature point coordinate;
  • the first tracking coordinate determination module is specifically configured to:
  • the first predicted coordinates and the feature point coordinates are added together to obtain the first tracking coordinates of the feature point coordinates in the target frame.
  • the device further includes:
  • the first softmax normalization module is configured to perform softmax normalization processing on the first matching score map to obtain a first normalized score map
  • the first feature point tracking module is specifically used for:
  • the first normalized score map is input into the feature point tracking neural network to obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map.
  • the device further includes:
  • the second local matching module is used to determine the feature vector corresponding to the first tracking coordinate from the feature tensor corresponding to the initial frame, and to perform local matching of the feature vector with the feature tensor corresponding to the target frame to obtain the first Two matching score map;
  • the second feature point tracking module is configured to input the second matching score map into the feature point tracking neural network to obtain the second predicted coordinates corresponding to the feature point coordinates in the second matching score map;
  • the second tracking coordinate determining module is configured to determine the second tracking coordinate of the feature point coordinate in the target frame according to the first tracking coordinate and the second predicted coordinate.
  • the feature point tracking device acquires two adjacent frames in the video in which feature points are to be tracked, uses one frame as the initial frame and the other frame as the target frame, and performs feature point detection on the initial frame to obtain the feature point coordinates of the initial frame. Feature extraction is performed on the initial frame and the target frame respectively through the twin feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame. The feature vector determined from the feature tensor corresponding to the initial frame is locally matched with the feature tensor corresponding to the target frame to obtain the first matching score map, which is input into the feature point tracking neural network to perform feature point tracking and obtain the first predicted coordinates corresponding to the feature point coordinates in the first matching score map; the tracking coordinates of the feature point coordinates in the target frame are then determined according to the first predicted coordinates and the feature point coordinates. That is, a local matching score map is computed for the feature points through deep learning, and score map regression is then used to predict the tracking coordinates of the feature points in the target frame, which improves the accuracy of feature point tracking and solves the problem that traditional LK algorithms have difficulty tracking dense, similar corner points.
  • since the device embodiments are basically similar to the method embodiments, the description is relatively simple; for related parts, refer to the description of the method embodiments.
  • the device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative work.
  • the various component embodiments of the present disclosure may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to the embodiments of the present disclosure.
  • the present disclosure can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for realizing the present disclosure may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • FIG. 9 shows an electronic device that can implement the method according to the present disclosure.
  • the electronic device conventionally includes a processor 910 and a computer program product or computer-readable medium in the form of a memory 920.
  • the memory 920 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 920 has a storage space 930 for program code 931 for executing any of the method steps of the foregoing methods.
  • the storage space 930 for program codes may include various program codes 931 respectively used to implement various steps in the above method. These program codes can be read from or written into one or more computer program products.
  • Such computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks.
  • Such a computer program product is usually a portable or fixed storage unit as described with reference to FIG. 10.
  • the storage unit may have a storage segment, storage space, etc., arranged similarly to the memory 920 in the computing processing device of FIG. 10.
  • the program code can, for example, be compressed in an appropriate form.
  • the storage unit includes computer-readable code 931', that is, code that can be read by a processor such as the processor 910; when run by an electronic device, this code causes the electronic device to execute each step of the methods described above.
  • the embodiments of the embodiments of the present disclosure may be provided as methods, devices, or computer program products. Therefore, the embodiments of the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present disclosure may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing terminal equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the instruction device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operation steps are executed on the computer or other programmable terminal equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a feature point tracking training and tracking method, apparatus, electronic device, and storage medium. The tracking training method includes: acquiring two adjacent frames in a sample video, using one frame as an initial frame and the other frame as a target frame; performing feature point detection on the initial frame to obtain feature point coordinates; obtaining, through a twin feature extraction neural network, a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame; determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and locally matching the feature vector with the feature tensor corresponding to the target frame to obtain a matching score map; inputting the matching score map into a feature point tracking neural network to obtain predicted coordinates corresponding to the feature point coordinates; determining a loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map; and adjusting the network parameters according to the loss value, cyclically executing the above steps until the loss value converges. The present disclosure reduces the dependence on data labeling.

Description

Feature point tracking training and tracking method, apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202010550224.5, entitled "Feature point tracking training and tracking method, apparatus, electronic device, and storage medium" and filed with the Chinese Patent Office on June 16, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing technology, and in particular to a feature point tracking training and tracking method, apparatus, electronic device, and storage medium.
Background
As a fundamental algorithm, the feature point tracking algorithm is widely used in fields such as image alignment, SLAM (simultaneous localization and mapping), and autonomous driving. A point tracking algorithm predicts the position of a point by calculating the offset of the corresponding point between two adjacent frames of a video.
Traditional feature point tracking algorithms, such as the LK algorithm, compute sparse optical flow from the grayscale images of two adjacent frames and track dense, similar feature points poorly. In the field of deep learning, feature point tracking can also be achieved with neural networks based on image feature point matching, but such networks are usually trained on image data with large parallax, which is not suitable for tracking tasks. Neural networks based on optical flow estimation can also achieve feature point tracking, but such models are usually pre-trained on virtual image data sets and then trained a second time on real-scene data sets; they demand high data labeling accuracy, and the whole training process is rather cumbersome.
Summary
In view of the above problems, the embodiments of the present disclosure are proposed to provide a feature point tracking training and tracking method, apparatus, electronic device, and storage medium that overcome the above problems or at least partially solve them.
According to a first aspect of the embodiments of the present disclosure, there is provided a feature point tracking training method, including:
acquiring two adjacent frames in a sample video, using one frame as an initial frame and the other frame as a target frame;
performing feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
performing feature extraction on the initial frame and the target frame respectively through a twin feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and locally matching the feature vector with the feature tensor corresponding to the target frame to obtain a matching score map;
inputting the matching score map into a feature point tracking neural network to obtain predicted coordinates corresponding to the feature point coordinates in the matching score map;
determining a loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as a first loss value; and
adjusting network parameters of the twin feature extraction neural network and the feature point tracking neural network according to the first loss value, and cyclically executing the above steps until the first loss value converges.
According to a second aspect of the embodiments of the present disclosure, a feature point tracking method is provided, including:
acquiring two adjacent frames from a video on which feature point tracking is to be performed, taking one of the two frames as an initial frame, and taking the other frame as a target frame;
performing feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
performing feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map;
inputting the first matching score map into a feature point tracking neural network to obtain first predicted coordinates of the feature point coordinates in the first matching score map;
determining, according to the first predicted coordinates and the feature point coordinates, first tracking coordinates of the feature point coordinates in the target frame.
According to a third aspect of the embodiments of the present disclosure, a feature point tracking training apparatus is provided, including:
a to-be-tracked frame acquisition module, configured to acquire two adjacent frames from a sample video, take one of the two frames as an initial frame, and take the other frame as a target frame;
a feature point detection module, configured to perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
a feature extraction module, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
a local matching module, configured to determine, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map;
a feature point tracking module, configured to input the matching score map into a feature point tracking neural network to obtain predicted coordinates of the feature point coordinates in the matching score map;
a first loss calculation module, configured to determine a loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as a first loss value;
a training control module, configured to adjust network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and repeat the above steps until the first loss value converges.
According to a fourth aspect of the embodiments of the present disclosure, a feature point tracking apparatus is provided, including:
a to-be-tracked frame acquisition module, configured to acquire two adjacent frames from a video on which feature point tracking is to be performed, take one of the two frames as an initial frame, and take the other frame as a target frame;
a feature point detection module, configured to perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
a first feature extraction module, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
a first local matching module, configured to determine, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map;
a first feature point tracking module, configured to input the first matching score map into a feature point tracking neural network to obtain first predicted coordinates of the feature point coordinates in the first matching score map;
a first tracking coordinate determination module, configured to determine, according to the first predicted coordinates and the feature point coordinates, first tracking coordinates of the feature point coordinates in the target frame.
According to a fifth aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a memory storing computer-readable code; and
one or more processors, wherein when the computer-readable code is executed by the one or more processors, the electronic device performs the feature point tracking training method according to the first aspect, or the feature point tracking method according to the second aspect.
According to a sixth aspect of the embodiments of the present disclosure, a computer program is provided, including computer-readable code that, when run on a computing processing device, causes the computing processing device to perform the feature point tracking training method according to the first aspect or the feature point tracking method according to the second aspect.
According to a seventh aspect of the embodiments of the present disclosure, a computer-readable medium is provided, in which the computer program according to the sixth aspect is stored.
According to the feature point tracking training and tracking method, apparatus, electronic device, and storage medium provided by the embodiments of the present disclosure, feature point detection is performed on the initial frame of two adjacent frames to obtain the feature point coordinates of the initial frame; feature extraction is performed on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame; local matching is performed on the feature tensor corresponding to the target frame according to the feature point coordinates to obtain a matching score map; the matching score map is input into a feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map; the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map is then calculated, and the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network are adjusted according to this loss value. As a result, no data annotation of the sample video is required, which reduces the dependence on data annotation; moreover, training can be performed directly on real-scene datasets, which simplifies the training process, avoids the pre-training of optical flow models on virtual datasets, and can improve the generalization ability of the model.
The above description is merely an overview of the technical solutions of the present disclosure. In order to understand the technical means of the present disclosure more clearly so that they can be implemented according to the contents of the specification, and to make the above and other objects, features, and advantages of the present disclosure more apparent, specific embodiments of the present disclosure are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the present disclosure.
FIG. 1 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure;
FIG. 2 is a structural diagram of one sub-network of the Siamese feature extraction neural network in an embodiment of the present disclosure;
FIG. 3 is a structural diagram of the feature point tracking neural network in an embodiment of the present disclosure;
FIG. 4 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure;
FIG. 7 is a structural block diagram of a feature point tracking training apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a structural block diagram of a feature point tracking apparatus provided by an embodiment of the present disclosure;
FIG. 9 schematically shows a block diagram of an electronic device for performing the method according to the present disclosure; and
FIG. 10 schematically shows a storage unit for holding or carrying program code that implements the method according to the present disclosure.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.
FIG. 1 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps.
Step 101: acquire two adjacent frames from a sample video, take one of the two frames as an initial frame, and take the other frame as a target frame.
Two adjacent frames are read from the sample video; one frame is taken as the initial frame and the other as the target frame. The frame rate of the sample video is greater than a preset frame rate, which ensures that the two adjacent frames are consistent in brightness and that the offset of a moving point stays within a small range. The initial frame may be the first of the two adjacent frames and the target frame the second; alternatively, the initial frame may be the second of the two adjacent frames and the target frame the first.
Step 102: perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame.
Feature point detection is performed on the initial frame through a feature point detection algorithm to obtain the feature point coordinates of the initial frame. When there are multiple feature points in the initial frame, feature point detection yields the feature point coordinates of each of them.
The feature point detection algorithm may be, for example, the FAST (Features from Accelerated Segment Test) algorithm, or another traditional feature point detection algorithm.
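As an illustration of this step, the following is a minimal Python sketch of FAST-based detection, assuming OpenCV is available; the threshold value and the grayscale conversion are illustrative assumptions rather than details fixed by the present disclosure:

    import cv2
    import numpy as np

    def detect_feature_points(initial_frame: np.ndarray) -> np.ndarray:
        """Run FAST corner detection and return the (x, y) feature point coordinates."""
        gray = cv2.cvtColor(initial_frame, cv2.COLOR_BGR2GRAY)
        detector = cv2.FastFeatureDetector_create(threshold=20)  # threshold is illustrative
        keypoints = detector.detect(gray, None)
        # Each keypoint's pt attribute holds its (x, y) coordinates in the image.
        return np.array([kp.pt for kp in keypoints], dtype=np.float32)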
Step 103: perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
The Siamese feature extraction neural network has two sub-networks with identical structure and shared weights, and two inputs (Input1 and Input2), each sub-network corresponding to one input. The two inputs are fed into the two sub-networks (Network1 and Network2) respectively, and each sub-network maps its input into a new space, forming a representation of the input in that space.
To enable accurate feature point matching between the two adjacent frames, the Siamese feature extraction neural network extracts features for every pixel of the initial frame and the target frame, and outputs two 128-dimensional tensors whose width and height match those of the input images, i.e., each pixel of the original image corresponds to a 128-dimensional vector. The Siamese feature extraction neural network consists of two weight-sharing Siamese convolutional neural networks; each convolutional neural network is a point matching model and serves as one sub-network of the Siamese network.
FIG. 2 is a structural diagram of one sub-network of the Siamese feature extraction neural network in an embodiment of the present disclosure. As shown in FIG. 2, one sub-network applies nine convolutional layers to the input, followed by L2 normalization, to obtain the 128-dimensional tensor corresponding to the input. The first convolutional layer uses a 3×3 kernel, takes a 3-channel image as input, outputs a 32-channel feature map, and uses dilated convolution with a dilation rate of 1; the second layer uses a 3×3 kernel, 32-channel input, 32-channel output, dilation rate 1; the third layer uses a 3×3 kernel, 32-channel input, 64-channel output, dilation rate 2; the fourth layer uses a 3×3 kernel, 64-channel input, 64-channel output, dilation rate 1; the fifth layer uses a 3×3 kernel, 64-channel input, 128-channel output, dilation rate 2; the sixth layer uses a 3×3 kernel, 128-channel input, 128-channel output, dilation rate 1; the seventh layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilation rate 2; the eighth layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilation rate 2; and the ninth layer uses a 2×2 kernel, 128-channel input, 128-channel output, dilation rate 2.
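To make the architecture concrete, the following is a minimal PyTorch sketch of one sub-network under the layer configuration just described. The padding values are an assumption chosen so that each dilated convolution preserves the spatial size (the disclosure does not state padding), and the ReLU activations between layers are likewise assumed:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureSubNet(nn.Module):
        """One weight-shared sub-network of the Siamese feature extractor."""
        def __init__(self):
            super().__init__()
            # (in_ch, out_ch, kernel, dilation) for the nine convolutional layers.
            cfg = [(3, 32, 3, 1), (32, 32, 3, 1), (32, 64, 3, 2),
                   (64, 64, 3, 1), (64, 128, 3, 2), (128, 128, 3, 1),
                   (128, 128, 2, 2), (128, 128, 2, 2), (128, 128, 2, 2)]
            layers = []
            for in_ch, out_ch, k, d in cfg:
                p = ((k - 1) * d) // 2  # padding keeps H x W unchanged for every layer here
                layers += [nn.Conv2d(in_ch, out_ch, k, dilation=d, padding=p),
                           nn.ReLU(inplace=True)]
            self.body = nn.Sequential(*layers[:-1])  # no activation after the last conv

        def forward(self, x):
            feat = self.body(x)                   # (N, 128, H, W)
            return F.normalize(feat, p=2, dim=1)  # L2-normalize each 128-d pixel vector

Because the two sub-networks share weights, a single instance of this module can simply be applied to both the initial frame and the target frame.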
Step 104: determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map.
The feature vector corresponding to the feature point coordinates is determined from the feature tensor corresponding to the initial frame. Based on the feature point coordinates, a tensor of a preset size around the feature point coordinates is determined from the feature tensor corresponding to the target frame as a matching tensor, and the feature vector corresponding to the feature point coordinates is matched against the matching tensor to obtain a matching score map. In the matching score map, a higher value at a point indicates a higher degree of matching.
In an embodiment of the present disclosure, performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map may include: extracting, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a matching tensor; and calculating the similarity between the feature vector and the matching tensor to obtain the matching score map.
During local matching, a tensor of a preset size is first extracted from the feature tensor corresponding to the target frame, centered on the feature point coordinates, as the matching tensor; the similarity between the feature vector and each matching vector in the matching tensor is then calculated, yielding a matching score map of the preset size. Cropping the matching tensor centered on the feature point coordinates produces a relatively accurate matching score map and thereby improves point tracking accuracy. The preset size may be, for example, 31×31, and can be set as needed. The similarity may be cosine similarity or another similarity measure.
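Reusing the torch imports from the previous sketch, the local matching step could be sketched as follows, assuming the 31×31 preset window, cosine similarity, and L2-normalized feature tensors (so the cosine similarity reduces to a dot product); the zero-padding used for windows near the image border is an illustrative assumption:

    def local_match(feat_init, feat_target, x, y, size=31):
        """Match the 128-d vector at integer pixel (x, y) in feat_init against a
        size x size window centered on (x, y) in feat_target.

        feat_init, feat_target: (128, H, W) L2-normalized feature tensors.
        Returns a (size, size) matching score map.
        """
        r = size // 2
        vec = feat_init[:, y, x]  # 128-d feature vector of the feature point
        # Pad so that windows near the border stay size x size.
        padded = F.pad(feat_target, (r, r, r, r))
        window = padded[:, y:y + size, x:x + size]  # (128, size, size)
        # Dot product of unit vectors equals cosine similarity.
        return torch.einsum('c,chw->hw', vec, window)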
Step 105: input the matching score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map.
Feature point tracking is implemented through a feature point tracking neural network: the matching score map is input into the feature point tracking neural network, which processes it and outputs the predicted coordinates of the feature point coordinates in the matching score map. FIG. 3 is a structural diagram of the feature point tracking neural network in an embodiment of the present disclosure. As shown in FIG. 3, the feature point tracking neural network is a small convolutional neural network including two convolutional layers, one fully connected layer, and one tanh normalization layer. The first convolutional layer uses a 3×3 kernel, takes the 1-channel matching score map as input, outputs a 64-channel feature map, with a stride of 2 and a padding of 1; the second convolutional layer uses a 3×3 kernel, takes the 64-channel feature map as input, outputs a 64-channel feature map, with a stride of 1 and a padding of 1; the fully connected layer takes the 64-channel feature map of size 31×31 as input and outputs two coordinate values, x and y; after processing by the tanh normalization layer, the tracking coordinates are obtained.
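A minimal PyTorch sketch of this tracking head might look like the following. nn.LazyLinear is used so the fully connected layer adapts to whatever spatial size the convolutions actually produce (with a 31×31 score map and a stride-2 first convolution the feature map is 16×16; the 31×31 fully-connected input size stated above is read here as referring to the score map fed to the network, which is an interpretive assumption), and the ReLU activations are likewise assumed:

    class TrackingHead(nn.Module):
        """Regresses the predicted (x, y) coordinates from a matching score map."""
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 64, 3, stride=2, padding=1)
            self.conv2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
            self.fc = nn.LazyLinear(2)  # infers its input size on the first forward pass

        def forward(self, score_map):
            # score_map: (N, 1, 31, 31) normalized matching score map
            h = F.relu(self.conv1(score_map))
            h = F.relu(self.conv2(h))
            xy = self.fc(h.flatten(1))
            return torch.tanh(xy)  # tanh normalization: values in (-1, 1)

A rescaling of the tanh output by the matching window radius (e.g., 15 pixels for a 31×31 window) would map the normalized prediction back to pixel coordinates; the disclosure does not spell this detail out, so it is omitted from the sketch.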
In an embodiment of the present disclosure, after performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the matching score map, the method further includes: performing softmax normalization on the matching score map to obtain a normalized score map.
Inputting the matching score map into the feature point tracking neural network then includes: inputting the normalized score map into the feature point tracking neural network.
After the matching score map is obtained, softmax normalization is performed on it to obtain a normalized score map; during feature point tracking, the normalized score map is input into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map. Softmax normalization avoids the problem that the matching scores in the matching score map have no upper or lower bound; it constrains the scores to between 0 and 1, which makes the scores more reasonable and facilitates network training.
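Concretely, the normalization could be a softmax taken over all positions of the score map, for example:

    def normalize_score_map(score_map):
        """Softmax over all spatial positions: scores fall in (0, 1) and sum to 1."""
        return torch.softmax(score_map.flatten(), dim=0).reshape(score_map.shape)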
Step 106: determine the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as the first loss value.
The loss function used to calculate the first loss value is the L1 loss function, also known as least absolute deviations or least absolute errors, which minimizes the sum of the absolute differences between the target values and the estimated values.
For each feature point coordinate, corresponding predicted coordinates are obtained. The coordinates corresponding to the highest score in the matching score map of a feature point are regarded as the tracking coordinates of that feature point; the loss value between each predicted coordinate and the coordinates corresponding to the highest score in the corresponding matching score map is calculated and taken as the first loss value, according to which the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network can be adjusted.
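As a sketch, the first loss value could be computed as follows. Using the argmax of the score map as a pseudo-label is exactly the self-supervision described above; detaching it from the computation graph is an implementation assumption, and whether coordinates are expressed as absolute positions in the map or as offsets from its center (with the tanh output rescaled accordingly) is a convention the disclosure leaves open — the sketch uses absolute map coordinates:

    def first_loss(pred_xy, score_map):
        """L1 loss between the regressed coordinates and the highest-score coordinates."""
        h, w = score_map.shape
        idx = torch.argmax(score_map)
        target = torch.stack((idx % w, idx // w)).float().detach()  # (x, y) of max score
        return F.l1_loss(pred_xy, target)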
Step 107: judge whether the first loss value has converged; if it has not converged, perform step 108; if it has converged, end the training.
The first loss value obtained in the current training iteration can be compared with that obtained in the previous iteration to judge whether the first loss value has converged: if the two are the same, the first loss value is determined to have converged; if they differ, it has not converged. Besides comparing the first loss values of two adjacent iterations, other approaches may also be used, for example judging whether the difference between the first loss values of two adjacent iterations, after a preset number of iterations, is smaller than a threshold.
Step 108: adjust the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and then perform step 101.
According to the first loss value, backpropagation is performed and the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network are adjusted; steps 101 to 108 are then performed again to carry out point tracking training on a newly acquired pair of adjacent frames.
In the feature point tracking training method provided by this embodiment, feature point detection is performed on the initial frame of two adjacent frames to obtain the feature point coordinates of the initial frame; feature extraction is performed on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensors corresponding to the two frames; local matching is performed on the feature tensor corresponding to the target frame according to the feature point coordinates to obtain a matching score map; the matching score map is input into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map; the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map is calculated, and the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network are adjusted according to this loss value. Training is thus performed in a self-supervised manner, so no data annotation of the sample video is required, which reduces the dependence on data annotation; moreover, training can be performed directly on real-scene datasets, which simplifies the training process, avoids pre-training the optical flow model on virtual datasets, and can improve the generalization ability of the model.
FIG. 4 is a flowchart of the steps of a feature point tracking training method provided by an embodiment of the present disclosure. As shown in FIG. 4, the method may include the following steps.
Step 401: acquire two adjacent frames from a sample video, take one of the two frames as an initial frame, and take the other frame as a target frame.
Step 402: perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame.
Step 403: perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
Step 404: determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map.
Step 405: input the matching score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map.
Step 406: determine the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as the first loss value.
Step 407: determine, from the feature tensor corresponding to the target frame, the feature vector corresponding to the predicted coordinates, and perform local matching between this feature vector and the feature tensor corresponding to the initial frame to obtain a backward matching score map.
To correct for the case where the coordinates corresponding to the highest score in the matching score map obtained by local matching are imprecise, backward tracking can additionally be performed based on the predicted coordinates. In this case, the predicted coordinates are first converted into coordinates in the target frame; the feature vector corresponding to the converted coordinates is determined from the feature tensor corresponding to the target frame and taken as the feature vector corresponding to the predicted coordinates; centered on the converted coordinates, a tensor of the preset size is extracted from the feature tensor corresponding to the initial frame as the backward matching tensor; and the cosine similarity between the feature vector corresponding to the predicted coordinates and the backward matching tensor is calculated to obtain the backward matching score map.
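Reusing the local_match, normalize_score_map, first_loss, and TrackingHead sketches above, the forward-backward (cycle-consistency) pass could be expressed as follows. The head's output is treated here as coordinates within the score map (glossing over the tanh rescaling), and the rounding of the forward-tracked coordinates before indexing the target feature tensor is an implementation assumption:

    def cycle_losses(feat_init, feat_target, head, x, y, size=31):
        """Compute the first (forward) and second (backward) loss values."""
        r = size // 2
        # Forward: track the point from the initial frame into the target frame.
        fwd_map = normalize_score_map(local_match(feat_init, feat_target, x, y, size))
        pred = head(fwd_map[None, None])[0]   # predicted coordinates within the score map
        loss1 = first_loss(pred, fwd_map)     # first loss value

        # Convert the prediction to target-frame coordinates (map center sits at (x, y)).
        tx = int(round(x + float(pred[0]) - r))
        ty = int(round(y + float(pred[1]) - r))
        # Backward: local matching against the initial frame, then track back.
        bwd_map = normalize_score_map(local_match(feat_target, feat_init, tx, ty, size))
        back = head(bwd_map[None, None])[0]
        back_xy = torch.tensor([tx - r, ty - r], dtype=torch.float32) + back
        # Second loss value: backward-tracked coordinates vs. original coordinates.
        loss2 = F.l1_loss(back_xy, torch.tensor([float(x), float(y)]))
        return loss1, loss2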
It should be noted that the execution order of step 407 and step 406 is not limited to the above; step 407 and step 406 may also be executed simultaneously.
Step 408: input the backward matching score map into the feature point tracking neural network to obtain the backward tracking coordinates of the feature point coordinates in the initial frame.
Softmax normalization is performed on the backward matching score map, and the normalized backward matching score map is input into the feature point tracking neural network to obtain the backward tracking coordinates of the feature point coordinates in the initial frame.
Step 409: calculate the loss value between the backward tracking coordinates and the feature point coordinates as the second loss value.
The error between the backward tracking coordinates and the feature point coordinates is calculated as the second loss value. The loss function used to calculate the second loss value is the same as that used for the first loss value and may also be the L1 loss function.
Step 410: judge whether the first loss value and the second loss value have converged; if not, perform step 411; if they have converged, end the training.
The first and second loss values obtained in the current training iteration are compared with those obtained in the previous iteration to determine whether both have converged. If both have converged, training can end; if either has not converged, step 411 is performed to adjust the network parameters and train again.
Step 411: adjust the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value and the second loss value, and then perform step 401.
According to the first and second loss values, backpropagation is performed to adjust the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network; steps 401 to 411 are then performed again to acquire another pair of adjacent frames from the sample video and train.
In the feature point tracking training method provided by this embodiment, on the basis of the above embodiment, after the predicted coordinates of the feature point coordinates in the matching score map are obtained, a feature vector is extracted on the target frame and a feature tensor on the initial frame, local matching is performed, and the result is input into the feature point tracking neural network for one pass of backward tracking; the point tracked in this pass is the backward tracking point, yielding backward tracking coordinates, and the loss value between the backward tracking coordinates and the feature point coordinates is calculated as the second loss value. This corrects both the error of the highest-score coordinates obtained by the first local matching and the error of the first tracking pass, enabling the network to converge quickly, speeding up training, and improving the accuracy of the feature point tracking results.
FIG. 5 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure. As shown in FIG. 5, the method may include the following steps.
Step 501: acquire two adjacent frames from a video on which feature point tracking is to be performed, take one of the two frames as an initial frame, and take the other frame as a target frame.
The video on which feature point tracking is to be performed is acquired, two adjacent frames are obtained from the video, one frame is taken as the initial frame and the other as the target frame; for example, the first frame may be taken as the initial frame and the second frame as the target frame.
Step 502: perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame.
Feature point detection is performed on the initial frame using a feature point detection algorithm to obtain the feature point coordinates of each feature point in the initial frame. The feature point detection algorithm may be the FAST algorithm or another feature point detection algorithm.
Step 503: perform feature extraction on the initial frame and the target frame respectively through the Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
One feature extraction neural network of the Siamese feature extraction neural network performs feature extraction on each pixel of the initial frame to obtain the feature tensor corresponding to the initial frame, and the other feature extraction neural network performs feature extraction on each pixel of the target frame to obtain the feature tensor corresponding to the target frame. The Siamese feature extraction neural network has already been trained, for example through the embodiments above. The structure of each feature extraction neural network is shown in FIG. 2.
Step 504: determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map.
The feature vector corresponding to the feature point coordinates is determined from the feature tensor corresponding to the initial frame. Based on the feature point coordinates, a tensor of a preset size around the feature point coordinates is determined from the feature tensor corresponding to the target frame as a matching tensor, and the feature vector corresponding to the feature point coordinates is matched against the matching tensor to obtain the first matching score map. In the first matching score map, a higher value at a point indicates a higher degree of matching.
In an embodiment of the present disclosure, performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the first matching score map may include: extracting, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a first matching tensor; and calculating the similarity between the feature vector and the first matching tensor to obtain the first matching score map.
During local matching, a tensor of the preset size is first extracted from the feature tensor corresponding to the target frame, centered on the feature point coordinates, as the first matching tensor; the similarity between the feature vector and each first matching vector in the first matching tensor is then calculated, yielding a first matching score map of the preset size. Cropping the first matching tensor centered on the feature point coordinates produces a relatively accurate first matching score map, improving feature point tracking accuracy. The preset size may be, for example, 31×31, and can be set as needed. The similarity may be cosine similarity or another similarity measure.
Step 505: input the first matching score map into the feature point tracking neural network to obtain the first predicted coordinates of the feature point coordinates in the first matching score map.
The first matching score map is input into the feature point tracking neural network, which processes it and outputs the first predicted coordinates of the feature point coordinates in the first matching score map.
In an embodiment of the present disclosure, after performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the first matching score map, the method further includes: performing softmax normalization on the first matching score map to obtain a first normalized score map.
Inputting the first matching score map into the feature point tracking neural network then includes: inputting the normalized score map into the feature point tracking neural network.
After the first matching score map is obtained, softmax normalization is performed on it to obtain the first normalized score map; during feature point tracking, the first normalized score map is input into the feature point tracking neural network to obtain the first predicted coordinates of the feature point coordinates in the first matching score map. Softmax normalization avoids the problem that the matching scores in the first matching score map have no upper or lower bound; it constrains the scores to between 0 and 1, making them more reasonable.
Step 506: determine, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame.
The first predicted coordinates are coordinates relative to the feature point coordinates; therefore, according to the first predicted coordinates and the feature point coordinates, the first predicted coordinates can be converted into tracking coordinates in the target frame, i.e., the first tracking coordinates of the feature point coordinates in the target frame are obtained.
Specifically, the first predicted coordinates are offset coordinates relative to the feature point coordinates;
and determining, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame includes: adding the first predicted coordinates and the feature point coordinates to obtain the first tracking coordinates of the feature point coordinates in the target frame.
Since local matching is performed based on the feature point coordinates, the resulting first predicted coordinates are offset coordinates relative to the feature point coordinates; the first predicted coordinates are therefore added to the feature point coordinates to obtain the first tracking coordinates of the feature point coordinates in the target frame. For example, if the first predicted coordinates are (2, 5) and the feature point coordinates are (51, 52), the first tracking coordinates of the feature point in the target frame are (53, 57).
In the feature point tracking method provided by this embodiment, two adjacent frames are acquired from the video on which feature point tracking is to be performed, one frame is taken as the initial frame and the other as the target frame; feature point detection is performed on the initial frame to obtain the feature point coordinates of the initial frame; feature extraction is performed on the initial frame and the target frame respectively through the Siamese feature extraction neural network to obtain the feature tensors corresponding to the two frames; according to the feature point coordinates, a feature vector is extracted from the feature tensor corresponding to the initial frame and locally matched against the feature tensor corresponding to the target frame to obtain the first matching score map; the first matching score map is input into the feature point tracking neural network for feature point tracking to obtain the first predicted coordinates of the feature point coordinates in the first matching score map; and the tracking coordinates of the feature point coordinates in the target frame are determined from the first predicted coordinates and the feature point coordinates. That is, a local matching score map is computed for the feature point by deep learning, and the score map is then used to regress the tracking coordinates of the feature point in the target frame, which improves the accuracy of feature point tracking and solves the problem that the traditional LK algorithm has difficulty tracking dense, similar corner points.
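Putting the pieces above together, single-point inference could be sketched as follows; net is assumed to be a trained FeatureSubNet, head a trained TrackingHead, and the head's output is treated as a pixel offset relative to the feature point (the same simplification as in the earlier sketches):

    def to_tensor(img):
        """HWC uint8 image -> (1, 3, H, W) float tensor in [0, 1]."""
        return torch.from_numpy(img).permute(2, 0, 1).float().div(255).unsqueeze(0)

    def track_point(frame0, frame1, net, head, x, y, size=31):
        """Track the feature point at integer pixel (x, y) from frame0 into frame1."""
        with torch.no_grad():
            f0 = net(to_tensor(frame0))[0]   # (128, H, W) feature tensor, initial frame
            f1 = net(to_tensor(frame1))[0]   # (128, H, W) feature tensor, target frame
            score = normalize_score_map(local_match(f0, f1, x, y, size))
            offset = head(score[None, None])[0]  # first predicted coordinates (offset)
            # First tracking coordinates = feature point coordinates + offset.
            return x + float(offset[0]), y + float(offset[1])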
FIG. 6 is a flowchart of the steps of a feature point tracking method provided by an embodiment of the present disclosure. As shown in FIG. 6, the method may include the following steps.
Step 601: acquire two adjacent frames from a video on which feature point tracking is to be performed, take one of the two frames as an initial frame, and take the other frame as a target frame.
Step 602: perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame.
Step 603: perform feature extraction on the initial frame and the target frame respectively through the Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame.
Step 604: determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map.
Step 605: input the first matching score map into the feature point tracking neural network to obtain the first predicted coordinates of the feature point coordinates in the first matching score map.
Step 606: determine, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame.
Step 607: determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the first tracking coordinates, and perform local matching between this feature vector and the feature tensor corresponding to the target frame to obtain a second matching score map.
By adopting two-stage prediction and cascading the same feature point tracking neural network, feature points with larger displacements can be tracked more precisely. That is, after the first tracking coordinates of the feature point coordinates in the target frame are determined, one additional cascaded tracking pass is performed based on the first tracking coordinates: the feature vector corresponding to the first tracking coordinates is determined from the feature tensor corresponding to the initial frame; centered on the first tracking coordinates, a feature tensor of the preset size is determined from the feature tensor corresponding to the target frame as the second matching tensor; and the similarity between the second matching tensor and the feature vector corresponding to the first tracking coordinates is calculated to obtain the second matching score map.
Step 608: input the second matching score map into the feature point tracking neural network to obtain the second predicted coordinates of the feature point coordinates in the second matching score map.
Softmax normalization is performed on the second matching score map to obtain a second normalized score map, which is input into the feature point tracking neural network to obtain the second predicted coordinates of the feature point coordinates in the second matching score map.
Step 609: determine, according to the first tracking coordinates and the second predicted coordinates, the second tracking coordinates of the feature point coordinates in the target frame.
The second predicted coordinates are offset coordinates relative to the first tracking coordinates; the first tracking coordinates and the second predicted coordinates are therefore added, converting the second predicted coordinates into the second tracking coordinates in the target frame, i.e., the second tracking coordinates of the feature point coordinates in the target frame are obtained and taken as the tracking result of the feature point in the target frame.
It should be noted that for tracking points with larger displacements, multi-stage cascaded prediction may also be adopted, i.e., multiple stages of local matching and feature point tracking are cascaded to improve the tracking accuracy for points with larger displacements.
In the feature point tracking method provided by this embodiment, one cascaded tracking pass is performed after the first tracking coordinates of the feature point coordinates in the target frame are obtained, so that feature points with larger displacements can be tracked and the tracking accuracy for such points is improved.
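Building on the sketches above, the cascaded second stage could look like the following; the same head is reused for both stages as the disclosure describes, and f0, f1 are the precomputed (128, H, W) feature tensors of the two frames:

    def track_point_cascaded(f0, f1, head, x, y, size=31):
        """Two-stage cascaded tracking for feature points with larger displacements."""
        # Stage 1: first predicted coordinates -> first tracking coordinates.
        s1 = normalize_score_map(local_match(f0, f1, x, y, size))
        o1 = head(s1[None, None])[0]
        tx, ty = x + float(o1[0]), y + float(o1[1])
        # Stage 2: re-match around the first tracking coordinates and refine.
        s2 = normalize_score_map(local_match(f0, f1, int(round(tx)), int(round(ty)), size))
        o2 = head(s2[None, None])[0]
        # Second tracking coordinates = first tracking coordinates + second offset.
        return tx + float(o2[0]), ty + float(o2[1])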
It should be noted that, for the sake of simple description, the method embodiments are all expressed as combinations of series of actions; however, those skilled in the art should know that the embodiments of the present disclosure are not limited by the described order of actions, because according to the embodiments of the present disclosure, some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present disclosure.
FIG. 7 is a structural block diagram of a feature point tracking training apparatus provided by an embodiment of the present disclosure. As shown in FIG. 7, the feature point tracking training apparatus may include:
a to-be-tracked frame acquisition module 701, configured to acquire two adjacent frames from a sample video, take one of the two frames as an initial frame, and take the other frame as a target frame;
a feature point detection module 702, configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame;
a feature extraction module 703, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
a local matching module 704, configured to determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map;
a feature point tracking module 705, configured to input the matching score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map;
a first loss calculation module 706, configured to determine the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as the first loss value;
a training control module 707, configured to adjust the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and repeat the above steps until the first loss value converges.
Optionally, the apparatus further includes:
a backward local matching module, configured to determine, from the feature tensor corresponding to the target frame, the feature vector corresponding to the predicted coordinates, and perform local matching between this feature vector and the feature tensor corresponding to the initial frame to obtain a backward matching score map;
a backward tracking module, configured to input the backward matching score map into the feature point tracking neural network to obtain the backward tracking coordinates of the feature point coordinates in the initial frame;
a second loss calculation module, configured to calculate the loss value between the backward tracking coordinates and the feature point coordinates as a second loss value;
the training control module is specifically configured to:
adjust the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value and the second loss value, and repeat the above steps until the first loss value and the second loss value converge.
Optionally, the local matching module includes:
a matching tensor determination unit, configured to extract, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a matching tensor;
a local matching unit, configured to calculate the similarity between the feature vector and the matching tensor to obtain a matching score map.
Optionally, the apparatus further includes:
a softmax normalization module, configured to perform softmax normalization on the matching score map to obtain a normalized score map;
the feature point tracking module is specifically configured to:
input the normalized score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map.
The feature point tracking training apparatus provided by this embodiment performs feature point detection on the initial frame of two adjacent frames to obtain the feature point coordinates of the initial frame; performs feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensors corresponding to the two frames; performs local matching on the feature tensor corresponding to the target frame according to the feature point coordinates to obtain a matching score map; inputs the matching score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map; calculates the loss value between the predicted coordinates and the coordinates corresponding to the highest score in the matching score map as the first loss value; and adjusts the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value. Thus no data annotation of the sample video is required, which reduces the dependence on data annotation; moreover, training can be performed directly on real-scene datasets, which simplifies the training process, avoids pre-training the optical flow model on virtual datasets, and can improve the generalization ability of the model.
FIG. 8 is a structural block diagram of a feature point tracking apparatus provided by an embodiment of the present disclosure. As shown in FIG. 8, the feature point tracking apparatus may include:
a to-be-tracked frame acquisition module 801, configured to acquire two adjacent frames from a video on which feature point tracking is to be performed, take one of the two frames as an initial frame, and take the other frame as a target frame;
a feature point detection module 802, configured to perform feature point detection on the initial frame to obtain the feature point coordinates of the initial frame;
a first feature extraction module 803, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain the feature tensor corresponding to the initial frame and the feature tensor corresponding to the target frame;
a first local matching module 804, configured to determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map;
a first feature point tracking module 805, configured to input the first matching score map into the feature point tracking neural network to obtain the first predicted coordinates of the feature point coordinates in the first matching score map;
a first tracking coordinate determination module 806, configured to determine, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame.
Optionally, the first local matching module includes:
a first matching tensor determination unit, configured to extract, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a first matching tensor;
a first local matching unit, configured to calculate the similarity between the feature vector and the first matching tensor to obtain a first matching score map.
Optionally, the first predicted coordinates are offset coordinates relative to the feature point coordinates;
the first tracking coordinate determination module is specifically configured to:
add the first predicted coordinates and the feature point coordinates to obtain the first tracking coordinates of the feature point coordinates in the target frame.
Optionally, the apparatus further includes:
a first softmax normalization module, configured to perform softmax normalization on the first matching score map to obtain a first normalized score map;
the first feature point tracking module is specifically configured to:
input the normalized score map into the feature point tracking neural network to obtain the first predicted coordinates of the feature point coordinates in the first matching score map.
Optionally, the apparatus further includes:
a second local matching module, configured to determine, from the feature tensor corresponding to the initial frame, the feature vector corresponding to the first tracking coordinates, and perform local matching between this feature vector and the feature tensor corresponding to the target frame to obtain a second matching score map;
a second feature point tracking module, configured to input the second matching score map into the feature point tracking neural network to obtain the second predicted coordinates of the feature point coordinates in the second matching score map;
a second tracking coordinate determination module, configured to determine, according to the first tracking coordinates and the second predicted coordinates, the second tracking coordinates of the feature point coordinates in the target frame.
The feature point tracking apparatus provided by this embodiment acquires two adjacent frames from the video on which feature point tracking is to be performed, takes one frame as the initial frame and the other as the target frame; performs feature point detection on the initial frame to obtain the feature point coordinates of the initial frame; performs feature extraction on the initial frame and the target frame respectively through the Siamese feature extraction neural network to obtain the feature tensors corresponding to the two frames; extracts, according to the feature point coordinates, a feature vector from the feature tensor corresponding to the initial frame and performs local matching on the feature tensor corresponding to the target frame to obtain the first matching score map; inputs the first matching score map into the feature point tracking neural network for feature point tracking to obtain the first predicted coordinates of the feature point coordinates in the first matching score map; and determines the tracking coordinates of the feature point coordinates in the target frame from the first predicted coordinates and the feature point coordinates. That is, a local matching score map is computed for the feature point by deep learning, and the score map is then used to regress the tracking coordinates of the feature point in the target frame, which improves the accuracy of feature point tracking and solves the problem that the traditional LK algorithm has difficulty tracking dense, similar corner points.
As for the apparatus embodiments, since they are substantially similar to the method embodiments, their description is relatively brief; for relevant parts, reference may be made to the description of the method embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The various component embodiments of the present disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the computing processing device according to the embodiments of the present disclosure. The present disclosure may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present disclosure may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 9 shows an electronic device that can implement the method according to the present disclosure. The electronic device conventionally includes a processor 910 and a computer program product or computer-readable medium in the form of a memory 920. The memory 920 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 920 has a storage space 930 for program code 931 for performing any of the method steps described above. For example, the storage space 930 for program code may include individual program codes 931 for implementing the various steps of the above methods. These program codes may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 10. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 920 in the computing processing device of FIG. 9. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 931', i.e., code that can be read by a processor such as 910, which, when run by an electronic device, causes the electronic device to perform the various steps of the methods described above.
The various embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, apparatuses, or computer program products. Therefore, the embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the embodiments of the present disclosure have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present disclosure.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the statement "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The feature point tracking training and tracking method, apparatus, electronic device, and storage medium provided by the present disclosure have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present disclosure, and the description of the above embodiments is only intended to help understand the method of the present disclosure and its core ideas. Meanwhile, for those of ordinary skill in the art, changes may be made in the specific implementations and the scope of application according to the ideas of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (14)

  1. A feature point tracking training method, comprising:
    acquiring two adjacent frames from a sample video, taking one of the two frames as an initial frame, and taking the other frame as a target frame;
    performing feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
    performing feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
    determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map;
    inputting the matching score map into a feature point tracking neural network to obtain predicted coordinates of the feature point coordinates in the matching score map;
    determining a loss value between the predicted coordinates and coordinates corresponding to a highest score in the matching score map as a first loss value; and
    adjusting network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and repeating the above steps until the first loss value converges.
  2. The method according to claim 1, wherein after inputting the matching score map into the feature point tracking neural network to obtain the predicted coordinates of the feature point coordinates in the matching score map, the method further comprises:
    determining, from the feature tensor corresponding to the target frame, a feature vector corresponding to the predicted coordinates, and performing local matching between this feature vector and the feature tensor corresponding to the initial frame to obtain a backward matching score map;
    inputting the backward matching score map into the feature point tracking neural network to obtain backward tracking coordinates of the feature point coordinates in the initial frame; and
    calculating a loss value between the backward tracking coordinates and the feature point coordinates as a second loss value;
    wherein adjusting the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and repeating the above steps until the first loss value converges, comprises:
    adjusting the network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value and the second loss value, and repeating the above steps until the first loss value and the second loss value converge.
  3. The method according to claim 1 or 2, wherein performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the matching score map comprises:
    extracting, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a matching tensor; and
    calculating a similarity between the feature vector and the matching tensor to obtain the matching score map.
  4. The method according to any one of claims 1 to 3, wherein after performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the matching score map, the method further comprises:
    performing softmax normalization on the matching score map to obtain a normalized score map;
    wherein inputting the matching score map into the feature point tracking neural network comprises:
    inputting the normalized score map into the feature point tracking neural network.
  5. A feature point tracking method, comprising:
    acquiring two adjacent frames from a video on which feature point tracking is to be performed, taking one of the two frames as an initial frame, and taking the other frame as a target frame;
    performing feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
    performing feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
    determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map;
    inputting the first matching score map into a feature point tracking neural network to obtain first predicted coordinates of the feature point coordinates in the first matching score map; and
    determining, according to the first predicted coordinates and the feature point coordinates, first tracking coordinates of the feature point coordinates in the target frame.
  6. The method according to claim 5, wherein performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the first matching score map comprises:
    extracting, centered on the feature point coordinates, a tensor of a preset size from the feature tensor corresponding to the target frame as a first matching tensor; and
    calculating a similarity between the feature vector and the first matching tensor to obtain the first matching score map.
  7. The method according to claim 5 or 6, wherein the first predicted coordinates are offset coordinates relative to the feature point coordinates; and
    determining, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame comprises:
    adding the first predicted coordinates and the feature point coordinates to obtain the first tracking coordinates of the feature point coordinates in the target frame.
  8. The method according to any one of claims 5 to 7, wherein after performing local matching between the feature vector and the feature tensor corresponding to the target frame to obtain the first matching score map, the method further comprises:
    performing softmax normalization on the first matching score map to obtain a first normalized score map;
    wherein inputting the first matching score map into the feature point tracking neural network comprises:
    inputting the normalized score map into the feature point tracking neural network.
  9. The method according to any one of claims 5 to 8, wherein after determining, according to the first predicted coordinates and the feature point coordinates, the first tracking coordinates of the feature point coordinates in the target frame, the method further comprises:
    determining, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the first tracking coordinates, and performing local matching between this feature vector and the feature tensor corresponding to the target frame to obtain a second matching score map;
    inputting the second matching score map into the feature point tracking neural network to obtain second predicted coordinates of the feature point coordinates in the second matching score map; and
    determining, according to the first tracking coordinates and the second predicted coordinates, second tracking coordinates of the feature point coordinates in the target frame.
  10. A feature point tracking training apparatus, comprising:
    a to-be-tracked frame acquisition module, configured to acquire two adjacent frames from a sample video, take one of the two frames as an initial frame, and take the other frame as a target frame;
    a feature point detection module, configured to perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
    a feature extraction module, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
    a local matching module, configured to determine, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a matching score map;
    a feature point tracking module, configured to input the matching score map into a feature point tracking neural network to obtain predicted coordinates of the feature point coordinates in the matching score map;
    a first loss calculation module, configured to determine a loss value between the predicted coordinates and coordinates corresponding to a highest score in the matching score map as a first loss value; and
    a training control module, configured to adjust network parameters of the Siamese feature extraction neural network and the feature point tracking neural network according to the first loss value, and repeat the above steps until the first loss value converges.
  11. A feature point tracking apparatus, comprising:
    a to-be-tracked frame acquisition module, configured to acquire two adjacent frames from a video on which feature point tracking is to be performed, take one of the two frames as an initial frame, and take the other frame as a target frame;
    a feature point detection module, configured to perform feature point detection on the initial frame to obtain feature point coordinates of the initial frame;
    a first feature extraction module, configured to perform feature extraction on the initial frame and the target frame respectively through a Siamese feature extraction neural network to obtain a feature tensor corresponding to the initial frame and a feature tensor corresponding to the target frame;
    a first local matching module, configured to determine, from the feature tensor corresponding to the initial frame, a feature vector corresponding to the feature point coordinates, and perform local matching between the feature vector and the feature tensor corresponding to the target frame to obtain a first matching score map;
    a first feature point tracking module, configured to input the first matching score map into a feature point tracking neural network to obtain first predicted coordinates of the feature point coordinates in the first matching score map; and
    a first tracking coordinate determination module, configured to determine, according to the first predicted coordinates and the feature point coordinates, first tracking coordinates of the feature point coordinates in the target frame.
  12. An electronic device, comprising:
    a memory storing computer-readable code; and
    one or more processors, wherein when the computer-readable code is executed by the one or more processors, the electronic device performs the feature point tracking training method according to any one of claims 1 to 4, or the feature point tracking method according to any one of claims 5 to 9.
  13. A computer program, comprising computer-readable code that, when run on a computing processing device, causes the computing processing device to perform the feature point tracking training method according to any one of claims 1 to 4, or the feature point tracking method according to any one of claims 5 to 9.
  14. A computer-readable medium, in which the computer program according to claim 13 is stored.
PCT/CN2020/119545 2020-06-16 2020-09-30 特征点跟踪训练及跟踪方法、装置、电子设备及存储介质 WO2021253686A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010550224.5A CN111914878B (zh) 2020-06-16 2020-06-16 特征点跟踪训练及跟踪方法、装置、电子设备及存储介质
CN202010550224.5 2020-06-16

Publications (1)

Publication Number Publication Date
WO2021253686A1 true WO2021253686A1 (zh) 2021-12-23

Family

ID=73237743

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119545 WO2021253686A1 (zh) 2020-06-16 2020-09-30 特征点跟踪训练及跟踪方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN111914878B (zh)
WO (1) WO2021253686A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842378A (zh) * 2022-04-26 2022-08-02 南京信息技术研究院 一种基于孪生网络的多摄像头单目标追踪方法
CN115497633A (zh) * 2022-10-19 2022-12-20 联仁健康医疗大数据科技股份有限公司 一种数据处理方法、装置、设备及存储介质
CN116385496A (zh) * 2023-05-19 2023-07-04 北京航天时代光电科技有限公司 一种基于图像处理的游泳运动实时测速方法及系统

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836684B (zh) * 2021-03-09 2023-03-10 上海高德威智能交通系统有限公司 基于辅助驾驶的目标尺度变化率计算方法、装置及设备
CN115393405A (zh) * 2021-05-21 2022-11-25 北京字跳网络技术有限公司 一种图像对齐方法及装置
CN113674218B (zh) * 2021-07-28 2024-06-14 中国科学院自动化研究所 焊缝特征点提取方法、装置、电子设备与存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182695A (zh) * 2017-12-29 2018-06-19 纳恩博(北京)科技有限公司 目标跟踪模型训练方法及装置、电子设备和存储介质
CN110766725A (zh) * 2019-10-31 2020-02-07 北京市商汤科技开发有限公司 模板图像的更新、目标跟踪方法及装置、电子设备及介质
CN110956131A (zh) * 2019-11-27 2020-04-03 北京迈格威科技有限公司 单目标追踪方法、装置及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5864631A (en) * 1992-08-03 1999-01-26 Yamaha Corporation Method and apparatus for musical score recognition with quick processing of image data
US11308350B2 (en) * 2016-11-07 2022-04-19 Qualcomm Incorporated Deep cross-correlation learning for object tracking
CN111260688A (zh) * 2020-01-13 2020-06-09 深圳大学 一种孪生双路目标跟踪方法

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182695A (zh) * 2017-12-29 2018-06-19 纳恩博(北京)科技有限公司 目标跟踪模型训练方法及装置、电子设备和存储介质
CN110766725A (zh) * 2019-10-31 2020-02-07 北京市商汤科技开发有限公司 模板图像的更新、目标跟踪方法及装置、电子设备及介质
CN110956131A (zh) * 2019-11-27 2020-04-03 北京迈格威科技有限公司 单目标追踪方法、装置及系统

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842378A (zh) * 2022-04-26 2022-08-02 南京信息技术研究院 一种基于孪生网络的多摄像头单目标追踪方法
CN115497633A (zh) * 2022-10-19 2022-12-20 联仁健康医疗大数据科技股份有限公司 一种数据处理方法、装置、设备及存储介质
CN115497633B (zh) * 2022-10-19 2024-01-30 联仁健康医疗大数据科技股份有限公司 一种数据处理方法、装置、设备及存储介质
CN116385496A (zh) * 2023-05-19 2023-07-04 北京航天时代光电科技有限公司 一种基于图像处理的游泳运动实时测速方法及系统

Also Published As

Publication number Publication date
CN111914878B (zh) 2023-10-31
CN111914878A (zh) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2021253686A1 (zh) 特征点跟踪训练及跟踪方法、装置、电子设备及存储介质
Yang et al. SiamCorners: Siamese corner networks for visual tracking
Yang et al. A hybrid data association framework for robust online multi-object tracking
CN113920170B (zh) 结合场景上下文和行人社会关系的行人轨迹预测方法、系统及存储介质
CN109902588B (zh) 一种手势识别方法、装置及计算机可读存储介质
CN111210446A (zh) 一种视频目标分割方法、装置和设备
CN114565808B (zh) 一种面向无监督视觉表示的双动量对比学习方法
Zhang et al. Learning adaptive sparse spatially-regularized correlation filters for visual tracking
CN113628244A (zh) 基于无标注视频训练的目标跟踪方法、系统、终端及介质
Liu et al. Soks: Automatic searching of the optimal kernel shapes for stripe-wise network pruning
Kim et al. Self-supervised keypoint detection based on multi-layer random forest regressor
CN115471771A (zh) 一种基于语义级时序关联建模的视频时序动作定位方法
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
Yang et al. PaaRPN: Probabilistic anchor assignment with region proposal network for visual tracking
Wang et al. EMAT: Efficient feature fusion network for visual tracking via optimized multi-head attention
Yang et al. TGAN: A simple model update strategy for visual tracking via template-guidance attention network
Yang et al. IASA: An IoU-aware tracker with adaptive sample assignment
CN117173607A (zh) 多层级融合多目标跟踪方法、系统及计算机可读存储介质
Jiang et al. MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN
Xue et al. Self-supervised video representation learning by maximizing mutual information
CN112633078B (zh) 目标跟踪自校正方法、系统、介质、设备、终端及应用
Wang et al. Prototype-guided instance matching for multiple pedestrian tracking
Ke et al. Template enhancement and mask generation for siamese tracking
Fan et al. Dual aligned siamese dense regression tracker
CN116580063B (zh) 目标追踪方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20941455

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20941455

Country of ref document: EP

Kind code of ref document: A1