WO2021035807A1 - Target tracking method and device fusing optical flow information and Siamese framework - Google Patents

Target tracking method and device fusing optical flow information and Siamese framework

Info

Publication number
WO2021035807A1
WO2021035807A1 (PCT/CN2019/105275, CN2019105275W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
optical flow
current frame
feature
weight
Prior art date
Application number
PCT/CN2019/105275
Other languages
French (fr)
Chinese (zh)
Inventor
曹文明
李宇鸿
何志权
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University (深圳大学)
Publication of WO2021035807A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Definitions

  • The present invention relates to the field of image recognition, and in particular to a target tracking method and device fusing optical flow information and the Siamese framework.
  • Tracking algorithms have evolved from generative-model algorithms based on Kalman filtering, particle filtering and feature-point matching to today's discriminative-model algorithms built on the correlation-filter framework and the Siamese (twin) framework, and their accuracy and running speed keep improving.
  • The generative-model algorithms based on feature-point matching have the advantage of a simple model structure with no training process, but their accuracy is low and the feature points disappear under occlusion; the fully convolutional network algorithms based on the Siamese framework are fast, but they consider only the appearance features of the image and cannot track objects against complex backgrounds or undergoing violent motion.
  • The present invention proposes a target tracking method and device fusing optical flow information and the Siamese framework, to solve the technical problems in the prior art that the generative-model algorithms based on feature-point matching have low calculation accuracy and that the fully convolutional network algorithms based on the Siamese framework cannot track objects against complex backgrounds or undergoing violent motion.
  • A target tracking method fusing optical flow information and the Siamese framework, including:
  • S101: Obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1. Use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; combine F_N with each of the optical-flow vector maps P1, P2 and P3, and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3;
  • S102: Input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame:
  • f̄_i = Σ_j w_{j→i} · f_{j→i}    (1)
  • where i is the index of the current frame; I_i refers to the current, i-th frame; I_j refers to a frame preceding the current frame I_i, such as the j-th frame, with j ∈ {i-T, …, i-2, i-1} and T = 3, i.e., the three frames preceding the current frame;
  • f̄_i is the final detection frame obtained for the current frame after fusing the optical-flow information of the other frames;
  • w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model;
  • f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image; it is defined as
  • f_{j→i} = W(f_j, M_{i→j}) = W(f_j, F(I_i, I_j))
  • where F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame;
  • f_j is the feature map of the i-th frame, and W(·,·) fuses the optical-flow result with frame I_j and warps the fused information, applying the linear deformation equation used to position each channel of the feature map to perform the warp;
  • the input of the time-series scoring model is the unscored warped feature maps F_1, F_2, F_3 of the respective time steps together with the current-frame feature map F_N, and its output is the weight value of each candidate detection frame;
  • the time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation; through these two operations the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained;
  • the global average pooling operation is:
  • G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y)
  • where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the feature map before it is input to the global average pooling operation, and W is its width;
  • the global maximum pooling operation is:
  • G_S-GM(q_T) = Max(q_T(q_x, q_y))
  • where G_S-GM(...) denotes the global maximum pooling process;
  • the global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix;
  • the global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling respectively; the two weight matrices are then added element by element to obtain the weight feature vector;
  • the obtained weight feature vector is used as the input of the activation function Relu:
  • Relu(x) = x if x > 0, and Relu(x) = αx otherwise,
  • where x is the input weight feature vector and α is a coefficient;
  • the time-series scoring model is trained from a convolutional neural network model according to a loss function.
  • Further, the time-series scoring model is trained from a convolutional neural network model according to the loss function
  • l(y, v) = log(1 + exp(-y·v))
  • where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box.
  • Further, in order to better extract the image features of the candidate detection frames, the convolutional filters in the shared network layer use deformable convolution, which adds a learnable offset Δp_n to the sampling region of the traditional convolution operation.
  • A target tracking device fusing optical flow information and the Siamese framework, including:
  • a feature acquisition module, configured to obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1; to use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; to perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; to input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; and to combine F_N with each of the optical-flow vector maps P1, P2 and P3 and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3;
  • a weight calculation module, configured to input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and to multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame;
  • where i is the index of the current frame; I_i refers to the current, i-th frame; I_j refers to a frame preceding the current frame I_i, such as the j-th frame, with j ∈ {i-T, …, i-2, i-1} and T = 3;
  • f̄_i is the final detection frame obtained for the current frame after fusing the optical-flow information of the other frames;
  • w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model;
  • f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image; it is defined as f_{j→i} = W(f_j, M_{i→j}) = W(f_j, F(I_i, I_j));
  • F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame;
  • f_j is the feature map of the i-th frame, and W(·,·) fuses the optical-flow result with frame I_j and warps the fused information, applying the linear deformation equation used to position each channel of the feature map to perform the warp;
  • the input of the time-series scoring model is the unscored warped feature maps F_1, F_2, F_3 of the respective time steps together with the current-frame feature map F_N, and its output is the weight value of each candidate detection frame;
  • the time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation; through these two operations the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained;
  • the global average pooling operation is G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y), where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the feature map before it is input to the global average pooling operation, and W is its width;
  • the global maximum pooling operation is G_S-GM(q_T) = Max(q_T(q_x, q_y)), where G_S-GM(...) denotes the global maximum pooling process;
  • the global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix;
  • the global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling respectively; the two weight matrices are then added element by element to obtain the weight feature vector;
  • the obtained weight feature vector is used as the input of the activation function Relu(x) = x if x > 0 and αx otherwise, where x is the input weight feature vector and α is a coefficient;
  • the time-series scoring model is trained from a convolutional neural network model according to a loss function;
  • the time-series scoring model is trained from a convolutional neural network model according to the loss function l(y, v) = log(1 + exp(-y·v)), where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box;
  • the convolutional filters in the shared network layer use deformable convolution, which adds a learnable offset Δp_n to the sampling region of the traditional convolution operation.
  • A target tracking system fusing optical flow information and the Siamese framework, including:
  • a processor, configured to execute multiple instructions, and a memory, configured to store multiple instructions;
  • wherein the multiple instructions are stored by the memory and loaded by the processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
  • A computer-readable storage medium in which multiple instructions are stored; the multiple instructions are loaded by a processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
  • Target tracking is performed on the basis of feature maps that integrate optical flow information, combined with the Siamese framework, so the calculation accuracy is high, the speed is fast, and objects with complex backgrounds and violent motion can be tracked.
  • FIG. 1 is a structural diagram of a target tracking system integrating optical flow information and Siamese framework according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a time series scoring model according to an embodiment of the present invention.
  • FIG. 3A is a schematic diagram of a traditional 3×3 convolution calculation;
  • FIGS. 3B-3C are schematic diagrams of deformable convolution calculation;
  • FIG. 4 is a flowchart of the target tracking method fusing optical flow information and the Siamese framework proposed by the present invention;
  • FIG. 5 is a block diagram of the target tracking device fusing optical flow information and the Siamese framework proposed by the present invention.
  • FIG. 1 shows the structure diagram of the target tracking system fusing optical flow information and the Siamese framework of an embodiment of the present invention.
  • Obtain the current frame, which is the Nth frame (N > 3), and then obtain the three frames preceding it, namely frame N-3, frame N-2 and frame N-1.
  • Frames N-3, N-2 and N-1 are each paired with the current frame, i.e., the Nth frame, and the TVNet optical-flow network is used to compute the optical flow (for the TVNet optical-flow network see VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]. Honolulu, Hawaii, USA. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2805-2813), obtaining Flow1, Flow2 and Flow3.
  • A crop operation is performed on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3.
  • The feature network is constructed on the basis of AlexNet with the fully connected layers removed.
  • The current frame is input into the feature network to obtain a 22×22 current-frame feature map F_N.
  • The warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N are used as candidate detection frames and input into the time-series scoring model to obtain the feature weight of each candidate detection frame, and the feature weights are multiplied by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame.
  • In formula (1), i is the index of the current frame, I_i refers to the current, i-th frame, and w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model.
  • f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image;
  • F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame;
  • f_j is the feature map of the i-th frame, and W(·,·) fuses the optical-flow result with frame I_j and warps the fused information,
  • applying the linear deformation equation used to position each channel of the feature map to perform the warp.
  • FIG. 2 shows the principle of the time-series scoring model of the present invention. As shown in FIG. 2,
  • the time-series scoring model is a deformable convolutional network model; the trained model scores each candidate detection frame for the amount of information it contains and for its correlation with the current frame,
  • so that effective candidate detection frames receive large weights while weakly effective or invalid candidate detection frames receive small weights.
  • The input of the time-series scoring model is the unscored warped feature maps of the respective time steps and the current-frame feature map, and its output is the weight value of each candidate detection frame.
  • The time-series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation.
  • The inputs of the time-series scoring model, i.e., the unscored warped feature maps of the respective time steps and the current-frame feature map, are also called candidate detection frames.
  • Through the global average pooling operation and the global maximum pooling operation, the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained,
  • the global average pooling operation is:
  • G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y)
  • where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the feature map before it is input to the global average pooling operation, and W is its width.
  • The global maximum pooling operation is:
  • G_S-GM(q_T) = Max(q_T(q_x, q_y))
  • where G_S-GM(...) denotes the global maximum pooling process.
  • The global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix.
  • The global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, and the correlation between each candidate frame and the current frame is scored.
  • The weight matrices of global average pooling and global maximum pooling are respectively obtained through the shared network layer.
  • The shared network layer implements a convolution operation, and its parameters are obtained from empirical values or by training. The two weight matrices are then added element by element to obtain the weight feature vector, and the obtained weight feature vector is used as the input of the activation function Relu:
  • Relu(x) = x if x > 0, and Relu(x) = αx otherwise,
  • where x is the input weight feature vector and α is a coefficient that can take the value 0; the model output gives the time-sequence weights of the candidate detection frames.
  • The convolutional filters in the shared network layer use deformable convolution. A conventional convolution is computed as
  • y(p_0) = Σ_{p_n ∈ R} W(p_n) · X(p_0 + p_n)
  • where W(p_n) denotes the convolution kernel parameters, X denotes the image to be convolved, and R is the sampling region of the traditional convolution operation.
  • In the deformable convolution, a learnable offset Δp_n, which can be learned by an additional convolution layer, is added to each sampling position:
  • y(p_0) = Σ_{p_n ∈ R} W(p_n) · X(p_0 + p_n + Δp_n).
  • The time-series scoring model is trained according to the loss function of the convolutional neural network model,
  • l(y, v) = log(1 + exp(-y·v)),
  • where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box.
  • FIGS. 3B-3C show the deformable convolution calculation. It can be seen that the 9 points involved in the calculation can be any pixels in the current image; such filters have better diversity and can extract richer features.
  • FIG. 4 shows a flowchart of the present invention's target tracking method combining optical flow information and the Siamese framework. As shown in Figure 4, the method includes:
  • S101: Obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1. Use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; combine F_N with each of the optical-flow vector maps P1, P2 and P3, and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3;
  • S102: Input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame;
  • where i is the index of the current frame, I_i refers to the current, i-th frame, and w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model;
  • f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image;
  • F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame;
  • f_j is the feature map of the i-th frame, and W(·,·) fuses the optical-flow result with frame I_j and warps the fused information, applying the linear deformation equation used to position each channel of the feature map to perform the warp;
  • the input of the time-series scoring model is the unscored warped feature maps of the respective time steps and the current-frame feature map, and its output is the weight value of each candidate detection frame;
  • the time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation; through these two operations the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained;
  • the global average pooling operation is:
  • G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y)
  • where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the feature map before it is input to the global average pooling operation, and W is its width.
  • The global maximum pooling operation is:
  • G_S-GM(q_T) = Max(q_T(q_x, q_y))
  • where G_S-GM(...) denotes the global maximum pooling process;
  • the global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix;
  • the global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling respectively; the two weight matrices are then added element by element to obtain the weight feature vector;
  • the obtained weight feature vector is used as the input of the activation function Relu(x) = x if x > 0 and αx otherwise, where x is the input weight feature vector and α is a coefficient;
  • the time-series scoring model is trained from a convolutional neural network model according to the loss function l(y, v) = log(1 + exp(-y·v)), where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box.
  • The convolutional filters in the shared network layer use deformable convolution, computed as y(p_0) = Σ_{p_n ∈ R} W(p_n) · X(p_0 + p_n + Δp_n), where R is the sampling region of the traditional convolution operation, W(p_n) denotes the convolution kernel parameters, X denotes the image to be convolved, and Δp_n is a learnable offset.
  • FIG. 5 is a block diagram of the target tracking device combining optical flow information and Siamese framework proposed by the present invention.
  • the following describes the target tracking device fusing optical flow information and Siamese framework of the present invention with reference to FIG. 5.
  • the device includes:
  • Feature acquisition module: configured to obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1; to use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; to perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; to input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; and to combine F_N with each of the optical-flow vector maps P1, P2 and P3 and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3;
  • Weight calculation module: configured to input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and to multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame;
  • where i is the index of the current frame; I_i refers to the current, i-th frame; f̄_i is the final detection frame obtained for the current frame after fusing the optical-flow information of the other frames; w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model; f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image; F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame; and f_j is the feature map of the i-th frame, W(·,·) being the operation that fuses the optical-flow result with frame I_j and warps the fused information by applying the linear deformation equation used to position each channel of the feature map;
  • the input of the time-series scoring model is the unscored warped feature maps F_1, F_2, F_3 of the respective time steps together with the current-frame feature map F_N, and its output is the weight value of each candidate detection frame;
  • the time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation; through these two operations the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained;
  • the global average pooling operation is G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y), where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the input feature map, and W is the width of the input feature map;
  • the global maximum pooling operation is G_S-GM(q_T) = Max(q_T(q_x, q_y)), where G_S-GM(...) denotes the global maximum pooling process;
  • the global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix;
  • the global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling respectively; the two weight matrices are then added element by element to obtain the weight feature vector;
  • the obtained weight feature vector is used as the input of the activation function Relu(x) = x if x > 0 and αx otherwise, where x is the input weight feature vector and α is a coefficient;
  • the time-series scoring model is trained from a convolutional neural network model according to a loss function;
  • the time-series scoring model is trained from a convolutional neural network model according to the loss function l(y, v) = log(1 + exp(-y·v)), where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box;
  • the convolutional filters in the shared network layer use deformable convolution, which adds a learnable offset Δp_n to the sampling region of the traditional convolution operation.
  • The embodiment of the present invention further provides a target tracking system fusing optical flow information and the Siamese framework, including:
  • a processor, configured to execute multiple instructions, and a memory, configured to store multiple instructions;
  • wherein the multiple instructions are stored by the memory and loaded by the processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
  • The embodiment of the present invention further provides a computer-readable storage medium in which multiple instructions are stored; the multiple instructions are loaded by a processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a physical server, or a network cloud server, etc., on which a Windows or Windows Server operating system needs to be installed) to execute some of the steps of the method described in each embodiment of the present invention.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a target tracking method and device fusing optical flow information and a Siamese framework. The method comprises: obtaining the current frame, which is the Nth frame with N greater than 3, and the three frames preceding it; computing optical flow between the current frame and each of the three preceding frames and processing the results to obtain warped feature maps; and inputting the warped feature maps and the current-frame feature map as candidate detection frames into a time-series scoring model to obtain a weight for each feature map, then combining the weights with the feature maps to obtain the final detection frame. According to the solution of the present invention, target tracking is performed on the basis of feature maps integrated with optical flow information, in combination with the Siamese framework; the calculation precision is high, the speed is fast, and objects with complex backgrounds and violent motion can be tracked.

Description

Target Tracking Method and Device Fusing Optical Flow Information and the Siamese Framework
Technical Field
The present invention relates to the field of image recognition, and in particular to a target tracking method and device fusing optical flow information and the Siamese framework.
Background
With the rapid development of computer vision, single-target tracking has attracted more and more public attention. Tracking algorithms have evolved from generative-model algorithms based on Kalman filtering, particle filtering and feature-point matching to today's discriminative-model algorithms built on the correlation-filter framework and the Siamese (twin) framework, and their accuracy and running speed keep improving.
The generative-model algorithms based on feature-point matching have the advantage of a simple model structure with no training process, but their accuracy is low and the feature points disappear under occlusion; the fully convolutional network algorithms based on the Siamese framework are fast, but they consider only the appearance features of the image and cannot track objects against complex backgrounds or undergoing violent motion.
Summary of the Invention
To solve the above technical problems, the present invention proposes a target tracking method and device fusing optical flow information and the Siamese framework, so as to address the problems in the prior art that generative-model algorithms based on feature-point matching have low calculation accuracy and that fully convolutional network algorithms based on the Siamese framework cannot track objects against complex backgrounds or undergoing violent motion.
According to a first aspect of the present invention, a target tracking method fusing optical flow information and the Siamese framework is provided, including:
S101: Obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1. Use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; combine F_N with each of the optical-flow vector maps P1, P2 and P3, and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3;
S102: Input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame:
f̄_i = Σ_j w_{j→i} · f_{j→i}    (1)
where i is the index of the current frame; I_i refers to the current, i-th frame; I_j refers to a frame preceding the current frame I_i, such as the j-th frame, with j ∈ {i-T, …, i-2, i-1} and T = 3, i.e., the three frames preceding the current frame; f̄_i is the final detection frame obtained for the current frame after fusing the optical-flow information of the other frames; w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model; and f_{j→i} maps the motion information of the j-th frame to the i-th frame through the optical-flow network and then warps the resulting optical-flow map with the j-th frame image.
Mapping the motion information of the j-th frame to the i-th frame through the optical-flow network is defined as
f_{j→i} = W(f_j, M_{i→j}) = W(f_j, F(I_i, I_j))
where F(I_i, I_j) is the optical-flow computation between I_i and I_j by the optical-flow network, whose result maps the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the optical-flow result with frame I_j and warps the fused information, applying the linear deformation equation used to position each channel of the feature map to perform the warp.
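By way of illustration only, the flow-guided warping W(·,·) and the weighted fusion of formula (1) can be sketched as follows in PyTorch. Here `flow_net`, `feature_net` and `scoring_model` are assumed stand-ins for the TVNet optical-flow network, the AlexNet-based feature network and the time-series scoring model; all names, shapes and the use of bilinear sampling for the warp are illustrative assumptions rather than details taken from the original disclosure.

```python
import torch
import torch.nn.functional as F


def warp(feature, flow):
    """Warp a feature map with a dense flow field by bilinear sampling.

    feature: (B, C, H, W); flow: (B, 2, H, W) given as pixel offsets.
    """
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feature.device)      # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                   # shifted sampling positions
    # normalize coordinates to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)               # (B, H, W, 2)
    return F.grid_sample(feature, grid_norm, align_corners=True)


def fuse_detection_frame(frames, flow_net, feature_net, scoring_model):
    """frames: the four frames [I_{N-3}, I_{N-2}, I_{N-1}, I_N], each (B, 3, H, W)."""
    current = frames[-1]
    f_n = feature_net(current)                            # 22x22 current-frame feature map F_N
    warped = []
    for prev in frames[:-1]:
        flow = flow_net(prev, current)                    # Flow1 / Flow2 / Flow3
        flow = F.interpolate(flow, size=f_n.shape[-2:])   # stand-in for the crop to 22x22
        warped.append(warp(f_n, flow))                    # warped feature maps F_1, F_2, F_3
    candidates = warped + [f_n]
    weights = scoring_model(candidates)                   # (B, T) candidate weights w_{j->i}
    # formula (1): weighted fusion of the flow-aligned candidate detection frames
    return sum(weights[:, t].view(-1, 1, 1, 1) * c for t, c in enumerate(candidates))
```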
The input of the time-series scoring model is the unscored warped feature maps F_1, F_2, F_3 of the respective time steps together with the current-frame feature map F_N, and its output is the weight value of each candidate detection frame.
The time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation; through these two operations the amount of object information contained in each candidate detection frame is scored, and the intermediate matrices after the operations are obtained.
The global average pooling operation is
G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, H is the height of the feature map before it is input to the global average pooling operation, and W is its width.
The global maximum pooling operation is
G_S-GM(q_T) = Max(q_T(q_x, q_y))
where G_S-GM(...) denotes the global maximum pooling process.
The global average pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1-dimensional vector, forming the global-maximum-pooling intermediate matrix.
The global-average-pooling intermediate matrix and the global-maximum-pooling intermediate matrix are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling respectively; the two weight matrices are then added element by element to obtain the weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x if x > 0, and Relu(x) = αx otherwise,
where x is the input weight feature vector and α is a coefficient.
The time-series scoring model is trained from a convolutional neural network model according to a loss function.
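An illustrative sketch of the time-series scoring model described above follows. Each of the T candidate detection frames is reduced to one value by global average pooling (G_S-GA) and by global maximum pooling (G_S-GM), giving two T×1 intermediate vectors; a shared 1-D convolution stands in for the shared network layer, the two results are added element by element, and the Relu with coefficient α yields the candidate weights. The kernel size of the shared convolution and the pooling over all channels are assumptions made for the sketch.

```python
import torch
import torch.nn as nn


class TimeSeriesScoring(nn.Module):
    def __init__(self, alpha: float = 0.0):
        super().__init__()
        self.alpha = alpha
        self.shared = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # shared network layer

    def forward(self, candidates):
        # candidates: list of T candidate detection frames, each of shape (B, C, H, W)
        g_avg = torch.stack([q.mean(dim=(1, 2, 3)) for q in candidates], dim=1)  # G_S-GA -> (B, T)
        g_max = torch.stack([q.amax(dim=(1, 2, 3)) for q in candidates], dim=1)  # G_S-GM -> (B, T)
        m_avg = self.shared(g_avg.unsqueeze(1))        # weight matrix from global average pooling
        m_max = self.shared(g_max.unsqueeze(1))        # weight matrix from global maximum pooling
        w = (m_avg + m_max).squeeze(1)                 # element-wise addition -> (B, T)
        return torch.where(w > 0, w, self.alpha * w)   # Relu(x) = x if x > 0, alpha*x otherwise
```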
Further, the time-series scoring model is trained from a convolutional neural network model according to the loss function
l(y, v) = log(1 + exp(-y·v))
where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box. The model keeps learning by minimizing this loss function; when the loss function becomes stable, training of the time-series scoring model is complete and its coefficients are obtained, and the trained time-series scoring model is then used to compute the weight values of the candidate detection frames, yielding the time-sequence weights of the candidate detection frames.
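Written out in code, the loss l(y, v) = log(1 + exp(-y·v)) can be sketched with a numerically stable softplus; averaging over all points of the candidate response map is an assumption of the sketch.

```python
import torch
import torch.nn.functional as F


def tracking_loss(v: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """l(y, v) = log(1 + exp(-y*v)), averaged over the candidate response map.

    v: predicted response map; y: ground-truth labels in {+1, -1} of the same shape.
    """
    return F.softplus(-y * v).mean()
```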
Further, in order to better extract the image features of the candidate detection frames, the convolutional filters in the shared network layer use deformable convolution, which adds a learnable offset Δp_n to the sampling region of the traditional convolution operation.
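A possible realization of such a deformable filter, using torchvision's deform_conv2d as the underlying operator, is sketched below: an auxiliary convolution predicts the learnable offsets Δp_n that shift the k×k sampling grid of the main convolution. Channel sizes, kernel size and the weight initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableFilter(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.k = k
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)

    def forward(self, x):
        offsets = self.offset_pred(x)      # learned offsets Delta p_n, one pair per sampling point
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)
```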
According to a second aspect of the present invention, a target tracking device fusing optical flow information and the Siamese framework is provided, including:
a feature acquisition module, configured to obtain the current frame, which is the Nth frame with N > 3, and the three frames preceding it, namely frame N-3, frame N-2 and frame N-1; to use the TVNet optical-flow network to compute the optical flow between each of frames N-3, N-2, N-1 and the current frame N, obtaining Flow1, Flow2 and Flow3; to perform a crop operation on Flow1, Flow2 and Flow3 to obtain 22×22 optical-flow vector maps P1, P2 and P3; to input the current frame into the feature network to obtain a 22×22 current-frame feature map F_N; and to combine F_N with each of the optical-flow vector maps P1, P2 and P3 and warp the combined results to obtain the warped feature maps F_1, F_2 and F_3; and
a weight calculation module, configured to input the warped feature maps F_1, F_2, F_3 and the current-frame feature map F_N into the time-series scoring model as candidate detection frames to obtain the feature weight of each candidate detection frame, and to multiply the feature weights by the candidate detection frames fused with optical-flow features according to formula (1) to obtain the final detection frame.
As in the first aspect, i is the index of the current frame; I_i refers to the current, i-th frame; I_j refers to a frame preceding the current frame I_i, such as the j-th frame, with j ∈ {i-T, …, i-2, i-1} and T = 3; f̄_i is the final detection frame obtained for the current frame after fusing the optical-flow information of the other frames; w_{j→i} is the feature weight of a candidate detection frame computed and output by the time-series scoring model; and f_{j→i} = W(f_j, M_{i→j}) = W(f_j, F(I_i, I_j)) maps the motion information of the j-th frame to the i-th frame through the optical-flow network and warps the resulting optical-flow map with the j-th frame image, F(I_i, I_j) being the optical-flow computation between I_i and I_j, f_j the feature map of the i-th frame, and W(·,·) the operation that fuses the optical-flow result with frame I_j and warps the fused information by applying the linear deformation equation used to position each channel of the feature map.
The input of the time-series scoring model is the unscored warped feature maps F_1, F_2, F_3 of the respective time steps together with the current-frame feature map F_N, and its output is the weight value of each candidate detection frame. The time-series scoring model has a pooling layer that performs a global average pooling operation and a global maximum pooling operation, through which the amount of object information contained in each candidate detection frame is scored and the intermediate matrices are obtained: the global average pooling operation is G_S-GA(q_T) = (1 / (H × W)) · Σ_{q_x=1..H} Σ_{q_y=1..W} q_T(q_x, q_y), and the global maximum pooling operation is G_S-GM(q_T) = Max(q_T(q_x, q_y)), where G_S-GA(...) and G_S-GM(...) denote the global average and global maximum pooling processes, q_T denotes the T candidate detection frames, q_x and q_y denote pixels in the feature map, and H and W are the height and width of the feature map before pooling. Each pooling operation outputs a T×1-dimensional vector, forming the global-average-pooling and global-maximum-pooling intermediate matrices. These intermediate matrices are input to the shared network layer, which scores the correlation between each candidate frame and the current frame; the shared network layer implements a convolution operation whose parameters are obtained from empirical values or by training, and it yields the weight matrices of global average pooling and global maximum pooling, which are added element by element to obtain the weight feature vector; the weight feature vector is used as the input of the activation function Relu(x) = x if x > 0 and αx otherwise, where x is the input weight feature vector and α is a coefficient. The time-series scoring model is trained from a convolutional neural network model according to a loss function.
Further, the time-series scoring model is trained from a convolutional neural network model according to the loss function l(y, v) = log(1 + exp(-y·v)), where v is the true value of each point of the candidate response map of a training image in the training set and y ∈ {+1, -1} is the label of the standard tracking box; the model keeps learning by minimizing this loss function, and when the loss function becomes stable, training is complete, the model coefficients are obtained, and the trained model is used to compute the weight values of the candidate detection frames, yielding their time-sequence weights.
Further, in order to better extract the image features of the candidate detection frames, the convolutional filters in the shared network layer use deformable convolution, which adds a learnable offset Δp_n to the sampling region of the traditional convolution operation.
According to a third aspect of the present invention, a target tracking system fusing optical flow information and the Siamese framework is provided, including:
a processor, configured to execute multiple instructions; and
a memory, configured to store multiple instructions;
wherein the multiple instructions are stored by the memory and are loaded by the processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
According to a fourth aspect of the present invention, a computer-readable storage medium is provided, in which multiple instructions are stored; the multiple instructions are loaded by a processor to execute the target tracking method fusing optical flow information and the Siamese framework described above.
According to the above solution of the present invention, target tracking is performed on the basis of feature maps that integrate optical flow information, combined with the Siamese framework; the calculation accuracy is high, the speed is fast, and objects with complex backgrounds and violent motion can be tracked.
The above description is only an overview of the technical solution of the present invention. In order to understand the technical means of the present invention more clearly and to implement it according to the content of the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Description of the drawings

The drawings constituting a part of the present invention are provided for a further understanding of the present invention. In the drawings:

Fig. 1 is a structural diagram of a target tracking system fusing optical flow information and the Siamese framework according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the time series scoring model according to an embodiment of the present invention;

Fig. 3A is a schematic diagram of a conventional 3×3 convolution calculation;

Fig. 3B and Fig. 3C are schematic diagrams of deformable convolution calculation;

Fig. 4 is a flowchart of the target tracking method fusing optical flow information and the Siamese framework proposed by the present invention;

Fig. 5 is a block diagram of the target tracking device fusing optical flow information and the Siamese framework proposed by the present invention.
Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below in conjunction with specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

The structure of the target tracking system fusing optical flow information and the Siamese framework of the present invention is first described with reference to Fig. 1, which shows the structure of this system according to an embodiment of the present invention.
The current frame is acquired, the current frame being the N-th frame (N>3), and the three frames preceding it, namely the (N-3)-th, (N-2)-th and (N-1)-th frames, are also acquired. Optical flow is computed between each of the (N-3)-th, (N-2)-th and (N-1)-th frames and the current (N-th) frame using the TVNet optical flow network (for the TVNet optical flow network, see VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, USA, 2017: 2805-2813), yielding Flow1, Flow2 and Flow3. A crop operation is applied to Flow1, Flow2 and Flow3 to obtain 22×22 optical flow vector maps P1, P2 and P3. A feature network based on AlexNet is constructed by removing the fully connected layers from AlexNet. The current frame is input into the feature network to obtain the 22×22 current frame feature map F_N. The current frame feature map F_N is combined with the optical flow vector maps P1, P2 and P3 respectively, and a warp operation is applied to each combined result to obtain the deformed feature maps F_1, F_2 and F_3. Finally, the deformed feature maps F_1, F_2, F_3 and the current frame feature map F_N are input into the time series scoring model as candidate detection frames to obtain the feature weights of the candidate detection frames, and the feature weights are multiplied with the candidate detection frames fused with optical flow features according to formula (1) to obtain the final detection frame.
f̄_i = Σ_{j=i-T}^{i} w_j→i · f_j→i        (1)
Here, i denotes the sequence number of the current frame, I_i denotes the current (i-th) frame, and I_j denotes a frame preceding the current frame I_i, such as the j-th frame, with j∈{i-T, …, i-2, i-1}; in this embodiment T=3, i.e. the three frames preceding the current frame. f̄_i is the final detection frame obtained for the current frame by fusing the optical flow information of the other frames, and w_j→i denotes the feature weight of a candidate detection frame computed and output by the time series scoring model. f_j→i is obtained by mapping the motion information of the j-th frame to the i-th frame through the optical flow network and then applying a warp operation to the resulting optical flow map together with the j-th frame image.
Mapping the motion information of the j-th frame to the i-th frame through the optical flow network is defined as

f_j→i = W(f_j, M_i→j) = W(f_j, F(I_i, I_j))

where F(I_i, I_j) is the optical flow computed between I_i and I_j by the optical flow network, whose result realizes the mapping of the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the result of the optical flow calculation with frame I_j and performs a warp operation on the fused information, applying a linear deformation equation to the feature map of each channel.
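A minimal sketch of this warp-and-aggregate step, assuming bilinear warping of the per-channel feature maps by the flow field and the weighted combination of formula (1); the function names, tensor shapes and the use of torch.nn.functional.grid_sample are illustrative assumptions, not the implementation disclosed here:

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a feature map (1, C, H, W) with a flow field (1, 2, H, W) by bilinear sampling."""
    _, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1          # normalise to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)          # (1, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

def aggregate(candidates, weights):
    """Formula (1): weighted sum of candidate detection frames f_j->i with weights w_j->i."""
    return sum(w * f for w, f in zip(weights, candidates))

# F_N: current-frame feature map; the three flow maps point to the previous frames
F_N = torch.randn(1, 256, 22, 22)
flows = [torch.randn(1, 2, 22, 22) for _ in range(3)]
candidates = [warp(F_N, p) for p in flows] + [F_N]        # F_1, F_2, F_3, F_N
weights = [0.2, 0.25, 0.25, 0.3]                          # from the time series scoring model
final_frame = aggregate(candidates, weights)
```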
The time series scoring model of the present invention is described below with reference to Fig. 2, which shows a schematic diagram of the model. As shown in Fig. 2, the time series scoring model is a deformable convolutional network model. By scoring the amount of object information contained in each candidate detection frame and its relevance to the current frame, the trained time series scoring model gives a large weight to an effective candidate detection frame and a small weight to a candidate detection frame that contributes little or is invalid. The input of the time series scoring model is the unscored deformed feature maps of the respective time periods and the feature map of the current frame, and the output is the weight value of each candidate detection frame.
The time series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation. The input of the model, i.e. the unscored deformed feature maps of the respective time periods and the feature map of the current frame (also called the candidate detection frames), is scored for the amount of object information contained in each candidate detection frame through the global average pooling and global maximum pooling operations, yielding the intermediate matrices after these operations.

The global average pooling operation is:
G_S-GA(q_T) = (1/(H×W)) Σ_{q_x=1}^{H} Σ_{q_y=1}^{W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixel positions in the feature map, H denotes the height of the feature map before the global average pooling operation, and W denotes its width.
The global maximum pooling operation is:

G_S-GM(q_T) = Max(q_T(q_x, q_y))

where G_S-GM(...) denotes the global maximum pooling process.
The global average pooling operation outputs a T×1 vector, forming the global average pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1 vector, forming the global maximum pooling intermediate matrix.
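A minimal sketch of the two pooling branches, assuming the T candidate detection frames are stacked into a single (T, C, H, W) tensor and that the reduction runs over every position of each candidate; this layout is an assumption made only for illustration:

```python
import torch

def global_avg_pool(q: torch.Tensor) -> torch.Tensor:
    """G_S-GA: average over every position of each candidate frame -> (T, 1) vector."""
    return q.flatten(start_dim=1).mean(dim=1, keepdim=True)

def global_max_pool(q: torch.Tensor) -> torch.Tensor:
    """G_S-GM: maximum over every position of each candidate frame -> (T, 1) vector."""
    return q.flatten(start_dim=1).max(dim=1, keepdim=True).values

q_T = torch.randn(4, 256, 22, 22)      # T = 4 candidate detection frames
avg_matrix = global_avg_pool(q_T)       # (4, 1) global average pooling intermediate matrix
max_matrix = global_max_pool(q_T)       # (4, 1) global maximum pooling intermediate matrix
```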
The global average pooling intermediate matrix and the global maximum pooling intermediate matrix are input to a shared network layer, which scores the relevance of each candidate frame to the current frame. Through the shared network layer, one weight matrix is obtained for the global average pooling branch and one for the global maximum pooling branch; the shared network layer implements a convolution operation whose parameters are obtained empirically or by training. The two weight matrices are then added element by element to obtain a weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x, if x > 0;  Relu(x) = αx, otherwise
where x is the input weight feature vector and α is a coefficient; α may take the value 0, thereby yielding the candidate detection frame timing weights.
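Putting these pieces together, a sketch of the scoring head under the assumptions above: the shared network layer is modelled here as a small 1×1 convolution applied to both pooled vectors (its exact form is not specified in the text), the two outputs are added element-wise, and the activation with α = 0 produces the timing weights. All names and shapes are illustrative:

```python
import torch
import torch.nn as nn

class TimingScoringHead(nn.Module):
    """Scores T candidate detection frames: pooling -> shared layer -> add -> activation."""
    def __init__(self, alpha: float = 0.0):
        super().__init__()
        self.shared = nn.Conv1d(1, 1, kernel_size=1)        # assumed form of the shared layer
        self.alpha = alpha

    def forward(self, q: torch.Tensor) -> torch.Tensor:     # q: (T, C, H, W)
        avg_vec = q.flatten(1).mean(dim=1).view(1, 1, -1)   # (1, 1, T) average-pooling branch
        max_vec = q.flatten(1).amax(dim=1).view(1, 1, -1)   # (1, 1, T) max-pooling branch
        fused = self.shared(avg_vec) + self.shared(max_vec) # element-wise addition
        w = torch.where(fused > 0, fused, self.alpha * fused)
        return w.flatten()                                  # one timing weight per candidate

weights = TimingScoringHead()(torch.randn(4, 256, 22, 22))  # 4 candidates -> 4 weights
```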
In this embodiment, in order to better extract the image features of the candidate detection frames, the convolutional filter in the shared network layer uses a deformable convolution, whose calculation formula is as follows:
y(p_0) = Σ_{p_n∈R} W(p_n) · X(p_0 + p_n)
The above formula is the conventional convolution operation, where W(p_n) denotes the convolution kernel parameters and X denotes the image to be convolved.

On top of the sampling region of the conventional convolution operation, a learnable offset parameter Δp_n is added; this parameter can be obtained by learning through a fully connected convolutional layer.
The time series scoring model is a convolutional neural network model trained according to the loss function

l(y,v) = log(1+exp(-yv))

where v denotes the true value of each point of the candidate response map of an image awaiting training in the training set, and y∈{+1,-1} denotes the label of the standard tracking box. The model keeps learning by minimizing the above loss function; when the loss function stabilizes, training of the time series scoring model is complete and its coefficients are obtained. The trained time series scoring model is then used to compute the weight value of each candidate detection frame.
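A sketch of this training loop, intended only to illustrate "minimize the loss until it stabilizes"; the optimizer, learning rate, stopping threshold and data loader are assumptions, not values given here:

```python
import torch

def train_scoring_model(model, loader, lr=1e-3, tol=1e-4, max_epochs=50):
    """Minimise l(y, v) = log(1 + exp(-y * v)) until the epoch loss stabilises."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    prev = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for frames, y in loader:              # y: ±1 labels of the standard tracking box
            v = model(frames)                  # predicted candidate response values
            loss = torch.log1p(torch.exp(-y * v)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev - total) < tol:            # loss has stabilised: training is complete
            break
        prev = total
    return model
```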
The deformable convolution calculation is described below with reference to Fig. 3.

As shown in Fig. 3A, a conventional 3×3 convolution lets the 9 pixels inside a square region participate in the linear computation y = Σ_i w_i·x_i, where w_i are the coefficients of the convolution filter and x_i are the pixel values of the image. Fig. 3B and Fig. 3C show the deformable convolution calculation: the 9 points participating in the computation can be arbitrary pixels of the current image, so such filters have better diversity and can extract richer features.
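A simplified sketch of deformable sampling for one output position of a 3×3 filter, assuming nearest-neighbour sampling at the offset positions p_0 + p_n + Δp_n (a practical implementation would interpolate bilinearly and learn Δp_n with an extra layer); every name here is illustrative:

```python
import torch

def deformable_conv_point(x, kernel, p0, offsets):
    """y(p0) = sum_n W(p_n) * X(p0 + p_n + Δp_n) for a single output position p0."""
    h, w = x.shape
    region = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]    # region R of a 3x3 kernel
    y = torch.tensor(0.0)
    for (dy, dx), wk, (oy, ox) in zip(region, kernel.flatten(), offsets):
        py = min(max(int(round(float(p0[0] + dy + oy))), 0), h - 1)  # clamp to the image
        px = min(max(int(round(float(p0[1] + dx + ox))), 0), w - 1)
        y = y + wk * x[py, px]
    return y

x = torch.rand(22, 22)                 # image / feature map to be convolved
kernel = torch.rand(3, 3)              # kernel parameters W(p_n)
offsets = torch.zeros(9, 2)            # learnable Δp_n; zeros recover the normal convolution
out = deformable_conv_point(x, kernel, (10, 10), offsets)
```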
The target tracking method fusing optical flow information and the Siamese framework of the present invention is described below with reference to Fig. 4, which shows a flowchart of the method. As shown in Fig. 4, the method includes:
S101: Acquire the current frame, the current frame being the N-th frame with N>3, and acquire the three frames preceding the current frame, namely the (N-3)-th, (N-2)-th and (N-1)-th frames. Compute the optical flow between each of the (N-3)-th, (N-2)-th and (N-1)-th frames and the current N-th frame using the TVNet optical flow network, obtaining Flow1, Flow2 and Flow3; apply a crop operation to Flow1, Flow2 and Flow3 to obtain 22×22 optical flow vector maps P1, P2 and P3; input the current frame into the feature network to obtain the 22×22 current frame feature map F_N; combine the current frame feature map F_N with the optical flow vector maps P1, P2 and P3 respectively, and apply a warp operation to each combined result to obtain the deformed feature maps F_1, F_2 and F_3;
S102: Input the deformed feature maps F_1, F_2, F_3 and the current frame feature map F_N into the time series scoring model as candidate detection frames to obtain the feature weights of the candidate detection frames, and multiply the feature weights with the candidate detection frames fused with optical flow features according to formula (1) to obtain the final detection frame;
f̄_i = Σ_{j=i-T}^{i} w_j→i · f_j→i        (1)
where i denotes the sequence number of the current frame, I_i denotes the current (i-th) frame, and I_j denotes a frame preceding the current frame I_i, such as the j-th frame, with j∈{i-T, …, i-2, i-1} and T=3, i.e. the three frames preceding the current frame; f̄_i is the final detection frame obtained for the current frame by fusing the optical flow information of the other frames; w_j→i denotes the feature weight of a candidate detection frame computed and output by the time series scoring model; f_j→i is obtained by mapping the motion information of the j-th frame to the i-th frame through the optical flow network and then applying a warp operation to the resulting optical flow map together with the j-th frame image;
mapping the motion information of the j-th frame to the i-th frame through the optical flow network is defined as

f_j→i = W(f_j, M_i→j) = W(f_j, F(I_i, I_j))

where F(I_i, I_j) is the optical flow computed between I_i and I_j by the optical flow network, whose result realizes the mapping of the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the result of the optical flow calculation with frame I_j and performs a warp operation on the fused information, applying a linear deformation equation to the feature map of each channel;
wherein the input of the time series scoring model is the unscored deformed feature maps of the respective time periods and the feature map of the current frame, and the output is the weight value of each candidate detection frame;

the time series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation; through these two operations, the amount of object information contained in each candidate detection frame is scored, yielding the intermediate matrices after the operations;

the global average pooling operation is:
G_S-GA(q_T) = (1/(H×W)) Σ_{q_x=1}^{H} Σ_{q_y=1}^{W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixel positions in the feature map, H denotes the height of the feature map before the global average pooling operation, and W denotes its width;

the global maximum pooling operation is:

G_S-GM(q_T) = Max(q_T(q_x, q_y))

where G_S-GM(...) denotes the global maximum pooling process;

the global average pooling operation outputs a T×1 vector, forming the global average pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1 vector, forming the global maximum pooling intermediate matrix;
the global average pooling intermediate matrix and the global maximum pooling intermediate matrix are input to a shared network layer, which scores the relevance of each candidate frame to the current frame; through the shared network layer, one weight matrix is obtained for the global average pooling branch and one for the global maximum pooling branch, the shared network layer implementing a convolution operation whose parameters are obtained empirically or by training; the two weight matrices are then added element by element to obtain a weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x, if x > 0;  Relu(x) = αx, otherwise

thereby obtaining the candidate detection frame timing weights;
the time series scoring model is trained by a convolutional neural network model according to a loss function, the loss function being:

l(y,v) = log(1+exp(-yv))

where v denotes the true value of each point of the candidate response map of an image awaiting training in the training set, and y∈{+1,-1} denotes the label of the standard tracking box; the model keeps learning by minimizing the above loss function, and when the loss function stabilizes, training of the time series scoring model is complete and its coefficients are obtained; the trained time series scoring model is then used to compute the weight value of each candidate detection frame.
In order to better extract the image features of the candidate detection frames, the convolutional filter in the shared network layer uses a deformable convolution, whose calculation formula is as follows:

y(p_0) = Σ_{p_n∈R} W(p_n) · X(p_0 + p_n)

where a learnable offset parameter Δp_n is added to the sampling region of the conventional convolution operation.
Please refer to Fig. 5, which is a block diagram of the target tracking device fusing optical flow information and the Siamese framework proposed by the present invention. The device is described below with reference to Fig. 5. As shown in the figure, the device includes:
a feature acquisition module, configured to acquire the current frame, the current frame being the N-th frame with N>3, and to acquire the three frames preceding the current frame, namely the (N-3)-th, (N-2)-th and (N-1)-th frames; to compute the optical flow between each of the (N-3)-th, (N-2)-th and (N-1)-th frames and the current N-th frame using the TVNet optical flow network, obtaining Flow1, Flow2 and Flow3; to apply a crop operation to Flow1, Flow2 and Flow3 to obtain 22×22 optical flow vector maps P1, P2 and P3; to input the current frame into the feature network to obtain the 22×22 current frame feature map F_N; and to combine the current frame feature map F_N with the optical flow vector maps P1, P2 and P3 respectively and apply a warp operation to each combined result to obtain the deformed feature maps F_1, F_2 and F_3;
a weight calculation module, configured to input the deformed feature maps F_1, F_2, F_3 and the current frame feature map F_N into the time series scoring model as detection frames to obtain the feature weights of the candidate detection frames, and to multiply the feature weights with the candidate detection frames fused with optical flow features according to formula (1) to obtain the final detection frame;
f̄_i = Σ_{j=i-T}^{i} w_j→i · f_j→i        (1)
where i denotes the sequence number of the current frame, I_i denotes the current (i-th) frame, and I_j denotes a frame preceding the current frame I_i, such as the j-th frame, with j∈{i-T, …, i-2, i-1} and T=3, i.e. the three frames preceding the current frame; f̄_i is the final detection frame obtained for the current frame by fusing the optical flow information of the other frames; w_j→i denotes the feature weight of a candidate detection frame computed and output by the time series scoring model; f_j→i is obtained by mapping the motion information of the j-th frame to the i-th frame through the optical flow network and then applying a warp operation to the resulting optical flow map together with the j-th frame image;
mapping the motion information of the j-th frame to the i-th frame through the optical flow network is defined as

f_j→i = W(f_j, M_i→j) = W(f_j, F(I_i, I_j))

where F(I_i, I_j) is the optical flow computed between I_i and I_j by the optical flow network, whose result realizes the mapping of the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the result of the optical flow calculation with frame I_j and performs a warp operation on the fused information, applying a linear deformation equation to the feature map of each channel;
wherein the input of the time series scoring model is the unscored deformed feature maps F_1, F_2, F_3 of the respective time periods and the current frame feature map F_N, and the output is the weight value of each candidate detection frame;

the time series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation; through these two operations, the amount of object information contained in each candidate detection frame is scored, yielding the intermediate matrices after the operations;

the global average pooling operation is:
G_S-GA(q_T) = (1/(H×W)) Σ_{q_x=1}^{H} Σ_{q_y=1}^{W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixel positions in the feature map, H denotes the height of the input feature map, and W denotes the width of the input feature map;

the global maximum pooling operation is:

G_S-GM(q_T) = Max(q_T(q_x, q_y))

where G_S-GM(...) denotes the global maximum pooling process;

the global average pooling operation outputs a T×1 vector, forming the global average pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1 vector, forming the global maximum pooling intermediate matrix;
the global average pooling intermediate matrix and the global maximum pooling intermediate matrix are input to a shared network layer, which scores the relevance of each candidate frame to the current frame; through the shared network layer, one weight matrix is obtained for the global average pooling branch and one for the global maximum pooling branch, the shared network layer implementing a convolution operation whose parameters are obtained empirically or by training; the two weight matrices are then added element by element to obtain a weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x, if x > 0;  Relu(x) = αx, otherwise
where x is the input weight feature vector and α is a coefficient.

The time series scoring model is trained by a convolutional neural network model according to a loss function.
Further, the time series scoring model is trained by a convolutional neural network model according to a loss function, the loss function being:

l(y,v) = log(1+exp(-yv))

where v denotes the true value of each point of the candidate response map of an image awaiting training in the training set, and y∈{+1,-1} denotes the label of the standard tracking box; the model keeps learning by minimizing the above loss function, and when the loss function stabilizes, training of the time series scoring model is complete and its coefficients are obtained; the trained time series scoring model is then used to compute the weight value of each candidate detection frame, thereby obtaining the candidate detection frame timing weights.
Further, in order to better extract the image features of the candidate detection frames, the convolutional filter in the shared network layer uses a deformable convolution, in which a learnable offset parameter Δp_n is added to the sampling region of the conventional convolution operation.
An embodiment of the present invention further provides a target tracking system fusing optical flow information and the Siamese framework, including:

a processor, configured to execute a plurality of instructions;

a memory, configured to store a plurality of instructions;

wherein the plurality of instructions are stored by the memory and are loaded by the processor to execute the target tracking method fusing optical flow information and the Siamese framework as described above.

An embodiment of the present invention further provides a computer-readable storage medium, the storage medium storing a plurality of instructions; the plurality of instructions are loaded by a processor to execute the target tracking method fusing optical flow information and the Siamese framework as described above.
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with each other.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative; the division of the units is only a division by logical function, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a physical server, or a network cloud server, etc., on which a Windows or Windows Server operating system is installed) to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention in any form. Any simple modification, equivalent change or modification made to the above embodiments based on the technical essence of the present invention still falls within the scope of the technical solution of the present invention.

Claims (8)

1. A target tracking method fusing optical flow information and a Siamese framework, characterized in that the method comprises:

S101: acquiring the current frame, the current frame being the N-th frame with N>3, and acquiring the three frames preceding the current frame, namely the (N-3)-th, (N-2)-th and (N-1)-th frames; computing the optical flow between each of the (N-3)-th, (N-2)-th and (N-1)-th frames and the current N-th frame using the TVNet optical flow network, obtaining Flow1, Flow2 and Flow3; applying a crop operation to Flow1, Flow2 and Flow3 to obtain 22×22 optical flow vector maps P1, P2 and P3; inputting the current frame into the feature network to obtain the 22×22 current frame feature map F_N; and combining the current frame feature map F_N with the optical flow vector maps P1, P2 and P3 respectively and applying a warp operation to each combined result to obtain the deformed feature maps F_1, F_2 and F_3;

S102: inputting the deformed feature maps F_1, F_2, F_3 and the current frame feature map F_N into the time series scoring model as detection frames to obtain the feature weights of the candidate detection frames, and multiplying the feature weights with the candidate detection frames fused with optical flow features according to formula (1) to obtain the final detection frame;
f̄_i = Σ_{j=i-T}^{i} w_j→i · f_j→i        (1)
where i denotes the sequence number of the current frame, I_i denotes the current (i-th) frame, and I_j denotes a frame preceding the current frame I_i, such as the j-th frame, with j∈{i-T, …, i-2, i-1} and T=3, i.e. the three frames preceding the current frame; f̄_i is the final detection frame obtained for the current frame by fusing the optical flow information of the other frames; w_j→i denotes the feature weight of a candidate detection frame computed and output by the time series scoring model; f_j→i is obtained by mapping the motion information of the j-th frame to the i-th frame through the optical flow network and then applying a warp operation to the resulting optical flow map together with the j-th frame image;

mapping the motion information of the j-th frame to the i-th frame through the optical flow network is defined as

f_j→i = W(f_j, M_i→j) = W(f_j, F(I_i, I_j))

where F(I_i, I_j) is the optical flow computed between I_i and I_j by the optical flow network, whose result realizes the mapping of the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the result of the optical flow calculation with frame I_j and performs a warp operation on the fused information, applying a linear deformation equation to the feature map of each channel;
wherein the input of the time series scoring model is the unscored deformed feature maps F_1, F_2, F_3 of the respective time periods and the current frame feature map F_N, and the output is the weight value of each candidate detection frame;

the time series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation; through these two operations, the amount of object information contained in each candidate detection frame is scored, yielding the intermediate matrices after the operations;

the global average pooling operation is:
G_S-GA(q_T) = (1/(H×W)) Σ_{q_x=1}^{H} Σ_{q_y=1}^{W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixel positions in the feature map, H denotes the height of the feature map before the global average pooling operation, and W denotes its width;

the global maximum pooling operation is:

G_S-GM(q_T) = Max(q_T(q_x, q_y))

where G_S-GM(...) denotes the global maximum pooling process;

the global average pooling operation outputs a T×1 vector, forming the global average pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1 vector, forming the global maximum pooling intermediate matrix;

the global average pooling intermediate matrix and the global maximum pooling intermediate matrix are input to a shared network layer, which scores the relevance of each candidate frame to the current frame; through the shared network layer, one weight matrix is obtained for the global average pooling branch and one for the global maximum pooling branch, the shared network layer implementing a convolution operation whose parameters are obtained empirically or by training; the two weight matrices are then added element by element to obtain a weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x, if x > 0;  Relu(x) = αx, otherwise
where x is the input weight feature vector and α is a coefficient,

and the time series scoring model is trained by a convolutional neural network model according to a loss function.
2. The target tracking method fusing optical flow information and the Siamese framework according to claim 1, characterized in that the time series scoring model is trained by a convolutional neural network model according to a loss function, the loss function being:

l(y,v) = log(1+exp(-yv))

where v denotes the true value of each point of the candidate response map of an image awaiting training in the training set, and y∈{+1,-1} denotes the label of the standard tracking box; the model keeps learning by minimizing the above loss function, and when the loss function stabilizes, training of the time series scoring model is complete and its coefficients are obtained; the trained time series scoring model is then used to compute the weight value of each candidate detection frame, thereby obtaining the candidate detection frame timing weights.
3. The target tracking method fusing optical flow information and the Siamese framework according to claim 1, characterized in that,

in order to better extract the image features of the candidate detection frames, the convolutional filter in the shared network layer uses a deformable convolution, in which a learnable offset parameter Δp_n is added to the sampling region of the conventional convolution operation.
4. A target tracking device fusing optical flow information and a Siamese framework, characterized in that the device comprises:

a feature acquisition module, configured to acquire the current frame, the current frame being the N-th frame with N>3, and to acquire the three frames preceding the current frame, namely the (N-3)-th, (N-2)-th and (N-1)-th frames; to compute the optical flow between each of the (N-3)-th, (N-2)-th and (N-1)-th frames and the current N-th frame using the TVNet optical flow network, obtaining Flow1, Flow2 and Flow3; to apply a crop operation to Flow1, Flow2 and Flow3 to obtain 22×22 optical flow vector maps P1, P2 and P3; to input the current frame into the feature network to obtain the 22×22 current frame feature map F_N; and to combine the current frame feature map F_N with the optical flow vector maps P1, P2 and P3 respectively and apply a warp operation to each combined result to obtain the deformed feature maps F_1, F_2 and F_3;

a weight calculation module, configured to input the deformed feature maps F_1, F_2, F_3 and the current frame feature map F_N into the time series scoring model as detection frames to obtain the feature weights of the candidate detection frames, and to multiply the feature weights with the candidate detection frames fused with optical flow features according to formula (1) to obtain the final detection frame;
f̄_i = Σ_{j=i-T}^{i} w_j→i · f_j→i        (1)
where i denotes the sequence number of the current frame, I_i denotes the current (i-th) frame, and I_j denotes a frame preceding the current frame I_i, such as the j-th frame, with j∈{i-T, …, i-2, i-1} and T=3, i.e. the three frames preceding the current frame; f̄_i is the final detection frame obtained for the current frame by fusing the optical flow information of the other frames; w_j→i denotes the feature weight of a candidate detection frame computed and output by the time series scoring model; f_j→i is obtained by mapping the motion information of the j-th frame to the i-th frame through the optical flow network and then applying a warp operation to the resulting optical flow map together with the j-th frame image;

mapping the motion information of the j-th frame to the i-th frame through the optical flow network is defined as

f_j→i = W(f_j, M_i→j) = W(f_j, F(I_i, I_j))

where F(I_i, I_j) is the optical flow computed between I_i and I_j by the optical flow network, whose result realizes the mapping of the motion information of the j-th frame to the i-th frame; f_j is the feature map of the i-th frame; and W(·,·) fuses the result of the optical flow calculation with frame I_j and performs a warp operation on the fused information, applying a linear deformation equation to the feature map of each channel;
wherein the input of the time series scoring model is the unscored deformed feature maps F_1, F_2, F_3 of the respective time periods and the current frame feature map F_N, and the output is the weight value of each candidate detection frame;

the time series scoring model has a pooling layer that can perform a global average pooling operation and a global maximum pooling operation; through these two operations, the amount of object information contained in each candidate detection frame is scored, yielding the intermediate matrices after the operations;

the global average pooling operation is:
G_S-GA(q_T) = (1/(H×W)) Σ_{q_x=1}^{H} Σ_{q_y=1}^{W} q_T(q_x, q_y)
where G_S-GA(...) denotes the global average pooling process, q_T denotes the T candidate detection frames, q_x and q_y denote pixel positions in the feature map, H denotes the height of the feature map before the global average pooling operation, and W denotes its width;

the global maximum pooling operation is:

G_S-GM(q_T) = Max(q_T(q_x, q_y))

where G_S-GM(...) denotes the global maximum pooling process;

the global average pooling operation outputs a T×1 vector, forming the global average pooling intermediate matrix, and the global maximum pooling operation likewise outputs a T×1 vector, forming the global maximum pooling intermediate matrix;

the global average pooling intermediate matrix and the global maximum pooling intermediate matrix are input to a shared network layer, which scores the relevance of each candidate frame to the current frame; through the shared network layer, one weight matrix is obtained for the global average pooling branch and one for the global maximum pooling branch, the shared network layer implementing a convolution operation whose parameters are obtained empirically or by training; the two weight matrices are then added element by element to obtain a weight feature vector, which is used as the input of the activation function Relu:
Relu(x) = x, if x > 0;  Relu(x) = αx, otherwise
where x is the input weight feature vector and α is a coefficient,

and the time series scoring model is trained by a convolutional neural network model according to a loss function.
5. The target tracking device fusing optical flow information and the Siamese framework according to claim 4, characterized in that

the time series scoring model is trained by a convolutional neural network model according to a loss function, the loss function being:

l(y,v) = log(1+exp(-yv))

where v denotes the true value of each point of the candidate response map of an image awaiting training in the training set, and y∈{+1,-1} denotes the label of the standard tracking box; the model keeps learning by minimizing the above loss function, and when the loss function stabilizes, training of the time series scoring model is complete and its coefficients are obtained; the trained time series scoring model is then used to compute the weight value of each candidate detection frame, thereby obtaining the candidate detection frame timing weights.
6. The target tracking device fusing optical flow information and the Siamese framework according to claim 4, characterized in that,

in order to better extract the image features of the candidate detection frames, the convolutional filter in the shared network layer uses a deformable convolution, in which a learnable offset parameter Δp_n is added to the sampling region of the conventional convolution operation.
7. A target tracking system fusing optical flow information and a Siamese framework, characterized by comprising:

a processor, configured to execute a plurality of instructions;

a memory, configured to store a plurality of instructions;

wherein the plurality of instructions are stored by the memory and are loaded by the processor to execute the target tracking method fusing optical flow information and the Siamese framework according to any one of claims 1-3.
8. A computer-readable storage medium, characterized in that the storage medium stores a plurality of instructions; the plurality of instructions are loaded by a processor to execute the target tracking method fusing optical flow information and the Siamese framework according to any one of claims 1-3.
PCT/CN2019/105275 2019-08-23 2019-09-11 Target tracking method and device fusing optical flow information and siamese framework WO2021035807A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910783618.2 2019-08-23
CN201910783618.2A CN110619655B (en) 2019-08-23 2019-08-23 Target tracking method and device integrating optical flow information and Simese framework

Publications (1)

Publication Number Publication Date
WO2021035807A1 true WO2021035807A1 (en) 2021-03-04

Family

ID=68922462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/105275 WO2021035807A1 (en) 2019-08-23 2019-09-11 Target tracking method and device fusing optical flow information and siamese framework

Country Status (2)

Country Link
CN (1) CN110619655B (en)
WO (1) WO2021035807A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110595466B (en) * 2019-09-18 2020-11-03 电子科技大学 Lightweight inertial-assisted visual odometer implementation method based on deep learning
CN111127532B (en) * 2019-12-31 2020-12-22 成都信息工程大学 Medical image deformation registration method and system based on deep learning characteristic optical flow
CN111640136B (en) * 2020-05-23 2022-02-25 西北工业大学 Depth target tracking method in complex environment
CN111814604A (en) * 2020-06-23 2020-10-23 浙江理工大学 Pedestrian tracking method based on twin neural network
CN111915573A (en) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 Digestive endoscopy focus tracking method based on time sequence feature learning
CN112085767B (en) * 2020-08-28 2023-04-18 安徽清新互联信息科技有限公司 Passenger flow statistical method and system based on deep optical flow tracking
CN112215079B (en) * 2020-09-16 2022-03-15 电子科技大学 Global multistage target tracking method
CN112561969B (en) * 2020-12-25 2023-07-25 哈尔滨工业大学(深圳) Mobile robot infrared target tracking method and system based on unsupervised optical flow network
CN113744314B (en) * 2021-09-06 2023-09-22 郑州海威光电科技有限公司 Target tracking method based on target-interference sensing
CN117011343B (en) * 2023-08-09 2024-04-05 北京航空航天大学 Optical flow guiding multi-target tracking method for crowded scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767456A (en) * 2019-01-09 2019-05-17 上海大学 A kind of method for tracking target based on SiameseFC frame and PFP neural network
CN109993095B (en) * 2019-03-26 2022-12-20 东北大学 Frame level feature aggregation method for video target detection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673404A (en) * 2009-10-19 2010-03-17 北京中星微电子有限公司 Target detection method and device
CN107038713A (en) * 2017-04-12 2017-08-11 南京航空航天大学 A kind of moving target method for catching for merging optical flow method and neutral net
CN109460707A (en) * 2018-10-08 2019-03-12 华南理工大学 A kind of multi-modal action identification method based on deep neural network
CN109711316A (en) * 2018-12-21 2019-05-03 广东工业大学 A kind of pedestrian recognition methods, device, equipment and storage medium again
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269699A (en) * 2021-04-22 2021-08-17 天津(滨海)人工智能军民融合创新中心 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image
CN113269699B (en) * 2021-04-22 2023-01-03 天津(滨海)人工智能军民融合创新中心 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image
CN113628696A (en) * 2021-07-19 2021-11-09 武汉大学 Drug connection graph score prediction method and device based on double-graph convolution fusion model
CN113628696B (en) * 2021-07-19 2023-10-31 武汉大学 Medicine connection graph score prediction method and device based on double-graph convolution fusion model
CN113569757B (en) * 2021-07-29 2024-04-05 西安交通大学 Time sequence action positioning method, system, terminal equipment and readable storage medium
CN113569757A (en) * 2021-07-29 2021-10-29 西安交通大学 Time sequence action positioning method, system, terminal equipment and readable storage medium
CN113793359A (en) * 2021-08-25 2021-12-14 西安工业大学 Target tracking method fusing twin network and related filtering
CN113793359B (en) * 2021-08-25 2024-04-05 西安工业大学 Target tracking method integrating twin network and related filtering
CN113723279B (en) * 2021-08-30 2022-11-01 东南大学 Multi-target tracking acceleration method based on time-space optimization in edge computing environment
CN113723279A (en) * 2021-08-30 2021-11-30 东南大学 Multi-target tracking acceleration method based on time-space optimization in edge computing environment
CN113792633A (en) * 2021-09-06 2021-12-14 北京工商大学 Face tracking system and method based on neural network and optical flow method
CN113792633B (en) * 2021-09-06 2023-12-22 北京工商大学 Face tracking system and method based on neural network and optical flow method
CN113920159B (en) * 2021-09-15 2024-05-10 河南科技大学 Infrared air small and medium target tracking method based on full convolution twin network
CN113920159A (en) * 2021-09-15 2022-01-11 河南科技大学 Infrared aerial small target tracking method based on full convolution twin network
CN114339030A (en) * 2021-11-29 2022-04-12 北京工业大学 Network live broadcast video image stabilization method based on self-adaptive separable convolution
CN114339030B (en) * 2021-11-29 2024-04-02 北京工业大学 Network live video image stabilizing method based on self-adaptive separable convolution
CN114220061B (en) * 2021-12-28 2024-04-23 青岛科技大学 Multi-target tracking method based on deep learning
CN114220061A (en) * 2021-12-28 2022-03-22 青岛科技大学 Multi-target tracking method based on deep learning
CN114419102B (en) * 2022-01-25 2023-06-06 江南大学 Multi-target tracking detection method based on frame difference time sequence motion information
CN114419102A (en) * 2022-01-25 2022-04-29 江南大学 Multi-target tracking detection method based on frame difference time sequence motion information
CN115222771A (en) * 2022-07-05 2022-10-21 北京建筑大学 Target tracking method and device
CN115273908A (en) * 2022-08-05 2022-11-01 东北农业大学 Live pig cough sound identification method based on classifier fusion
CN115273908B (en) * 2022-08-05 2023-05-12 东北农业大学 Live pig cough voice recognition method based on classifier fusion
CN115953740A (en) * 2023-03-14 2023-04-11 深圳市睿创科数码有限公司 Security control method and system based on cloud
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116486107B (en) * 2023-06-21 2023-09-05 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116636423A (en) * 2023-07-26 2023-08-25 云南农业大学 Efficient cultivation method of poria cocos strain
CN116636423B (en) * 2023-07-26 2023-09-26 云南农业大学 Efficient cultivation method of poria cocos strain
CN116647946A (en) * 2023-07-27 2023-08-25 济宁九德半导体科技有限公司 Semiconductor-based heating control system and method thereof
CN116647946B (en) * 2023-07-27 2023-10-13 济宁九德半导体科技有限公司 Semiconductor-based heating control system and method thereof
CN116703980B (en) * 2023-08-04 2023-10-24 南昌工程学院 Target tracking method and system based on pyramid pooling transducer backbone network
CN116703980A (en) * 2023-08-04 2023-09-05 南昌工程学院 Target tracking method and system based on pyramid pooling transducer backbone network
CN117252904A (en) * 2023-11-15 2023-12-19 南昌工程学院 Target tracking method and system based on long-range space perception and channel enhancement
CN117252904B (en) * 2023-11-15 2024-02-09 南昌工程学院 Target tracking method and system based on long-range space perception and channel enhancement

Also Published As

Publication number Publication date
CN110619655B (en) 2022-03-29
CN110619655A (en) 2019-12-27

Similar Documents

Publication Publication Date Title
WO2021035807A1 (en) Target tracking method and device fusing optical flow information and siamese framework
KR102302725B1 (en) Room Layout Estimation Methods and Techniques
CN110570371B (en) Image defogging method based on multi-scale residual error learning
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
WO2021249255A1 (en) Grabbing detection method based on rp-resnet
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
CN111723707B (en) Gaze point estimation method and device based on visual saliency
WO2022179581A1 (en) Image processing method and related device
CN113205595B (en) Construction method and application of 3D human body posture estimation model
WO2022001372A1 (en) Neural network training method and apparatus, and image processing method and apparatus
CN110060286B (en) Monocular depth estimation method
CN113449612B (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN114036969B (en) 3D human body action recognition algorithm under multi-view condition
WO2021143569A1 (en) Dense optical flow calculation system and method based on fpga
WO2022052782A1 (en) Image processing method and related device
CN113673545A (en) Optical flow estimation method, related device, equipment and computer readable storage medium
CN114581502A (en) Monocular image-based three-dimensional human body model joint reconstruction method, electronic device and storage medium
CN112184555B (en) Stereo image super-resolution reconstruction method based on deep interactive learning
CN111611869A (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network
US20220215617A1 (en) Viewpoint image processing method and related device
WO2023086398A1 (en) 3d rendering networks based on refractive neural radiance fields
CN116888605A (en) Operation method, training method and device of neural network model
CN118037906A Monocular RGB video-oriented human body movement redirection method
CN117809048A (en) Intelligent image edge extraction system and method
CN117934308A (en) Lightweight self-supervision monocular depth estimation method based on graph convolution network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19943251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19943251

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.08.2022)
