CN114581485A - Target tracking method based on language modeling pattern twin network


Info

Publication number
CN114581485A
CN114581485A CN202210199168.4A CN202210199168A CN114581485A CN 114581485 A CN114581485 A CN 114581485A CN 202210199168 A CN202210199168 A CN 202210199168A CN 114581485 A CN114581485 A CN 114581485A
Authority
CN
China
Prior art keywords
target
network
frame
tracked
twin
Prior art date
Legal status: Pending
Application number
CN202210199168.4A
Other languages
Chinese (zh)
Inventor
傅衡成 (Fu Hengcheng)
何为 (He Wei)
李凤荣 (Li Fengrong)
胡育昱 (Hu Yuyu)
魏智 (Wei Zhi)
纪立 (Ji Li)
Current Assignee
Shanghai Hansuo Information Technology Co ltd
Original Assignee
Shanghai Hansuo Information Technology Co ltd
Priority date: 2022-03-02
Filing date: 2022-03-02
Publication date: 2022-06-03
Application filed by Shanghai Hansuo Information Technology Co ltd
Priority to CN202210199168.4A
Publication of CN114581485A

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to a target tracking method based on a language modeling twin network, which comprises the following steps: step S1, acquiring a video containing continuous motion of a target, and making a training data set from the video; step S2, training a twin neural network on the training data set; step S3, keeping the parameters of the twin neural network fixed, and training a target position extraction network; step S4, jointly training the trained twin neural network and the trained target position extraction network to obtain the language modeling twin network; and step S5, acquiring real-time images of the target to be tracked, and tracking the target in real time with the language modeling twin network. The invention does not require incorporating expert experience into the algorithm, which makes it simpler to implement. It is also highly extensible and can be connected to more general intelligent systems.

Description

Target tracking method based on language modeling pattern twin network
Technical Field
The invention relates to the technical field of video target tracking, in particular to a target tracking method based on a language modeling twin network.
Background
Target tracking technology has wide application value in the civil and national-defense fields and is important to the development of robots, aircraft, autonomous driving, security, and related areas. In the security field, for example, a camera tracks pedestrians in its field of view, and a series of downstream intelligent algorithms analyze the results so that the monitoring system can better perceive and understand human postures, actions, and behavioral intentions, achieving intelligent, timely, and efficient monitoring. Automatic following means selecting a target in an initial frame, tracking it thereafter, and adjusting the follower's pose and its distance to the target according to the target's position, so that the target remains well imaged.
Current tracking algorithms fall into two broad categories. The first is based on traditional machine learning, such as correlation filtering and support vector machines; these methods mainly rely on classifiers trained online to distinguish the target from the background, then use the classifier to locate the target among candidate regions. The second is based on deep learning, such as convolutional neural networks and twin (Siamese) neural networks; these methods are first trained offline on large-scale data sets and then used to track the target. Judging by performance on test data sets, deep learning trackers draw on much stronger feature representations, and their tracking accuracy far exceeds that of traditional algorithms.
Since target tracking is a basic sub-field of computer vision, it gains further application value when combined with other vision processing algorithms, such as human pose estimation, pedestrian re-identification, and action recognition. Current tracking algorithms, however, are limited to providing a rectangular box to downstream algorithms; they extend poorly and can only be connected to certain specific intelligent systems. Moreover, current tracking algorithms require a large amount of carefully designed, highly task-specific expert experience to be built into the algorithm, which complicates implementation.
Disclosure of Invention
In order to solve these problems in the prior art, the invention provides a target tracking method based on a language modeling twin network, which can be connected to more general intelligent systems, does not require incorporating expert experience into the algorithm, and is simple to implement.
The invention provides a target tracking method based on a language modeling twin network, which comprises the following steps:
step S1, acquiring a video containing continuous movement of a target, and making a training data set according to the video;
step S2, training a twin neural network according to the training data set;
step S3, keeping the parameters of the twin neural network unchanged, and training a target position extraction network;
step S4, jointly training the trained twin neural network and the trained target position extraction network to obtain a language modeling twin network;
and step S5, acquiring a real-time image of the target to be tracked, and tracking the target to be tracked in real time by using the language modeling twin network.
Further, making the training data set in step S1 includes:
In step S11, the video containing continuous motion of the target is decomposed frame by frame into an image sequence, and the bounding box of the target in each image is marked.
Further, training the twin neural network in step S2 includes:
step S21, randomly selecting two frames from the image sequence; obtaining a template picture from one frame and inputting it into the template branch of the twin neural network, which outputs a template feature map; obtaining a candidate region picture from the other frame and inputting it into the candidate branch of the twin neural network, which outputs a candidate feature map;
step S22, performing a convolution operation on the template feature map and the candidate feature map, and outputting an encoding result map;
step S23: determining a loss function from the encoding result map, and training the twin neural network with the loss function.
Further, training the target position extraction network in step S3 includes:
step S31, expanding the convolution result obtained in step S22 into vector form, and inputting the vector into the feature dimension compression sub-network of the target position extraction network to obtain a compression result vector;
and step S32, inputting the compression result vector into the Transformer decoder of the target position extraction network to obtain the predicted coordinates of the target to be tracked, calculating the loss from the predicted coordinates and the actual coordinates of the target to be tracked, and training the target position extraction network with a gradient backpropagation algorithm.
Further, the step S5 of tracking the target to be tracked in real time includes:
step S51, initializing i = 2;
step S52, acquiring the bounding box of the target to be tracked in the (i-1)th frame image, and extracting the template picture of the target to be tracked in the (i-1)th frame image; inputting this template picture into the template branch of the twin neural network in the language modeling twin network to obtain the (i-1)th frame template feature map;
step S53, taking the position of the target to be tracked in the (i-1)th frame image as the center, cutting a picture of 255 × 255 pixels from the ith frame image as the ith frame candidate region picture;
step S54, inputting the ith frame candidate region picture into the candidate branch of the twin neural network in the language modeling twin network to obtain the ith frame candidate feature map;
step S55, performing a convolution operation on the ith frame candidate feature map and the (i-1)th frame template feature map, expanding the resulting feature map into a vector, and feeding the vector into the target position extraction network in the language modeling twin network to obtain the target frame tracked in the ith frame;
step S56, extracting the ith frame target picture according to the target frame tracked in the ith frame, and obtaining the ith frame predicted target feature map from the ith frame target picture;
step S57, obtaining the ith frame template feature map from the (i-1)th frame template feature map and the ith frame predicted target feature map;
step S58, judging from the real-time images of the target to be tracked whether the ith frame is the last frame; if so, ending the process; if not, letting i = i + 1 and repeating steps S53 to S57.
The invention models the target tracking problem into a language modeling problem based on pixel input, and effectively fuses an image and language sequence method to track a specific target. Compared with the traditional pure vision-based target tracking method, the method does not need to incorporate expert experience knowledge into the algorithm, and the implementation process is simpler and more convenient. In addition, the invention has strong expansibility, and the discrete language sequence output can be used as a language interface to be accessed into a more general intelligent system.
Drawings
Fig. 1 is a flowchart of a target tracking method based on a language modeling pattern twin network according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Because humans rely heavily on natural language for deep and wide-ranging perception and analysis of their environment, the invention applies natural-language processing methods to the tracking problem in computer vision and provides a target tracking method based on a language modeling twin network. As shown in fig. 1, the method comprises the following steps:
step S1, acquiring a video containing continuous motion of the target, and creating a training data set according to the video containing continuous motion of the target.
Specifically, the method for making the training data set comprises the following steps:
Step S11, the video containing continuous motion of the target is decomposed frame by frame into an image sequence, and the bounding box of the target to be tracked is marked in each image. From the marked bounding box, the center point, length, and width of the box can be obtained.
And step S12, acquiring a template picture and a candidate region picture according to the bounding box of the target to be tracked in each image. The template picture is the picture of the region inside the bounding box. The candidate region picture is acquired as follows: the center point of the bounding box is randomly shifted and taken as the target center, and a square region of 255 × 255 pixels is expanded around this center; the square region is the candidate region, and the picture inside it is the candidate region picture.
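By way of illustration, the cropping of a candidate region can be sketched in Python as follows; the jitter range max_shift and the zero-padding of areas outside the frame are assumptions, since the embodiment only states that the box center is randomly shifted before a 255 × 255 region is cropped.

```python
import numpy as np

def make_candidate_region(image, box, out_size=255, max_shift=32):
    # Crop a square candidate region around a randomly shifted box center.
    # box = (x1, y1, x2, y2); max_shift is an assumed jitter range in pixels.
    cx = (box[0] + box[2]) / 2.0 + np.random.uniform(-max_shift, max_shift)
    cy = (box[1] + box[3]) / 2.0 + np.random.uniform(-max_shift, max_shift)
    half = out_size // 2
    x0, y0 = int(round(cx)) - half, int(round(cy)) - half
    h, w = image.shape[:2]
    patch = np.zeros((out_size, out_size, image.shape[2]), dtype=image.dtype)
    sx0, sy0 = max(x0, 0), max(y0, 0)                  # clamp to image bounds
    sx1, sy1 = min(x0 + out_size, w), min(y0 + out_size, h)
    patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return patch, (x0, y0)  # offset maps candidate-region coords to image coords
```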
And step S13, constructing a vocabulary according to the candidate region picture; the vocabulary stores the actual coordinates of the target to be tracked. The size of the vocabulary is determined by the size of the candidate region and is therefore 255 × 255; that is, the vocabulary stores 65025 coordinates, covering the positions of the upper-left and lower-right corner points of the bounding box. During the subsequent supervised training, the prediction of the target position extraction network and the ground truth expressed in vocabulary words are fed into a loss function to compute the loss, and the network parameters are updated by a gradient backpropagation algorithm; during subsequent real-time tracking, the prediction is likewise represented by words of the vocabulary. The invention determines the position of the target in the candidate region picture at pixel-level precision, ensuring both image precision and positioning precision.
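As a minimal sketch, the vocabulary can be realized as a bijection between pixel positions in the 255 × 255 candidate region and word ids, so that a bounding box becomes a two-word sequence (upper-left corner, lower-right corner); the function names are illustrative.

```python
REGION = 255                    # side length of the candidate region in pixels
VOCAB_SIZE = REGION * REGION    # 65025 words, one per pixel position

def coord_to_word(x, y):
    # Encode a pixel coordinate inside the candidate region as a word id.
    assert 0 <= x < REGION and 0 <= y < REGION
    return y * REGION + x

def word_to_coord(word):
    # Decode a word id back to an (x, y) pixel coordinate.
    return word % REGION, word // REGION

# A bounding box is the two-word "sentence" (upper-left, lower-right).
box_sentence = [coord_to_word(40, 60), coord_to_word(120, 180)]
```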
And step S2, training the twin neural network on the prepared training data set.
Specifically, training the twin neural network comprises the following steps:
Step S21, randomly select two frames from the image sequence of step S11; obtain a template picture from one frame and input it into the template branch of the twin neural network, which outputs a template feature map; obtain a candidate region picture from the other frame and input it into the candidate branch of the twin neural network, which outputs a candidate feature map.
And step S22, perform a convolution operation on the template feature map and the candidate feature map to obtain a convolution result, and output an encoding result map. The encoding result map corresponds to a feature map of the target to be tracked; its values reflect the likelihood that the target lies at each position, and the larger the value, the more likely the target is at that position.
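For illustration, the convolution of the template feature map over the candidate feature map can be sketched in PyTorch as follows; the grouped-convolution formulation is a common implementation device assumed here, not something the embodiment prescribes.

```python
import torch
import torch.nn.functional as F

def cross_correlate(template_feat, candidate_feat):
    # template_feat:  (B, C, Ht, Wt), output of the template branch
    # candidate_feat: (B, C, Hc, Wc), output of the candidate branch
    # Each sample's template acts as a convolution kernel over its own
    # candidate map, yielding a (B, 1, Hc-Ht+1, Wc-Wt+1) encoding result map.
    b, c, _, _ = template_feat.shape
    out = F.conv2d(candidate_feat.reshape(1, b * c, *candidate_feat.shape[2:]),
                   template_feat, groups=b)
    return out.reshape(b, 1, *out.shape[2:])
```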
Step S23: compute the loss from the encoding result map and the annotation, and train the twin neural network with a gradient backpropagation algorithm.
It should be noted that when the twin neural network is trained, a Gaussian label whose mean is at the center of the target to be tracked is used as the ground-truth label. The loss function for training the twin neural network is as follows:
$$\mathrm{loss}_1=-\frac{1}{N}\sum_{u\in D}\big[\,y_u\log_2 v_u+(1-y_u)\log_2(1-v_u)\,\big]\qquad(1)$$
In the formula, N represents the number of elements of the encoding result map D, u represents an element position in D, y_u ∈ {1,0} represents the ground-truth label at u, v_u represents the actual value of the encoding result map at u, and log_2 represents the logarithm with base 2.
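A minimal sketch of this loss, assuming the raw encoding result map is squashed to (0, 1) with a sigmoid (the embodiment does not specify the squashing function) and that the Gaussian label supplies y_u per position:

```python
import torch

def twin_loss(response, label, eps=1e-7):
    # Cross-entropy of Eq. (1) between the encoding result map and the
    # Gaussian label centered on the target; base-2 logs as in the text.
    v = torch.sigmoid(response)                      # assumed squashing to (0, 1)
    loss = -(label * torch.log2(v + eps) + (1 - label) * torch.log2(1 - v + eps))
    return loss.mean()
```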
And step S3, keeping the parameters of the twin neural network unchanged, and training the target position extraction network.
The parameters of the twin neural network include filter weights and bias terms; both are set to a non-trainable state while the target position extraction network is trained. The target position extraction network comprises a feature dimension compression sub-network and a Transformer decoder, and training it comprises the following steps:
and step S31, expanding the convolution characteristic diagram obtained in the step S22 into vectors, and inputting the vectors into the characteristic dimension compression sub-network to obtain a compression result vector. The characteristic dimension compression sub-network consists of two fully-connected layers and is used for compressing the dimension of a convolution result and reducing the calculation amount of a subsequent network.
Step S32, input the compression result vector into the Transformer decoder, which outputs the final prediction result containing the predicted coordinates of the target to be tracked (the upper-left and lower-right corner points). Then compute the loss from the predicted coordinates and the actual coordinates of the target to be tracked, and train the target position extraction network with a gradient backpropagation algorithm.
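The following PyTorch sketch illustrates one possible shape of the target position extraction network; the layer widths, decoder depth, and the use of two learned query tokens in place of word-by-word autoregressive decoding are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PositionExtractor(nn.Module):
    # Two fully connected compression layers followed by a Transformer
    # decoder that emits two vocabulary words (upper-left and lower-right
    # corners). All dimensions here are illustrative assumptions.
    def __init__(self, in_dim, d_model=256, vocab_size=255 * 255):
        super().__init__()
        self.compress = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.queries = nn.Parameter(torch.randn(2, d_model))  # two corner tokens
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, corr_vec):                       # corr_vec: (B, in_dim)
        memory = self.compress(corr_vec).unsqueeze(1)  # (B, 1, d_model)
        tgt = self.queries.unsqueeze(0).expand(corr_vec.size(0), -1, -1)
        out = self.decoder(tgt, memory)                # (B, 2, d_model)
        return self.head(out)                          # (B, 2, vocab_size)
```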
The loss function for training the target location extraction network is shown as follows:
$$\mathrm{loss}_2=-\frac{1}{L}\sum_{j=1}^{L}\omega_j\log P\big(\hat{y}_j=y_j\mid x\big)\qquad(2)$$
In the formula, ω_j represents the weight of the jth coordinate in the vocabulary, L represents the total number of coordinates contained in the image sequence of the target, x represents the encoding result vector (i.e., the vector into which the convolution feature map is expanded), ŷ_j denotes the jth predicted coordinate, and y_j represents the jth coordinate of the target in the image sequence. In the present embodiment, the weights of all coordinates are equal. In other embodiments, the weights may also be set according to the positions of the coordinates.
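A sketch of this weighted cross-entropy over the coordinate vocabulary; with weights=None it reduces to plain cross-entropy, matching the equal weights of the present embodiment.

```python
import torch.nn.functional as F

def position_loss(logits, target_words, weights=None):
    # Eq. (2): logits (B, L, VOCAB_SIZE), target_words (B, L) word ids.
    per_token = F.cross_entropy(logits.flatten(0, 1), target_words.flatten(),
                                reduction='none')
    if weights is not None:              # optional per-coordinate weights ω_j
        per_token = per_token * weights.flatten()
    return per_token.mean()
```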
And step S4, jointly train the trained twin neural network and the trained target position extraction network to obtain the language modeling twin network.
Specifically, the entire network is jointly trained in a multi-task manner: the convolution output of the twin neural network is supervised in a relay fashion, with labels and loss function consistent with those of step S2.
The loss function for training the language modeling twin network is shown as follows:
$$\mathrm{loss}=\lambda_1\,\mathrm{loss}_1+\lambda_2\,\mathrm{loss}_2\qquad(3)$$
In the formula, λ_1 and λ_2 respectively represent the proportions of the two loss functions in the total loss function.
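One multi-task training step might then look as follows, reusing the twin_loss and position_loss sketches above; the batch layout, network call signatures, and λ values are assumptions.

```python
def joint_step(twin_net, pos_net, optimizer, batch, lam1=1.0, lam2=1.0):
    # Relay supervision on the correlation map (Eq. 1) plus the vocabulary
    # loss on the decoder output (Eq. 2), combined as in Eq. (3).
    template, candidate, gauss_label, target_words = batch
    response = twin_net(template, candidate)        # encoding result map
    logits = pos_net(response.flatten(1))           # map expanded to a vector
    loss = lam1 * twin_loss(response, gauss_label) \
         + lam2 * position_loss(logits, target_words)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```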
And step S5, acquire real-time images of the target to be tracked, and track the target to be tracked in real time with the language modeling twin network.
Specifically, the real-time tracking of the target to be tracked comprises the following steps:
in step S51, the initialization i is 2.
Step S52, acquire the bounding box of the target to be tracked in the (i-1)th frame image, and extract the template picture of the target to be tracked in the (i-1)th frame image; input this template picture into the template branch of the twin neural network in the language modeling twin network to obtain the (i-1)th frame template feature map.
Step S53, taking the position of the target to be tracked in the (i-1)th frame image as the center, cut a picture of 255 × 255 pixels from the ith frame image as the ith frame candidate region picture.
And step S54, input the ith frame candidate region picture into the candidate branch of the twin neural network in the language modeling twin network to obtain the ith frame candidate feature map.
And step S55, perform a convolution operation on the ith frame candidate feature map and the (i-1)th frame template feature map, expand the resulting feature map into a vector, and input the vector into the target position extraction network in the language modeling twin network to obtain the target frame tracked in the ith frame. From the target frame, its center position, length, and width can be obtained.
And step S56, extract the ith frame target picture according to the target frame tracked in the ith frame, and input the ith frame target picture into the template branch of the twin network to obtain the ith frame predicted target feature map.
And step S57, obtain the ith frame template feature map from the (i-1)th frame template feature map and the ith frame predicted target feature map.
The template feature map for the ith frame may be obtained by the following expression:
$$F_i=\omega F_{i-1}+(1-\omega)f_i$$
In the formula, F_i represents the template feature map of the ith frame, f_i represents the predicted target feature map of the ith frame, and ω ∈ [0, 1] represents the proportion of the (i-1)th frame template feature map within the ith frame template feature map.
Step S58, judge from the real-time images of the target to be tracked whether the ith frame is the last frame; if so, end the process; if not, let i = i + 1 and repeat steps S53 to S57.
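Putting steps S51 to S58 together, the tracking loop can be sketched as follows; crop_template, crop_candidate, and words_to_box are assumed helpers (cropping as in steps S52/S53 and decoding the two vocabulary words into a box), the template_branch/candidate_branch accessors are assumptions, and ω = 0.9 is an assumed update rate.

```python
def track(frames, init_box, twin_net, pos_net, omega=0.9):
    # Real-time tracking loop of step S5 (a sketch under assumed helpers).
    feat_t = twin_net.template_branch(crop_template(frames[0], init_box))
    box, boxes = init_box, [init_box]
    for frame in frames[1:]:                              # i = 2, 3, ...
        candidate = crop_candidate(frame, box, size=255)  # centered on last position
        feat_c = twin_net.candidate_branch(candidate)
        response = cross_correlate(feat_t, feat_c)        # see the step S22 sketch
        words = pos_net(response.flatten(1)).argmax(-1)   # two vocabulary words
        box = words_to_box(words)                         # corners -> bounding box
        feat_i = twin_net.template_branch(crop_template(frame, box))
        feat_t = omega * feat_t + (1 - omega) * feat_i    # template update, step S57
        boxes.append(box)
    return boxes
```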
The method models the target tracking problem as a language modeling problem over pixel inputs and effectively fuses image and language-sequence methods to track a specific target. Compared with traditional purely vision-based target tracking methods, it does not require incorporating expert experience into the algorithm and is simpler to implement. In addition, the method is highly extensible: its discrete language-sequence output can serve as a language interface to more general intelligent systems, so that downstream models can more easily analyze the tracked target with language tools.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit its scope; various changes may be made to them. All simple, equivalent changes and modifications made according to the claims and the content of the specification of the present application fall within the scope of the claims of the present patent application. What is not described in detail herein is conventional technical content.

Claims (5)

1. A target tracking method based on a language modeling pattern twin network is characterized by comprising the following steps:
step S1, acquiring a video containing continuous movement of a target, and making a training data set according to the video;
step S2, training a twin neural network according to the training data set;
step S3, keeping the parameters of the twin neural network unchanged, and training a target position extraction network;
step S4, jointly training the trained twin neural network and the trained target position extraction network to obtain a language modeling twin network;
and step S5, acquiring a real-time image of the target to be tracked, and tracking the target to be tracked in real time by using the language modeling twin network.
2. The target tracking method based on the language modeling twin network as claimed in claim 1, wherein the step S1 of creating the training data set includes:
in step S11, the video containing continuous motion of the target is decomposed frame by frame into an image sequence, and the bounding box of the target in each image is marked.
3. The target tracking method based on the language modeling twin network as claimed in claim 2, wherein the training of the twin neural network in step S2 includes:
step S21, randomly selecting two frames from the image sequence; obtaining a template picture from one frame and inputting it into the template branch of the twin neural network, which outputs a template feature map; obtaining a candidate region picture from the other frame and inputting it into the candidate branch of the twin neural network, which outputs a candidate feature map;
step S22, performing a convolution operation on the template feature map and the candidate feature map, and outputting an encoding result map;
step S23: determining a loss function from the encoding result map, and training the twin neural network with the loss function.
4. The target tracking method based on the language modeling twin network as claimed in claim 3, wherein the training of the target position extracting network in step S3 includes:
step S31, expanding the convolution result obtained in step S22 into vector form, and inputting the vector into the feature dimension compression sub-network of the target position extraction network to obtain a compression result vector;
and step S32, inputting the compression result vector into the Transformer decoder of the target position extraction network to obtain the predicted coordinates of the target to be tracked, calculating the loss from the predicted coordinates and the actual coordinates of the target to be tracked, and training the target position extraction network with a gradient backpropagation algorithm.
5. The target tracking method based on the language modeling twin network as claimed in claim 1, wherein the step S5 of tracking the target to be tracked in real time comprises:
step S51, initializing i = 2;
step S52, acquiring the bounding box of the target to be tracked in the (i-1)th frame image, and extracting the template picture of the target to be tracked in the (i-1)th frame image; inputting this template picture into the template branch of the twin neural network in the language modeling twin network to obtain the (i-1)th frame template feature map;
step S53, taking the position of the target to be tracked in the (i-1)th frame image as the center, cutting a picture of 255 × 255 pixels from the ith frame image as the ith frame candidate region picture;
step S54, inputting the ith frame candidate region picture into the candidate branch of the twin neural network in the language modeling twin network to obtain the ith frame candidate feature map;
step S55, performing a convolution operation on the ith frame candidate feature map and the (i-1)th frame template feature map, expanding the resulting feature map into a vector, and feeding the vector into the target position extraction network in the language modeling twin network to obtain the target frame tracked in the ith frame;
step S56, extracting the ith frame target picture according to the target frame tracked in the ith frame, and obtaining the ith frame predicted target feature map from the ith frame target picture;
step S57, obtaining the ith frame template feature map from the (i-1)th frame template feature map and the ith frame predicted target feature map;
step S58, judging from the real-time images of the target to be tracked whether the ith frame is the last frame; if so, ending the process; if not, letting i = i + 1 and repeating steps S53 to S57.
CN202210199168.4A 2022-03-02 2022-03-02 Target tracking method based on language modeling pattern twin network Pending

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210199168.4A 2022-03-02 2022-03-02 Target tracking method based on language modeling pattern twin network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210199168.4A 2022-03-02 2022-03-02 Target tracking method based on language modeling pattern twin network

Publications (1)

Publication Number Publication Date
CN114581485A 2022-06-03

Family

ID=81777281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210199168.4A Target tracking method based on language modeling pattern twin network 2022-03-02 2022-03-02 Pending

Country Status (1)

Country Link
CN (1) CN114581485A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620150A (en) * 2022-12-05 2023-01-17 海豚乐智科技(成都)有限责任公司 Multi-modal image ground building identification method and device based on twin transform
CN115620150B (en) * 2022-12-05 2023-08-04 海豚乐智科技(成都)有限责任公司 Multi-mode image ground building identification method and device based on twin transformers


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination