CN114240999A - Motion prediction method based on enhanced graph attention and time convolution network - Google Patents

Motion prediction method based on enhanced graph attention and time convolution network

Info

Publication number
CN114240999A
Authority
CN
China
Prior art keywords
attention
feature
graph
module
graph attention
Prior art date
Legal status
Pending
Application number
CN202111373469.6A
Other languages
Chinese (zh)
Inventor
刘盛
张少波
高飞
陈胜勇
柯正昊
柯程远
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202111373469.6A
Publication of CN114240999A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Abstract

The invention discloses a motion prediction method based on an enhanced graph attention and time convolution network, which estimates the future motion pose of the human body by aggregating spatio-temporal information. The method constructs an enhanced graph attention module and a reconstructed TCN module: a channel attention map is generated from the channel relations of the input features, and, guided by this map, a local graph attention convolution network and a global graph attention convolution network extract local symmetry, local connectivity and global semantic information, respectively. The reconstructed TCN effectively captures complex, highly dynamic temporal information. Finally, channel compression and dimension merging yield a post-processed result, the original time-series human skeleton data are sliced to obtain a residual, and the post-processed result and the residual are added element-wise to obtain the final prediction. The invention effectively reduces pose discontinuity and error accumulation in the human motion prediction process.

Description

Motion prediction method based on enhanced graph attention and time convolution network
Technical Field
The application belongs to the technical field of motion prediction, and particularly relates to a motion prediction method based on an enhanced graph attention and time convolution network.
Background
Human motion prediction aims to predict future dynamic motion changes from historical human skeleton poses; advances in this technology benefit many applications such as human-machine interaction, autonomous driving, public safety, medical care and motion monitoring. Perception and prediction of human motion play an indispensable role for interactive robots and point toward a major direction of future robotics research. However, discontinuity and error accumulation in the predicted poses greatly hinder practical deployment of human motion prediction.
Discontinuities and error accumulation in the predicted pose typically stem from insufficient representational capacity of the model in the spatial and temporal dimensions, respectively. To achieve highly accurate human motion prediction, much excellent prior work has encoded the spatio-temporal information of human skeleton sequences. Mathematical models of the human skeleton are typically built from the body's primary joints, each an independent point to be observed, with the joints connected to one another. Convolutional neural networks perceive the spatial structure of two-dimensional regular data well and are commonly used for image recognition and segmentation, but they perform poorly on topologically irregular data such as human skeletons; graph convolutional networks (GCN), in contrast, can construct and represent such irregular data structures well.
Various GCN-based algorithms are widely applied in pose estimation, motion prediction and related fields, but spatial information alone cannot guarantee a model's effectiveness on sequence data. Recurrent neural networks (RNN) have strong sequence-processing ability; originally designed for NLP, they were later widely applied to video-based action recognition and motion prediction, but the final prediction accuracy of RNNs and their later LSTM and GRU variants suffers severely from the lack of spatial information. The discrete cosine transform (DCT) has also been introduced to represent temporal features, but many experiments show that increasing the number of frames observable to the DCT does not significantly improve the final prediction, which is clearly counterintuitive.
Disclosure of Invention
The application provides a motion prediction method based on an enhanced graph attention and time convolution network, intended to reduce pose discontinuity and error accumulation in the human motion prediction process.
To achieve this purpose, the technical solution of the application is as follows:
a motion prediction method based on an enhanced graph attention and time convolution network comprises the following steps:
expanding the input original time-series human skeleton data to preset dimensions through a linear transformation, and completing data initialization through two-dimensional normalization, channel expansion, two-dimensional normalization and a ReLU function in sequence;
inputting the initialized data into a first enhanced graph attention module to output a first graph attention feature; inputting the first graph attention feature into a first reconstructed TCN module to obtain a first time-sequence feature; slicing the first graph attention feature and adding it element-wise to the first time-sequence feature to output a first fusion feature;
inputting the first fusion feature into a second enhanced graph attention module to output a second graph attention feature; inputting the second graph attention feature into a second reconstructed TCN module to obtain a second time-sequence feature; slicing the second graph attention feature and adding it element-wise to the second time-sequence feature to output a second fusion feature;
inputting the second fusion feature into a third enhanced graph attention module and outputting a third graph attention feature;
performing channel compression and dimension merging on the third graph attention feature to obtain a post-processed result, slicing the original time-series human skeleton data to obtain a residual, and adding the post-processed result and the residual element-wise to obtain the final prediction result.
Further, the enhanced graph attention module performs the following operations:
inputting the initialized data into a channel attention module to generate a channel attention map;
inputting the channel attention map into a local graph attention module and a global graph attention module respectively, and then aggregating the results with the input data to generate the graph attention feature.
Further, the channel attention module performs the following operations:
extracting spatial and temporal features using average pooling and max pooling simultaneously, and aggregating the two results with a shared-weight MLP layer to form the final channel attention map, expressed as follows:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
where σ denotes the Sigmoid activation function, MLP(AvgPool(F)) denotes applying the MLP after average pooling of the input features F, MLP(MaxPool(F)) denotes applying the MLP after max pooling of the input features F, and M_c(F) denotes the channel attention map.
Further, the operation of the local graph attention module is expressed as:
Y1 = σ(W · ((A + I) ⊙ M) · X)
where σ denotes the Sigmoid activation function, X denotes the input features, W is a learnable transformation matrix that maps input channels to output channels, M is a learnable mask matrix, (A + I) is the graph convolution kernel, in which A is the first-order adjacency matrix of the human skeleton nodes and I is the self-connection matrix of the nodes, ⊙ denotes element-wise matrix multiplication, and Y1 is the output of the local graph attention module;
the operation of the global graph attention module is expressed as:
Y2 = Σ_{k=1}^{K} (B_k + C_k) X W_k
where K is the number of heads of the multi-head attention mechanism, B_k is an adaptive global adjacency matrix, C_k is a learnable global adjacency matrix, W_k is a learnable transformation matrix between input and output channels, and Y2 is the output of the global graph attention module.
Further, the reconstructed TCN module performs the following operations:
sequentially applying a dense convolution, BatchNorm2D, ReLU, a two-dimensional convolution, BatchNorm2D, ReLU and Dropout, and outputting the time-sequence feature.
The invention constructs an enhanced graph attention module and a reconstructed TCN module and combines them into a human motion prediction method based on an enhanced graph attention and time convolution network, which effectively reduces pose discontinuity and error accumulation.
Drawings
FIG. 1 is a flow chart of a human motion prediction method based on an enhanced graph attention and time convolution network;
FIG. 2 is an exemplary diagram of an overall network based on an enhanced graph attention and time convolution network;
FIG. 3 is a diagram of an enhanced graph attention module network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a motion prediction method based on an enhanced graph attention and time convolution network is proposed, which includes:
and step S1, expanding the input original time sequence human body skeleton data into data with preset dimensionality through linear transformation, and completing data initialization through two-dimensional normalization, channel expansion, two-dimensional normalization and Relu function in sequence.
The human skeleton sequence data input into the network is preprocessed, such as the input data (b,66,10) in fig. 2, b represents the batch size of model training as b,66 represents the size of skeleton data per frame as 66, and 10 represents that the whole sequence is composed of 10 frames in the time dimension. The data are expanded into data with preset dimensionality through linear transformation, namely, the time dimensionality 10 is mapped and expanded into 64 through a full-connection network, each frame of skeleton data 66 is divided into two dimensionalities 3 and 22, 3 represents an xyz three channel, 22 represents a total of 22 skeleton nodes, and the finally obtained data are in a format (b,3,64 and 22) so as to meet the requirement of subsequent separate calculation of the channels and the nodes. And then sequentially carrying out two-dimensional normalization (BatchNorm2D), channel expansion (3, (3,1),256), two-dimensional normalization (BatchNorm2D) and Relu function (ReLU) on the data to finish data preprocessing.
According to the method, the dimensionality of the split skeleton node is two dimensionalities, the time sequence dimensionality is expanded from 10 to 64, and more operable space is provided for subsequent time sequence feature extraction.
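For concreteness, this initialization stage can be sketched in PyTorch as follows. This is a minimal sketch, not the patent's reference implementation: the class name and the joint-major layout of the 66 values per frame are assumptions.

    import torch
    import torch.nn as nn

    class PreProcess(nn.Module):
        # Expand the 10 input frames to 64 with a fully connected layer,
        # split the 66 values per frame into (3 xyz channels, 22 joints),
        # then BatchNorm2D -> channel expansion (3,(3,1),256) ->
        # BatchNorm2D -> ReLU. The unpadded (3,1) kernel shrinks the
        # time axis 64 -> 62, matching the (b,256,62,22) feature
        # referenced later in the description.
        def __init__(self):
            super().__init__()
            self.expand_time = nn.Linear(10, 64)
            self.bn_in = nn.BatchNorm2d(3)
            self.expand_ch = nn.Conv2d(3, 256, kernel_size=(3, 1))
            self.bn_out = nn.BatchNorm2d(256)
            self.relu = nn.ReLU()

        def forward(self, x):                             # x: (b, 66, 10)
            b = x.size(0)
            x = self.expand_time(x)                       # (b, 66, 64)
            # assumed layout: 66 = 22 joints x 3 xyz channels
            x = x.view(b, 22, 3, 64).permute(0, 2, 3, 1)  # (b, 3, 64, 22)
            return self.relu(self.bn_out(self.expand_ch(self.bn_in(x))))

    print(PreProcess()(torch.randn(8, 66, 10)).shape)     # (8, 256, 62, 22)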
Step S2: inputting the initialized data into the first enhanced graph attention module to output the first graph attention feature; inputting the first graph attention feature into the first reconstructed TCN module to obtain the first time-sequence feature; slicing the first graph attention feature and adding it element-wise to the first time-sequence feature to output the first fusion feature.
As shown in fig. 2, the first enhanced graph attention module (AGA Block1) processes the initialized data and outputs a first graph attention feature. The following operations are performed:
inputting the initialized data into the channel attention module to generate a channel attention map;
inputting the channel attention map into the local graph attention module and the global graph attention module respectively, and then aggregating the results with the input data to generate the graph attention feature.
Specifically, as shown in fig. 3, the first enhanced graph attention module feeds the initialized data into the channel attention module, which extracts spatial and temporal features using average pooling (Average Pool) and max pooling (Max Pool) simultaneously and aggregates the two results with a shared-weight MLP layer to form the final channel attention map.
The outputs of average pooling (Average Pool) and max pooling (Max Pool) are each processed by the MLP layer, and data fusion is completed by element-wise addition (⊕, element-by-element matrix addition). The channel attention map is then formed through a Sigmoid activation function.
The MLP layer is formed by connecting one-dimensional convolutions (256,1,256), ReLU and one-dimensional convolutions (256,1,256) in series.
The above process is expressed by the following formula:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
where σ denotes the Sigmoid activation function, MLP(AvgPool(F)) denotes applying the MLP after average pooling of the input features F, MLP(MaxPool(F)) denotes applying the MLP after max pooling of the input features F, and M_c(F) denotes the channel attention map.
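Continuing the PyTorch sketch, a minimal channel attention module consistent with this formula might look as follows, assuming the (256, 1, 256) tuples denote (input channels, kernel size, output channels) and the pooling runs over the time and node axes:

    class ChannelAttention(nn.Module):
        # Mc(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))) with a
        # shared-weight MLP made of two kernel-1 one-dimensional convolutions.
        def __init__(self, channels=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=1),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=1),
            )

        def forward(self, f):                        # f: (b, C, T, V)
            b, c = f.shape[:2]
            avg = f.mean(dim=(2, 3)).view(b, c, 1)   # AvgPool over (T, V)
            mx = f.amax(dim=(2, 3)).view(b, c, 1)    # MaxPool over (T, V)
            m_c = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
            return m_c.view(b, c, 1, 1)              # broadcastable map Mc(F)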
Then, the channel attention map is input into the local graph attention module and the global graph attention module, respectively. As shown in fig. 3, the local graph attention module comprises a first branch and a second branch, each including a first-order adjacency graph convolution (GCN Connection), two-dimensional normalization (BatchNorm2D) and a ReLU activation function; the outputs of the two branches are multiplied element-wise and then fed into a two-dimensional convolution (512, (1,1), 256), BatchNorm2D, ReLU and a Dropout function.
The local graph attention module may be expressed as:
Y1 = σ(W · ((A + I) ⊙ M) · X)
where σ denotes the Sigmoid activation function, X denotes the input data, W is a learnable transformation matrix that maps input channels to output channels, M is a learnable mask matrix, (A + I) is the graph convolution kernel, in which A is the first-order adjacency matrix of the human skeleton nodes (GCN Connection) and I is the self-connection matrix of the nodes (GCN Symmetry), ⊙ denotes element-wise matrix multiplication, and Y1 is the output of the local graph attention module.
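A minimal single-branch sketch of this formula follows; the two-branch structure with BatchNorm2D, ReLU and Dropout from fig. 3 is deliberately omitted, so this illustrates the equation rather than the full module:

    class LocalGraphAttention(nn.Module):
        # Y1 = sigmoid( W(((A + I) ⊙ M) X) ): the fixed kernel A + I is
        # refined by the learnable mask M, features are aggregated over
        # the skeleton graph, and a 1x1 convolution plays the role of W.
        def __init__(self, adjacency, channels=256):
            super().__init__()
            v = adjacency.size(0)                        # number of joints
            self.register_buffer("kernel", adjacency + torch.eye(v))
            self.mask = nn.Parameter(torch.ones(v, v))   # learnable M
            self.w = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, x):                            # x: (b, C, T, V)
            adj = self.kernel * self.mask                # (A + I) ⊙ M
            y = torch.einsum("bctv,vw->bctw", x, adj)    # graph aggregation
            return torch.sigmoid(self.w(y))              # Y1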
As shown in fig. 3, the global graph attention module comprises Global Graph Attention, a two-dimensional convolution (256, (1,1), 256), two-dimensional normalization (BatchNorm2D), a ReLU activation function and a Dropout function.
The global graph attention module may be expressed as:
Y2 = Σ_{k=1}^{K} (B_k + C_k) X W_k
where K is the number of heads of the multi-head attention mechanism, k ranges from 1 to K, B_k is an adaptive global adjacency matrix, C_k is a learnable global adjacency matrix, W_k is a learnable transformation matrix between input and output channels, and Y2 is the output of the global graph attention module.
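A sketch of the global branch follows. The patent does not spell out how B_k is made adaptive, nor the value of K, so the embedding-plus-softmax construction and K = 4 below are assumptions in the spirit of adaptive graph convolutions:

    class GlobalGraphAttention(nn.Module):
        # Y2 = sum_k (B_k + C_k) X W_k: C_k is a freely learnable global
        # adjacency; B_k is inferred from the input via two 1x1-conv
        # embeddings and a softmax; W_k is a 1x1 convolution per head.
        def __init__(self, channels=256, joints=22, heads=4, embed=64):
            super().__init__()
            self.heads = heads
            self.c = nn.Parameter(torch.zeros(heads, joints, joints))
            self.theta = nn.ModuleList(nn.Conv2d(channels, embed, 1) for _ in range(heads))
            self.phi = nn.ModuleList(nn.Conv2d(channels, embed, 1) for _ in range(heads))
            self.w = nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in range(heads))

        def forward(self, x):                            # x: (b, C, T, V)
            y = 0
            for k in range(self.heads):
                q = self.theta[k](x).mean(dim=2)         # (b, e, V), pooled over time
                p = self.phi[k](x).mean(dim=2)           # (b, e, V)
                b_k = torch.softmax(torch.einsum("bev,bew->bvw", q, p), dim=-1)
                y = y + self.w[k](torch.einsum("bctv,bvw->bctw", x, b_k + self.c[k]))
            return y                                     # Y2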
Finally, the outputs of the local graph attention module and the global graph attention module are added element-wise to the input data of the first enhanced graph attention module to form the final enhanced graph attention feature (the first graph attention feature).
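Combining the three sketches above, one enhanced graph attention block can be assembled roughly as below; exactly how the channel attention map conditions the two graph branches is an assumption, and the BatchNorm/Dropout stages of fig. 3 are again omitted:

    class AGABlock(nn.Module):
        # Channel attention reweights the input; the reweighted features
        # pass through the local and global graph attention branches, and
        # both outputs are added back onto the block input.
        def __init__(self, adjacency, channels=256):
            super().__init__()
            self.channel_att = ChannelAttention(channels)
            self.local_att = LocalGraphAttention(adjacency, channels)
            self.global_att = GlobalGraphAttention(channels, adjacency.size(0))

        def forward(self, x):                        # x: (b, C, T, V)
            f = x * self.channel_att(x)              # apply channel attention map
            return x + self.local_att(f) + self.global_att(f)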
Next, the first graph attention feature is input into the first reconstructed TCN module to obtain the first time-sequence feature; the first graph attention feature is sliced and added element-wise to the first time-sequence feature to output the first fusion feature.
On top of the original TCN, the first reconstructed TCN module replaces the dilated convolution with a dense convolution (a two-dimensional convolution (256, (7,1), 256)), i.e., a convolution kernel without holes, which gives better representational capacity for sequential skeleton data. As shown in fig. 2, the reconstructed TCN module sequentially applies the dense convolution (256, (7,1), 256), BatchNorm2D, ReLU, a two-dimensional convolution (256, (1,1), 256), BatchNorm2D, ReLU and a Dropout function, and outputs the time-sequence feature. Meanwhile, a Slice operation cuts (b, 256, 56, 22) from the end of the first graph attention feature (b, 256, 62, 22), and this residual is added element-by-element (⊕, element-by-element matrix addition) to the time-sequence feature to form the final output of the module.
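The reconstructed TCN with its sliced residual can be sketched as follows (the dropout rate is an assumption):

    class ReconstructedTCN(nn.Module):
        # Dense (non-dilated) (7,1) temporal convolution -> BN -> ReLU ->
        # (1,1) convolution -> BN -> ReLU -> Dropout. The unpadded (7,1)
        # kernel shortens the time axis by 6 frames, so the input is
        # sliced from the end before the residual addition.
        def __init__(self, channels=256, p_drop=0.1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=(7, 1)),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=(1, 1)),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Dropout(p_drop),
            )

        def forward(self, x):                   # x: (b, 256, 62, 22)
            y = self.net(x)                     # (b, 256, 56, 22)
            res = x[:, :, -y.size(2):, :]       # Slice the trailing frames
            return y + res                      # element-by-element addition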
Step S3: inputting the first fusion feature into the second enhanced graph attention module to output the second graph attention feature; inputting the second graph attention feature into the second reconstructed TCN module to obtain the second time-sequence feature; slicing the second graph attention feature and adding it element-wise to the second time-sequence feature to output the second fusion feature.
The specific operation of this step is the same as that of the previous step and is not repeated here.
Step S4: inputting the second fusion feature into the third enhanced graph attention module and outputting the third graph attention feature.
This step continues the enhanced graph attention processing; the third enhanced graph attention module operates in the same way as the first and is not described again here.
Step S5: performing channel compression and dimension merging on the third graph attention feature to obtain a post-processed result, slicing the original time-series human skeleton data to obtain a residual, and adding the post-processed result and the residual element-wise to obtain the final prediction result.
In this step, the third graph attention feature is post-processed and the predicted human skeleton sequence is output. As shown in fig. 2, the third graph attention feature passes through a two-dimensional convolution (256, (1,1), 3), which compresses the channels from 256 back to the xyz three channels of the original data to obtain (b, 3, 20, 22); the xyz dimension (the second) is then merged with the node dimension (the fourth) (Linear Projection) to obtain the post-processed result. A Slice operation cuts (b, 66, 1) from the end of the original input data as the residual, which is added element-by-element (⊕, element-by-element matrix addition) to the post-processed result to obtain the final prediction result (b, 66, 20).
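Putting the sketches together, the whole pipeline with its post-processing might be assembled as below. Because several normalization and convolution stages were abbreviated in the sketches, the temporal length of this toy model's output differs from the 20 predicted frames of the actual network; the structure, not the exact shapes, is the point here:

    class MotionPredictor(nn.Module):
        # init -> AGA1 -> TCN1 -> AGA2 -> TCN2 -> AGA3 -> post-processing.
        def __init__(self, adjacency):
            super().__init__()
            self.pre = PreProcess()
            self.aga = nn.ModuleList(AGABlock(adjacency) for _ in range(3))
            self.tcn = nn.ModuleList(ReconstructedTCN() for _ in range(2))
            self.compress = nn.Conv2d(256, 3, kernel_size=(1, 1))  # (256,(1,1),3)

        def forward(self, x):                        # x: (b, 66, 10)
            f = self.pre(x)                          # (b, 256, 62, 22)
            f = self.tcn[0](self.aga[0](f))          # first fusion feature
            f = self.tcn[1](self.aga[1](f))          # second fusion feature
            f = self.aga[2](f)                       # third graph attention feature
            f = self.compress(f)                     # (b, 3, T', 22)
            b, _, t, v = f.shape
            pred = f.permute(0, 3, 1, 2).reshape(b, v * 3, t)  # merge xyz and joints
            residual = x[:, :, -1:]                  # Slice: last input frame (b, 66, 1)
            return pred + residual                   # broadcast element-wise addition

    adjacency = torch.eye(22)                        # placeholder skeleton adjacency
    print(MotionPredictor(adjacency)(torch.randn(8, 66, 10)).shape)  # (8, 66, T')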
The above-described embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (5)

1. A motion prediction method based on an enhanced graph attention and time convolution network, comprising:
expanding the input original time-series human skeleton data to preset dimensions through a linear transformation, and completing data initialization through two-dimensional normalization, channel expansion, two-dimensional normalization and a ReLU function in sequence;
inputting the initialized data into a first enhanced graph attention module to output a first graph attention feature; inputting the first graph attention feature into a first reconstructed TCN module to obtain a first time-sequence feature; slicing the first graph attention feature and adding it element-wise to the first time-sequence feature to output a first fusion feature;
inputting the first fusion feature into a second enhanced graph attention module to output a second graph attention feature; inputting the second graph attention feature into a second reconstructed TCN module to obtain a second time-sequence feature; slicing the second graph attention feature and adding it element-wise to the second time-sequence feature to output a second fusion feature;
inputting the second fusion feature into a third enhanced graph attention module and outputting a third graph attention feature;
performing channel compression and dimension merging on the third graph attention feature to obtain a post-processed result, slicing the original time-series human skeleton data to obtain a residual, and adding the post-processed result and the residual element-wise to obtain the final prediction result.
2. The motion prediction method based on the enhanced graph attention and time convolution network of claim 1, wherein the enhanced graph attention module performs the following operations:
inputting the initialized data into a channel attention module to generate a channel attention map;
inputting the channel attention map into a local graph attention module and a global graph attention module respectively, and then aggregating the results with the input data to generate the graph attention feature.
3. The motion prediction method based on the enhanced graph attention and time convolution network of claim 2, wherein the channel attention module performs the following operations:
extracting spatial and temporal features using average pooling and max pooling simultaneously, and aggregating the two results with a shared-weight MLP layer to form the final channel attention map, expressed as follows:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
where σ denotes the Sigmoid activation function, MLP(AvgPool(F)) denotes applying the MLP after average pooling of the input features F, MLP(MaxPool(F)) denotes applying the MLP after max pooling of the input features F, and M_c(F) denotes the channel attention map.
4. The motion prediction method based on the enhanced graph attention and time convolution network of claim 2, wherein the operation of the local graph attention module is expressed as:
Y1 = σ(W · ((A + I) ⊙ M) · X)
where σ denotes the Sigmoid activation function, X denotes the input features, W is a learnable transformation matrix that maps input channels to output channels, M is a learnable mask matrix, (A + I) is the graph convolution kernel, in which A is the first-order adjacency matrix of the human skeleton nodes and I is the self-connection matrix of the nodes, ⊙ denotes element-wise matrix multiplication, and Y1 is the output of the local graph attention module;
the operation of the global graph attention module is expressed as:
Y2 = Σ_{k=1}^{K} (B_k + C_k) X W_k
where K is the number of heads of the multi-head attention mechanism, B_k is an adaptive global adjacency matrix, C_k is a learnable global adjacency matrix, W_k is a learnable transformation matrix between input and output channels, and Y2 is the output of the global graph attention module.
5. The motion prediction method based on the enhanced graph attention and time convolution network of claim 1, wherein the reconstructed TCN module performs the following operations:
sequentially applying a dense convolution, BatchNorm2D, ReLU, a two-dimensional convolution, BatchNorm2D, ReLU and Dropout, and outputting the time-sequence feature.
CN202111373469.6A 2021-11-19 2021-11-19 Motion prediction method based on enhanced graph attention and time convolution network Pending CN114240999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373469.6A CN114240999A (en) 2021-11-19 2021-11-19 Motion prediction method based on enhanced graph attention and time convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373469.6A CN114240999A (en) 2021-11-19 2021-11-19 Motion prediction method based on enhanced graph attention and time convolution network

Publications (1)

Publication Number Publication Date
CN114240999A true CN114240999A (en) 2022-03-25

Family

ID=80750063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373469.6A Pending CN114240999A (en) 2021-11-19 2021-11-19 Motion prediction method based on enhanced graph attention and time convolution network

Country Status (1)

Country Link
CN (1) CN114240999A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117475518A (en) * 2023-12-27 2024-01-30 华东交通大学 Synchronous human motion recognition and prediction method and system
CN117475518B (en) * 2023-12-27 2024-03-22 华东交通大学 Synchronous human motion recognition and prediction method and system
CN118427562A (en) * 2024-07-04 2024-08-02 国网山东省电力公司信息通信公司 Multi-device federal dynamic graph-oriented multi-element time sequence prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination