CN116665308B - Double interaction space-time feature extraction method - Google Patents

Double interaction space-time feature extraction method

Info

Publication number
CN116665308B
Authority
CN
China
Prior art keywords
time
space
feature
double interaction
convolution
Prior art date
Legal status
Active
Application number
CN202310741806.5A
Other languages
Chinese (zh)
Other versions
CN116665308A (en)
Inventor
王正友
张硕
高新月
韩学丛
庄珊娜
王辉
白晶
朱佩祥
Current Assignee
Shijiazhuang Sanpang Technology Co ltd
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Sanpang Technology Co ltd
Shijiazhuang Tiedao University
Priority date
Filing date
Publication date
Application filed by Shijiazhuang Sanpang Technology Co ltd, Shijiazhuang Tiedao University filed Critical Shijiazhuang Sanpang Technology Co ltd
Priority to CN202310741806.5A priority Critical patent/CN116665308B/en
Publication of CN116665308A publication Critical patent/CN116665308A/en
Application granted granted Critical
Publication of CN116665308B publication Critical patent/CN116665308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a double interaction (two-person interaction) space-time feature extraction method, and relates to the technical field of machine vision. The method comprises the following steps: preprocessing the skeleton data of a dataset and extracting the double interaction action categories to obtain action tensors; extracting double interaction space-time features through a space-time graph convolution network, capturing global and local information; performing feature fusion on the feature tensor through an STCP module based on three-branch pooling to obtain a fine-grained double interaction space-time feature tensor; and passing the final feature tensor through a fully connected layer and a Softmax layer, which helps the network converge, to output the double interaction category. The method has the advantage of high recognition accuracy.

Description

Double interaction space-time feature extraction method
Technical Field
The invention relates to the technical field of machine vision, in particular to a double interaction space-time feature extraction method based on a Transformer and multi-scale position awareness.
Background
In human-centered Computer Vision (CV) research, Human Action Recognition (HAR) has become an important research topic due to its wide application in fields such as human-computer interaction, smart homes, autonomous driving and virtual reality. At present, video-based single-person behavior recognition has been studied relatively extensively, while two-person interaction behavior recognition is still at an exploratory stage. Compared with single-person actions, two-person interaction recognition must not only deal with illumination changes, scene switching and camera viewpoint changes, but also handle the changing relative relationship between the two people, limb occlusion and the changing spatio-temporal relationship during the interaction. Therefore, two-person interaction recognition remains a challenging problem in computer vision, and how to effectively extract features and build a reasonable action recognition model has always been the focus of researchers at home and abroad.
Traditional action recognition mainly consists of a feature extraction stage and a classifier, with hand-crafted features designed to extract targeted characteristics from images. With the development of action recognition, however, action data have evolved from two-dimensional planar images to three-dimensional skeleton data, the task has expanded from classifying single-person actions to recognizing two-person and even group interactions, and recognition scenes have become increasingly complex. With the development of deep learning, neural network models, especially deep networks, have achieved wide success in complex action recognition. To create a skeleton graph representing the relationship between two people, Liu Xing et al. proposed representing the individual skeletons and the interaction-relationship skeleton separately in a coordinate system using a relative-view method. Pei Xiaomin et al. proposed using the camera as the coordinate center and computing the Euclidean distances of the single and double skeletons themselves and of the interaction joints to represent two-person skeleton features. Li Jianan et al. proposed constructing knowledge-given graphs, knowledge-learned graphs and natural-connection graphs to learn interactions with minimal prior knowledge. Zhu L et al. proposed constructing a binary relationship interaction graph to generate a relationship adjacency matrix for modeling double interaction. Yoshiki Ito et al. proposed feeding intra-body and inter-body graphs into a multi-stream network to extract interactions. However, none of these methods considers the influence of long-range joint feature information and long-range dependencies on recognition accuracy, and they ignore fine local joint information.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a double interaction space-time feature extraction method with high recognition accuracy based on a Transformer and multi-scale position awareness.
In order to solve the technical problems, the invention adopts the following technical scheme: a double interaction space-time feature extraction method comprises the following steps:
s1: preprocessing skeleton data of a dataset, and extracting double interaction action categories to obtain action tensors;
s2: extracting double interaction space-time characteristics through a space-time diagram convolution network, and capturing global and local information;
s3: performing feature fusion processing on the feature tensor through STCP based on three-branch pooling to obtain a fine-granularity double interaction space-time feature tensor;
s4: and the finally obtained characteristic tensor helps the network to converge through the full connection layer and the Softmax layer so as to output the double interaction category.
A further technical solution is: constructing a double interaction spatial feature extraction module that combines a Transformer with a light spatial graph convolution to extract the double interaction spatial features; and constructing a multi-scale position-aware temporal graph convolution module, which has a larger temporal receptive field and focuses on important joint position information, to extract the double interaction temporal features.
A further technical solution is: performing feature fusion on the feature tensor using the STCP module based on three-branch pooling; the module processes the feature tensor with three branches for space, time and channel, and obtains a more accurate feature map by running the three branches in parallel and fusing the features by concatenation.
The beneficial effects of adopting the above technical solution are as follows: in the space-time feature extraction process, the method combines local and global information and captures fine but important joint details, which improves the accuracy with which the reference model recognizes double interaction actions; moreover, the proposed modules are highly embeddable and can easily be inserted into other network models.
Drawings
The invention will be described in further detail with reference to the drawings and the detailed description.
FIG. 1 is a flow chart of a method according to an embodiment of the invention;
FIG. 2 is a schematic block diagram of a Transformer-based spatial feature extraction module in a method according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a multi-scale location-aware temporal feature extraction module according to an embodiment of the present invention;
fig. 4 is a schematic block diagram of an STCP attention module in the method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
As shown in fig. 1, the embodiment of the invention discloses a double interaction space-time feature extraction method, which comprises the following steps:
s1: preprocessing skeleton data of a dataset, and extracting double interaction action categories so as to obtain action tensors;
s2: extracting double interaction space-time characteristics through a space-time diagram convolution network, and capturing global and local information;
the double interaction space-time feature extraction method comprises the steps of extracting double interaction space features based on a transform space diagram convolution and extracting double interaction time features based on a multi-scale position sensing time diagram convolution through a space-time diagram convolution network, so that deep extraction of the space-time features is realized, and local information and global information are captured.
As shown in fig. 2, the embodiment of the invention further discloses a Transformer-based spatial graph convolution module for extracting the spatial features of the bone joint points. Firstly, a 1×1 convolution is applied to the input skeleton graph to introduce more nonlinear factors, a preliminary double interaction spatial feature extraction is performed on the input vector using a light spatial graph convolution, and the feature vector is then normalized by a batch normalization layer (Batch Normalization). The process is defined as:
$$f_{out} = \sum_{d=0}^{2} \Lambda_d^{-\frac{1}{2}} A_d \Lambda_d^{-\frac{1}{2}} f_{in} \qquad (1)$$
$$F_{out} = \mathrm{BN}(f_{out}) \qquad (2)$$
where Λ_d normalizes the adjacency matrix A_d, f_in and f_out denote the input and output features, d denotes the graph distance metric function with values up to 2, and BN denotes the batch normalization layer;
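As a hedged illustration of Eqs. (1)-(2), the sketch below implements a light spatial graph convolution in PyTorch: a 1×1 convolution followed by symmetrically normalized partitioned adjacency matrices and batch normalization. The partitioning of the adjacency by graph distance 0-2 and the (N, C, T, V) tensor layout are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class LightSpatialGraphConv(nn.Module):
    """Sketch of Eqs. (1)-(2): 1x1 conv, normalized partitioned adjacency, batch norm."""
    def __init__(self, in_channels, out_channels, A):
        super().__init__()
        # A: (D, V, V) adjacency partitions, assumed to cover graph distances 0..2.
        deg = A.sum(-1).clamp(min=1e-6)                                   # per-partition degrees
        A_norm = deg.pow(-0.5).unsqueeze(-1) * A * deg.pow(-0.5).unsqueeze(-2)
        self.register_buffer("A_norm", A_norm)                            # Lambda^-1/2 A Lambda^-1/2
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):                                                 # x: (N, C, T, V)
        f_in = self.conv1x1(x)
        # Sum over the partitions d, Eq. (1).
        f_out = torch.einsum("nctv,dvw->nctw", f_in, self.A_norm)
        return self.bn(f_out)                                             # Eq. (2)
```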
the encoder is a transducer encoder which enters a transducer immediately, the number of layers of the encoder is defined as 2, the module consists of a multi-head attention mechanism, a feedforward neural network, a normalization layer and residual error connection, the most core of the encoder part is the multi-head attention module, a single sub-attention mechanism can be split into a plurality of subspaces, the sub-attention mechanism is executed on each subspace, so that the characteristics of different layers and different angles can be captured better, global modeling can be carried out on the spatial characteristics, and different weights can be distributed to the characteristic diagram in a self-adaptive mode; the feed-forward neural network sublayer consists of two linear transforms and an activation function, wherein the first linear transform converts the input vector into an intermediate representation vector and the second linear transform converts the intermediate representation vector into a final representation vector; the residual connection and normalization layer is used for accelerating model convergence and improving model expression capacity, the residual connection can enable the model to be trained more easily, gradient disappearance and gradient explosion are avoided, the normalization layer can accelerate model convergence, and meanwhile robustness and generalization capacity of the model are improved. In addition, an additional residual error connection is adopted for the whole module, so that the over fitting of the model is prevented, the network parameter number is reduced, and the time complexity is reduced. The process is defined as:
$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad (3)$$
$$X_{Add} = \mathrm{LayerNorm}\big(X + \mathrm{MultiHeadAttention}(X)\big) \qquad (4)$$
$$\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2 \qquad (5)$$
$$Y = \mathrm{add}(f_{in}, f_{tran}) \qquad (6)$$
where Q is the query (input) information, K is the content (key) information, V is the value (the information itself), the scaling by $\sqrt{d_k}$ brings the attention matrix towards a standard distribution and softmax realizes the normalization; LayerNorm normalizes the inputs of each layer of neurons to a fixed mean and variance, X and Z denote input vectors, f_in denotes the input vector of the module and f_tran denotes the final features output by the encoder.
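The following sketch shows how Eqs. (3)-(6) could be realized with a standard two-layer Transformer encoder applied over the joint dimension, together with the extra outer residual connection of Eq. (6). Treating every joint of every frame as a token, and the head count and feed-forward width used here, are assumptions rather than details specified in the text.

```python
import torch
import torch.nn as nn

class SpatialTransformerBlock(nn.Module):
    """Sketch of Eqs. (3)-(6): 2-layer Transformer encoder over joints plus an outer residual."""
    def __init__(self, channels, num_heads=8, num_layers=2):
        super().__init__()
        # channels must be divisible by num_heads for multi-head attention.
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=4 * channels, batch_first=True)   # MHA + FFN + LayerNorm + residuals
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, f_in):                                  # f_in: (N, C, T, V)
        N, C, T, V = f_in.shape
        tokens = f_in.permute(0, 2, 3, 1).reshape(N * T, V, C)  # joints of each frame as tokens
        f_tran = self.encoder(tokens)                           # Eqs. (3)-(5)
        f_tran = f_tran.reshape(N, T, V, C).permute(0, 3, 1, 2)
        return f_in + f_tran                                    # Eq. (6): outer residual connection
```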
as shown in fig. 3, the embodiment of the invention also discloses a convolution module based on the multi-scale position sensing map for extracting the time characteristics of the bone joint points. Firstly, 4 parallel time convolution branches are carried out on the extracted space feature diagram, and each branch starts with convolution of 1 multiplied by 1; then pass through batch layer and ReLU activation function pairNormalizing the feature map; the first two branches are then convolved with 2 3 x 1 times and 2 different syndromes are applied to fuse features between different channels to obtain a multi-scale time-receptive field; while the third branch extracts the most significant feature information in successive frames through a 3 x 1 max pooling layer; the last branch contains a residual connection to maintain the gradient during back propagation; the four branches are subjected to multi-scale feature fusion through product operation; residual connection is added to the outer layer of the multi-scale convolution to help the network to converge rapidly, and the multi-scale convolution is combined with the multi-scale features through weighted summation operation; taking as input a multi-scale temporal feature, using a pooled convolution kernel of two spatial dimensionsOr->Each channel is encoded along a horizontal coordinate and a vertical coordinate, respectively, to generate a pair of feature maps with direction sensing capability. The process is defined as:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \qquad (7)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \qquad (8)$$
where z_c^h(h) and z_c^w(w) denote the outputs of the c-th channel at height h and at width w respectively, and x_c denotes the feature tensor of the c-th channel.
The generated feature maps are concatenated and passed to a shared 1×1 convolution transformation function F_1. The process is defined as:
$$f = \delta\big(F_1([z^h, z^w])\big) \qquad (9)$$
where [·,·] denotes concatenation along the spatial dimension, δ denotes a nonlinear activation function, and f denotes the intermediate feature map that encodes spatial information in the horizontal and vertical directions.
Then f is split along the spatial dimension into two independent tensors f^h and f^w, which are transformed by two 1×1 convolutions F_h and F_w into tensors with the same number of channels as the input tensor X. The process is defined as:
$$g^h = \sigma\big(F_h(f^h)\big) \qquad (10)$$
$$g^w = \sigma\big(F_w(f^w)\big) \qquad (11)$$
where σ denotes the sigmoid activation function.
The outputs g^h and g^w are used as attention weights, and the coordinate attention block finally outputs Y. The process is defined as:
$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \qquad (12)$$
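The following sketch combines the four-branch multi-scale temporal convolution described above with the coordinate attention of Eqs. (7)-(12), using the (C, T, V) feature map as the (channel, height, width) input. The dilation rates, the per-branch channel split, the use of concatenation to merge the branches, and the reduction ratio r are illustrative assumptions; the translated text leaves the exact fusion operator ambiguous.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Sketch of the four parallel temporal branches; dilations and channel split are assumptions."""
    def __init__(self, channels, dilations=(1, 2)):
        super().__init__()
        branch_c = channels // 4                                  # channels assumed divisible by 4
        def conv_branch(dilation):
            return nn.Sequential(
                nn.Conv2d(channels, branch_c, 1), nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
                nn.Conv2d(branch_c, branch_c, (3, 1), padding=(dilation, 0), dilation=(dilation, 1)),
                nn.BatchNorm2d(branch_c))
        self.branch1 = conv_branch(dilations[0])                  # 3x1 convolution, dilation 1
        self.branch2 = conv_branch(dilations[1])                  # 3x1 convolution, dilation 2
        self.branch3 = nn.Sequential(                             # 3x1 max pooling branch
            nn.Conv2d(channels, branch_c, 1), nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
            nn.MaxPool2d((3, 1), stride=1, padding=(1, 0)))
        self.branch4 = nn.Sequential(                             # residual-style 1x1 branch
            nn.Conv2d(channels, branch_c, 1), nn.BatchNorm2d(branch_c))
        self.out = nn.Conv2d(branch_c * 4, channels, 1)

    def forward(self, x):                                         # x: (N, C, T, V)
        y = torch.cat([self.branch1(x), self.branch2(x),
                       self.branch3(x), self.branch4(x)], dim=1)  # merge branches (assumed concat)
        return self.out(y) + x                                    # outer residual connection


class CoordinateAttention(nn.Module):
    """Sketch of Eqs. (7)-(12) over the (T, V) plane; the reduction ratio r is an assumption."""
    def __init__(self, channels, r=8):
        super().__init__()
        mid = max(channels // r, 8)
        self.f1 = nn.Sequential(nn.Conv2d(channels, mid, 1),
                                nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.fh = nn.Conv2d(mid, channels, 1)
        self.fw = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                         # x: (N, C, H=T, W=V)
        N, C, H, W = x.shape
        zh = x.mean(dim=3, keepdim=True)                          # Eq. (7): (N, C, H, 1)
        zw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)      # Eq. (8): (N, C, W, 1)
        f = self.f1(torch.cat([zh, zw], dim=2))                   # Eq. (9): shared 1x1 transform
        fh, fw = torch.split(f, [H, W], dim=2)                    # split back into the two directions
        gh = torch.sigmoid(self.fh(fh))                           # Eq. (10): (N, C, H, 1)
        gw = torch.sigmoid(self.fw(fw)).permute(0, 1, 3, 2)       # Eq. (11): (N, C, 1, W)
        return x * gh * gw                                        # Eq. (12)
```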
s3: the feature fusion processing is carried out on the feature tensor by the STCP module based on three-branch pooling, so that the joint with the most abundant information in the specific frame is distinguished from the whole time frame sequence, and the fine-granularity double interaction space-time feature tensor is obtained. Firstly, carrying out average pooling operation on input features on a frame level and an articular level respectively, and carrying out local average pooling and local partial pooling on the feature vectors subjected to time dimension pooling to obtain the articulation point data with different importance corresponding to double interaction actions. The process is defined as:
$$f_t = \mathrm{pool}_t(f_{in}) \qquad (13)$$
$$f_v = \mathrm{pool}_v(f_{in}) \qquad (14)$$
$$f_p = \mathrm{pool}_p(f_t) \qquad (15)$$
where pool denotes the pooling operation along the corresponding dimension.
The space-time dimension feature vectors are then pooled along the channel dimension as input, after which the three branch feature vectors are combined and concatenated together, and the information is compressed through a fully connected layer. The process is defined as:
$$f_c = \mathrm{pool}_p(f_t) \odot \mathrm{pool}_v(f_{in}) \qquad (16)$$
$$f_{mid} = \theta\big(W(f_t \parallel f_v \parallel f_c)\big) \qquad (17)$$
where ⊙ denotes the dot product, ∥ denotes the concatenation operation, θ denotes the HardSwish activation function, and W denotes a trainable parameter;
and then, three independent full-connection layers are utilized to obtain the attention scores of the time frame dimension, the joint dimension and the channel dimension, and finally, the attention scores are multiplied to obtain the local attention map of the space-time channel as the attention score of the whole action sequence.
$$\mathrm{Att} = \sigma\big(W_t f_{mid}\big) \otimes \sigma\big(W_v f_{mid}\big) \otimes \phi\big(W_c f_{mid}\big) \qquad (18)$$
where σ denotes the sigmoid activation function, φ denotes the Swish activation function, and W_t, W_v and W_c denote the three independent fully connected layers.
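A minimal sketch of the three-branch STCP attention of Eqs. (13)-(18) follows. The body-part split used for the partial pooling, the reduction ratio, and the exact placement of the sigmoid and Swish activations on the three score heads are assumptions, since the translated equations do not pin them down.

```python
import torch
import torch.nn as nn

class STCPAttention(nn.Module):
    """Sketch of the three-branch STCP attention, Eqs. (13)-(18)."""
    def __init__(self, channels, num_joints, num_frames, num_parts=5, r=4):
        super().__init__()
        mid = channels // r
        self.num_parts = num_parts                         # temporal/body-part split (assumption)
        self.compress = nn.Linear(3 * channels, mid)       # Eq. (17): shared FC compression
        self.hardswish = nn.Hardswish()                    # theta in Eq. (17)
        self.fc_t = nn.Linear(mid, num_frames)             # frame-dimension score head
        self.fc_v = nn.Linear(mid, num_joints)             # joint-dimension score head
        self.fc_c = nn.Linear(mid, channels)               # channel-dimension score head

    def forward(self, x):                                  # x: (N, C, T, V), T == num_frames
        N, C, T, V = x.shape
        f_t = x.mean(dim=3)                                # Eq. (13): pool joints  -> (N, C, T)
        f_v = x.mean(dim=2)                                # Eq. (14): pool frames  -> (N, C, V)
        # Eq. (15): partial pooling of f_t over segments; T assumed divisible by num_parts.
        f_p = f_t.view(N, C, self.num_parts, T // self.num_parts).mean(dim=-1)
        # Eq. (16): the dot product in the translated text is ambiguous; here both branches
        # are reduced to (N, C) and multiplied element-wise as a simplification.
        f_c = f_p.mean(dim=2) * f_v.mean(dim=2)
        feats = torch.cat([f_t.mean(dim=2), f_v.mean(dim=2), f_c], dim=1)   # concat three branches
        f_mid = self.hardswish(self.compress(feats))       # Eq. (17)
        a_t = torch.sigmoid(self.fc_t(f_mid)).view(N, 1, T, 1)
        a_v = torch.sigmoid(self.fc_v(f_mid)).view(N, 1, 1, V)
        a_c = torch.sigmoid(self.fc_c(f_mid)).view(N, C, 1, 1)
        return x * a_t * a_v * a_c                         # Eq. (18): space-time-channel attention
```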
S4: The finally obtained feature tensor is passed through a fully connected layer and a Softmax layer, which helps the network converge, to output the double interaction category.
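Step S4 can be illustrated by the small classification head below: global pooling of the fused feature tensor, a fully connected layer, and a Softmax over the interaction classes. The pooling choice and the use of cross-entropy loss during training are assumptions consistent with common practice rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class InteractionHead(nn.Module):
    """Sketch of step S4: global average pooling, fully connected layer, Softmax."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                            # x: (N, C, T, V) fused feature tensor
        logits = self.fc(x.mean(dim=(2, 3)))         # global pooling over time and joints
        return torch.softmax(logits, dim=1)          # double interaction class probabilities

# During training one would typically feed the pre-softmax logits to nn.CrossEntropyLoss
# and backpropagate, which is what lets the fully connected and Softmax layers
# "help the network converge".
```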
In order to extract the double interaction spatial features more effectively, the method adds a Transformer to the backbone network: after the spatial graph convolution performs the preliminary feature extraction, the spatial feature vectors are extracted again by the Transformer feature extractor, which captures the important joint information that would otherwise be lost, so that the spatial graph convolution part of the backbone network fully retains the detail information; residual connections are added internally, which greatly shortens the model training time. The model therefore outperforms other networks in the spatial feature extraction part.
In order to enlarge the receptive field in the time dimension and address the long-term dependence problem, the method introduces multi-scale convolution to obtain multi-scale information; at the same time, to enhance the sensitivity of the network model to informative channels and improve its position awareness, a position-aware attention module is added.
After constructing the double interaction space-time feature extraction method based on the Transformer and multi-scale position awareness, the method considers the importance of different body parts within the whole action sequence as well as the importance of time frames and channels for weighting the bone joints at different action stages, and designs the STCP module. The module has a three-branch structure over the time, space and channel dimensions: pooling is performed separately in space and in time, and in the time dimension the joint point data of different importance for the double interaction actions are obtained by partial segmentation and partial pooling; the two are then pooled along the channel dimension; the resulting feature vectors are concatenated and the attention scores over local spatial joints, time and channels are obtained through three fully connected layers; finally, the three are multiplied to obtain the space-time-channel local attention map.
In summary, in the space-time feature extraction process the method combines local and global information and captures fine but important joint details, improves the accuracy with which a reference model recognizes double interaction actions, is highly embeddable, and can easily be embedded into other network models.

Claims (1)

1. A double interaction space-time feature extraction method is characterized by comprising the following steps:
s1: preprocessing skeleton data of a dataset, and extracting double interaction action categories to obtain action tensors;
s2: extracting double interaction space-time characteristics through a space-time diagram convolution network, and capturing global and local information;
s3: performing feature fusion processing on the feature tensor through STCP based on three-branch pooling to obtain a fine-granularity double interaction space-time feature tensor;
s4: the finally obtained characteristic tensor is used for helping the network convergence through the full connection layer and the Softmax layer so as to output the double interaction action category;
the extracting the double interaction space-time characteristics through the space-time diagram convolution network comprises the following steps:
double interaction spatial features are extracted by a Transformer-based spatial graph convolution and double interaction temporal features are extracted by a multi-scale position-aware temporal graph convolution, so that deep extraction of the space-time features is achieved and local information and global information are captured;
the method for extracting the double interaction space features comprises the following steps:
firstly, carrying out 1×1 convolution on an input skeleton diagram, introducing more nonlinear factors, carrying out preliminary double interaction space feature extraction on an input vector by utilizing light space diagram convolution, and then normalizing the feature vector by a batch processing layer Batch Normalization, wherein the process is defined as follows:
$$f_{out} = \sum_{d=0}^{2} \Lambda_d^{-\frac{1}{2}} A_d \Lambda_d^{-\frac{1}{2}} f_{in} \qquad (1)$$
$$F_{out} = \mathrm{BN}(f_{out}) \qquad (2)$$
where Λ_d normalizes the adjacency matrix A_d, f_in and f_out represent the input and output features, d represents the graph distance metric function with values up to 2, and BN represents the batch normalization layer;
the features then enter a Transformer encoder, whose number of layers is defined as 2, wherein the encoder includes a multi-head attention mechanism, a feed-forward neural network, a normalization layer and a residual connection;
an additional residual connection is adopted for the whole encoder to prevent the model from being overfitted, reduce the network parameter quantity and reduce the time complexity, and the process is defined as:
$$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad (3)$$
$$X_{Add} = \mathrm{LayerNorm}\big(X + \mathrm{MultiHeadAttention}(X)\big) \qquad (4)$$
$$\mathrm{FFN}(Z) = \max(0, ZW_1 + b_1)W_2 + b_2 \qquad (5)$$
$$Y = \mathrm{add}(f_{in}, f_{tran}) \qquad (6)$$
wherein: Q is the input (query) information, K is the content (key) information, V is the information itself (the value), the scaling by $\sqrt{d_k}$ brings the attention matrix towards a standard distribution, and softmax realizes the normalization; LayerNorm normalizes the input of each layer of neurons to a fixed mean and variance, Z represents the input vector, f_in represents the input vector of the module, and f_tran represents the double interaction space-time features output by the encoder;
the extraction method of the double interaction time characteristics comprises the following steps:
firstly, the extracted spatial feature map is fed into 4 parallel temporal convolution branches, each branch beginning with a 1×1 convolution; the feature map is then normalized through a batch normalization layer and a ReLU activation function; the first two branches then each apply two 3×1 convolutions with two different dilation rates to fuse features between different channels and obtain a multi-scale temporal receptive field; the third branch extracts the most significant feature information in consecutive frames through a 3×1 max pooling layer; the last branch contains a residual connection to maintain the gradient during back propagation; the four branches undergo multi-scale feature fusion through a product operation; a residual connection is added around the multi-scale convolution to help the network converge quickly and is combined with the multi-scale features through a weighted summation operation; taking the multi-scale temporal features as input, pooling kernels of the two spatial dimensions, (H, 1) and (1, W), encode each channel along the horizontal coordinate and the vertical coordinate respectively to generate a pair of direction-aware feature maps, the process being defined as:
$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \qquad (7)$$
$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \qquad (8)$$
where z_c^h(h) and z_c^w(w) represent the output of the c-th channel at height h and at width w respectively, and x_c represents the feature tensor of the c-th channel;
connecting and transmitting the generated fusion feature diagram to a shared 1×1 convolution transfer function F 1 Among these, the process is defined as:
$$f = \delta\big(F_1([z^h, z^w])\big) \qquad (9)$$
in the formula, [·,·] represents concatenation along the spatial dimension, δ represents a nonlinear activation function, and f represents the intermediate feature map that encodes spatial information in the horizontal direction and the vertical direction;
then divide f into two independent tensors along the spatial dimensionAnd->Through two 1X 1 convolution transforms F h And F w Transformed into an input tensor X with the same number of channels, the process is defined as:
$$g^h = \sigma\big(F_h(f^h)\big) \qquad (10)$$
$$g^w = \sigma\big(F_w(f^w)\big) \qquad (11)$$
wherein σ represents the sigmoid activation function;
the outputs g^h and g^w are used as the attention weights, and the coordinate attention block finally outputs Y, the process being defined as:
$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \qquad (12)$$
in the step S3:
firstly, carrying out average pooling operation on input features on a frame level and an articular level respectively, and carrying out local average pooling and local partial pooling on feature vectors subjected to time dimension pooling to obtain articular point data with different importance corresponding to double interaction actions, wherein the process is defined as follows:
$$f_t = \mathrm{pool}_t(f_{in}) \qquad (13)$$
$$f_v = \mathrm{pool}_v(f_{in}) \qquad (14)$$
$$f_p = \mathrm{pool}_p(f_t) \qquad (15)$$
wherein pool represents the pooling operation along the corresponding dimension;
then, the space-time dimension feature vectors are used as input and are subjected to channel dimension pooling, and then three branch feature vectors are combined and connected together, and information is compressed through a full connection layer; the process is defined as:
$$f_c = \mathrm{pool}_p(f_t) \odot \mathrm{pool}_v(f_{in}) \qquad (16)$$
$$f_{mid} = \theta\big(W(f_t \parallel f_v \parallel f_c)\big) \qquad (17)$$
in the formula, ⊙ represents the dot product, ∥ represents the concatenation operation, θ represents the HardSwish activation function, and W represents a trainable parameter;
the method comprises the steps of obtaining attention scores of a time frame dimension, a joint dimension and a channel dimension by utilizing three independent full-connection layers, and finally multiplying the three to obtain a space-time channel local attention map as the attention score of the whole action sequence;
where σ represents the sigmoid activation function and φ represents the Swish activation function.
CN202310741806.5A 2023-06-21 2023-06-21 Double interaction space-time feature extraction method Active CN116665308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310741806.5A CN116665308B (en) 2023-06-21 2023-06-21 Double interaction space-time feature extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310741806.5A CN116665308B (en) 2023-06-21 2023-06-21 Double interaction space-time feature extraction method

Publications (2)

Publication Number Publication Date
CN116665308A CN116665308A (en) 2023-08-29
CN116665308B true CN116665308B (en) 2024-01-23

Family

ID=87727903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310741806.5A Active CN116665308B (en) 2023-06-21 2023-06-21 Double interaction space-time feature extraction method

Country Status (1)

Country Link
CN (1) CN116665308B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011723A (en) * 2011-12-15 2014-08-27 美光科技公司 Boolean logic in a state machine lattice
CN111680606A (en) * 2020-06-03 2020-09-18 淮河水利委员会水文局(信息中心) Low-power-consumption water level remote measuring system based on artificial intelligence cloud identification water gauge
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112560712A (en) * 2020-12-18 2021-03-26 西安电子科技大学 Behavior identification method, device and medium based on time-enhanced graph convolutional network
CN112906545A (en) * 2021-02-07 2021-06-04 广东省科学院智能制造研究所 Real-time action recognition method and system for multi-person scene
CN113657349A (en) * 2021-09-01 2021-11-16 重庆邮电大学 Human body behavior identification method based on multi-scale space-time graph convolutional neural network
CN114694174A (en) * 2022-03-02 2022-07-01 北京邮电大学 Human body interaction behavior identification method based on space-time diagram convolution
CN114882421A (en) * 2022-06-01 2022-08-09 江南大学 Method for recognizing skeleton behavior based on space-time feature enhancement graph convolutional network
WO2023024438A1 (en) * 2021-08-24 2023-03-02 上海商汤智能科技有限公司 Behavior recognition method and apparatus, electronic device, and storage medium
CN115841697A (en) * 2022-09-19 2023-03-24 上海大学 Motion recognition method based on skeleton and image data fusion

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011723A (en) * 2011-12-15 2014-08-27 美光科技公司 Boolean logic in a state machine lattice
CN111680606A (en) * 2020-06-03 2020-09-18 淮河水利委员会水文局(信息中心) Low-power-consumption water level remote measuring system based on artificial intelligence cloud identification water gauge
CN111950540A (en) * 2020-07-24 2020-11-17 浙江师范大学 Knowledge point extraction method, system, device and medium based on deep learning
CN112560712A (en) * 2020-12-18 2021-03-26 西安电子科技大学 Behavior identification method, device and medium based on time-enhanced graph convolutional network
CN112906545A (en) * 2021-02-07 2021-06-04 广东省科学院智能制造研究所 Real-time action recognition method and system for multi-person scene
WO2023024438A1 (en) * 2021-08-24 2023-03-02 上海商汤智能科技有限公司 Behavior recognition method and apparatus, electronic device, and storage medium
CN113657349A (en) * 2021-09-01 2021-11-16 重庆邮电大学 Human body behavior identification method based on multi-scale space-time graph convolutional neural network
CN114694174A (en) * 2022-03-02 2022-07-01 北京邮电大学 Human body interaction behavior identification method based on space-time diagram convolution
CN114882421A (en) * 2022-06-01 2022-08-09 江南大学 Method for recognizing skeleton behavior based on space-time feature enhancement graph convolutional network
CN115841697A (en) * 2022-09-19 2023-03-24 上海大学 Motion recognition method based on skeleton and image data fusion

Also Published As

Publication number Publication date
CN116665308A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN108520535B (en) Object classification method based on depth recovery information
Liu et al. Two-stream 3d convolutional neural network for skeleton-based action recognition
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
CN108154194B (en) Method for extracting high-dimensional features by using tensor-based convolutional network
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN110414432A (en) Training method, object identifying method and the corresponding device of Object identifying model
CN112801015B (en) Multi-mode face recognition method based on attention mechanism
CN110728183A (en) Human body action recognition method based on attention mechanism neural network
CN114596520A (en) First visual angle video action identification method and device
CN112329525A (en) Gesture recognition method and device based on space-time diagram convolutional neural network
CN111695523B (en) Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN112906520A (en) Gesture coding-based action recognition method and device
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN113221663A (en) Real-time sign language intelligent identification method, device and system
CN115719510A (en) Group behavior recognition method based on multi-mode fusion and implicit interactive relation learning
CN111401116B (en) Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
Sun et al. 3-D Facial Feature Reconstruction and Learning Network for Facial Expression Recognition in the Wild
CN114333002A (en) Micro-expression recognition method based on deep learning of image and three-dimensional reconstruction of human face
Hsieh et al. Online human action recognition using deep learning for indoor smart mobile robots
CN113850182A (en) Action identification method based on DAMR-3 DNet
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN116665308B (en) Double interaction space-time feature extraction method
CN110782503B (en) Face image synthesis method and device based on two-branch depth correlation network
CN116863241A (en) End-to-end semantic aerial view generation method, model and equipment based on computer vision under road scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant