CN117373116A - Human body action detection method based on lightweight characteristic reservation of graph neural network - Google Patents

Human body action detection method based on lightweight characteristic reservation of graph neural network

Info

Publication number
CN117373116A
Authority
CN
China
Prior art keywords
lightweight
neural network
human
model
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311302738.9A
Other languages
Chinese (zh)
Inventor
卫星
蒋文豪
翟琰
周浩伟
钟浩然
刘敏睿
夏炅
杨帆
赵冲
陆阳
毕翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311302738.9A priority Critical patent/CN117373116A/en
Publication of CN117373116A publication Critical patent/CN117373116A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body action detection method based on lightweight feature preservation of a graph neural network, belonging to the technical fields of computer vision and artificial intelligence, and comprising the following steps: acquiring a human behavior video data set, and processing the human behavior video data set to obtain a skeleton diagram data set; constructing a backbone space-time diagram convolutional neural network, inputting the skeleton diagram data set into the backbone space-time diagram convolutional neural network for training, and then optimizing to obtain a lightweight motion recognition model; constructing a cyclic generation network, inputting the skeleton diagram data set into the cyclic generation network, training and optimizing to obtain an action recognition model that preserves important features; and fusing the lightweight motion recognition model and the action recognition model of the important features to obtain a human motion detection model, and detecting human motion based on the human motion detection model. Compared with current mainstream motion detection methods, the model provided by the invention achieves better performance.

Description

Human body action detection method based on lightweight characteristic reservation of graph neural network
Technical Field
The invention belongs to the field of computer vision and artificial intelligence, and particularly relates to a human body action detection method based on lightweight characteristic preservation of a graph neural network.
Background
In recent years, human motion and gesture recognition has become a popular research topic and remains a very challenging task in the field of computer vision. Human motion recognition mainly refers to recognizing the motion and action category of individual humans from visual information, such as the posture and movement of human bodies, through computer vision and machine learning algorithms, and classifying or describing them. Human motion recognition technology can be applied to many fields, such as intelligent monitoring, video content understanding, human-computer interaction, and virtual reality.
Human behavior is very complex and diverse, with extremely high dimensionality and variability. Human behavior in everyday life encompasses many aspects with many varying details, and various kinds of background noise and interference may occur in different environments and situations. Early studies focused on extracting handcrafted spatial and temporal features from skeleton sequences for human motion recognition; handcrafted methods can be broadly divided into joint-based and body-part-based methods, depending on the feature extraction technique used. These methods merely represent skeletal data as a sequence of vectors processed by an RNN and do not fully model the complex spatio-temporal configuration and correlation of body joints.
Most of these methods use only the feature information of the skeleton motion itself, and they suffer from sensitivity to ambient light, susceptibility to errors when recognizing actual actions, and loss of local information during feature extraction. Meanwhile, existing GCN-based network models have large parameter counts and computation costs and cannot be deployed on embedded devices and edge computing devices.
Disclosure of Invention
The invention aims to provide a human body action detection method based on lightweight characteristic preservation of a graph neural network, so as to solve the problems in the prior art.
In order to achieve the above object, the present invention provides a human motion detection method based on lightweight feature preservation of a graph neural network, comprising:
acquiring a human behavior video data set, and processing the human behavior video data set to acquire a skeleton diagram data set;
constructing a backbone space-time diagram convolutional neural network, inputting the skeleton diagram data set into the backbone space-time diagram convolutional neural network for training and then optimizing to obtain a lightweight motion recognition model;
constructing a cyclic generation network, inputting the skeleton diagram data set into the cyclic generation network for training and then optimizing to obtain an action recognition model of important characteristics;
and fusing the lightweight motion recognition model and the motion recognition model of the important features to obtain a human motion detection model, and detecting human motion based on the human motion detection model.
Preferably, the process of obtaining the skeleton map dataset includes:
processing the human behavior data set into a serialized picture set;
carrying out gesture estimation on the serialized picture set to obtain a human body joint point set and a joint point edge set;
and constructing a non-directional space-time diagram of the human body joint point set and the joint point edge set to obtain the skeleton diagram data set.
Preferably, the process of obtaining the lightweight motion recognition model includes:
constructing the backbone space-time diagram convolutional neural network, and inputting the skeleton diagram data set to a time domain and a space domain of the backbone space-time diagram convolutional neural network for training;
clipping and compressing the trained backbone space-time diagram convolutional neural network based on a singular value decomposition method to obtain a lightweight model;
and performing cyclic training on the lightweight model to obtain the lightweight motion recognition model.
Preferably, the expression of the total objective function for clipping and compressing the trained backbone space-time diagram convolutional neural network based on the singular value decomposition method is as follows:

$$\mathcal{L} = \mathcal{L}_T + \sum_{b=1}^{B}\left(\lambda_0\,\mathcal{L}_s\!\left(s^{(b)}\right) + \lambda_h\,\mathcal{L}_o\!\left(U^{(b)},V^{(b)}\right)\right)$$

wherein $\mathcal{L}_T$ is the training loss of the decomposed network hierarchy, $\mathcal{L}_s(s)$ is the sparsity loss function of the singular value vector $s$, and $\mathcal{L}_o(U,V)$ represents the orthogonal regularization applied to the matrix to be decomposed. $B$ is the total number of network hierarchies, and $\lambda_0$ and $\lambda_h$ are attenuation parameters. The decomposition variables $U$, $s$ and $V$, with $W = U\,\mathrm{diag}(s)\,V^{\top}$, replace the original convolution kernel or weight matrix $W \in \mathbb{R}^{c_{out}\times c_{in}k_1k_2}$ of a convolution layer whose number of input channels is $c_{in}$, number of output channels is $c_{out}$, and convolution kernel size is $k_1 \times k_2$; $j$ represents the rank of the matrices $U$ and $V$.
Preferably, the process of obtaining the motion recognition model of the important feature includes:
constructing a gating-based cyclic generation network, adding an attention mechanism to the cyclic generation network, and obtaining a GRU network with an attention mechanism;
and inputting the skeleton diagram data set into the GRU network with the added attention mechanism for training, and obtaining the action recognition model of the important features.
Preferably, the expression of the attention mechanism is:
$$e_{ij} = w_i \tanh\!\left(W_i h_{i-1} + V_i x_j + b_i\right)$$

$$a_{ij} = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k=1}^{N} \exp\!\left(e_{ik}\right)}, \qquad S_i = \sum_{j=1}^{N} a_{ij}\, x_j$$

wherein $e_{ij}$ is the alignment score, i.e. the value of the attention probability distribution over each node determined by the hidden layer state vector at the $i$-th moment; $j$ represents the node sequence number, and $x_j$ represents the attention value of the $j$-th node. $a_{ij}$ is an intermediate quantity that normalizes the node values $x_j$ in a softmax-like manner, where $\sum_{k=1}^{N} \exp(e_{ik})$ represents the sum of all node distribution values at the $i$-th moment. $h_{i-1}$ represents the hidden state at the $(i-1)$-th moment; $w_i$, $W_i$, $V_i$ respectively represent the total weight coefficient matrix at the $i$-th moment and the weight coefficient matrices of the hidden state $h_{i-1}$ and the node value $x_j$; and $b_i$ indicates the offset at the corresponding moment. The important feature vector $S_i$ containing the node information at the $i$-th moment is calculated by the above formulas.
Preferably, the process of obtaining the human motion detection model includes:
the weight calculation is carried out on the feature vectors in the lightweight motion recognition model and the motion recognition model of the important features based on a gradient descent algorithm, so that lightweight vectors and important feature vectors are obtained;
calculating the lightweight vector and the important feature vector to obtain a fusion vector;
and calculating the fusion vector to obtain the human motion detection model.
Preferably, the process of obtaining the fusion vector includes:
splicing the lightweight vector and the important feature vector to obtain a spliced feature vector;
convolving the lightweight vector and the important feature vector based on a preset convolution kernel to obtain two new feature vectors, and adding the two new feature vectors to obtain a convolution feature vector;
and adding the spliced characteristic vector and the convolution characteristic vector, and then calculating to obtain the human motion detection model.
The invention has the technical effects that:
the application provides a human body action detection method based on lightweight characteristic preservation of a graph neural network. The model trained by the method is provided with a feature retaining module, and the important features in human body movement are retained through the gating unit and the attention mechanism, so that the problem of possible feature loss in general training is solved.
In the training of the neural network, the model in training is decomposed by using an SVD compression method, so that the complexity of the model is greatly reduced on the premise of ensuring the accuracy, and the model can be deployed on edge equipment for application.
The model is subjected to experiments on a data set disclosed in the current action detection field, and compared with the current mainstream action detection method, the proposed model has better performance, can be deployed into embedded equipment for application, and has strong robustness and generalization capability.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a block flow diagram of a human motion detection method based on lightweight feature preservation of a graph neural network in an embodiment of the invention;
FIG. 2 is a block diagram of the feature fusion module of step S4 according to an embodiment of the present invention;
FIG. 3 is a constructed spatiotemporal skeleton diagram in an embodiment of the invention;
FIG. 4 is an overall block diagram of the present method in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a singular value-based model compression module in an embodiment of the present invention;
fig. 6 is a schematic diagram of a feature preserving module based on a gating unit and an attention mechanism according to an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
The human body motion detection method disclosed by this embodiment will be described with reference to FIG. 1 and FIG. 4. The human body motion detection method based on the lightweight feature preservation of the graph neural network comprises the following steps:
step S1, acquiring a human behavior video data set with a category label, and preprocessing the video data set to obtain a skeleton diagram.
And S2, inputting skeleton information into a backbone space-time diagram convolutional neural network for training, extracting general features of human body movement, decomposing the vectors of the convolution kernels by using a singular value decomposition-based method, and compressing the model.
And S3, inputting the human skeleton information into a feature preservation module based on the gating unit for training.
And S4, fusing the obtained two feature vectors with different channel numbers by a novel feature fusion method, and distinguishing the importance of different features by using weight parameters to obtain a model with stronger expression capability.
In step S1, the data set required for training should include RGB images and depth images, basic information such as the height and age of each subject, and key point labeling information, and the data set should cover daily human activities.
It should be noted that the PP-TinyPose human body pose estimation algorithm is used to obtain a dynamic skeleton diagram from the input video, extracting 18 joint points of the human body. A spatio-temporal graph forms a hierarchical representation of the skeleton sequence: an undirected space-time graph G = (V, E) is constructed on a skeleton sequence with N joint points and T frames, where V represents the set of joint nodes and E represents the set of edges connecting the nodes. Each node is represented as a triplet with three dimensions (x, y, t), where x and y are the position of the joint in space and t is the frame index of the joint on the time axis. The space-time graph models the spatial and temporal relationships between joint points in the skeleton sequence and provides the basis for the subsequent action recognition and pose estimation tasks. In this undirected space-time graph, the node set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains all nodes in the skeleton sequence, and node v_ti comprises the coordinate vector and the estimated confidence of the i-th joint point in the t-th frame. The edge set E consists of two parts: one part, E_S, represents the physical spatial connections between joint points within a single frame, so that (v_ti, v_tj) ∈ E_S when joints i and j are connected in the same frame; the other part, E_T, represents connections in the time dimension between the same joint point in adjacent frames, so that (v_ti, v_(t+1)i) ∈ E_T. As shown in FIG. 3, the constructed space-time skeleton diagram can then be used for training of the graph neural network.
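To make the construction concrete, the following is a minimal Python sketch of assembling the node set V and the edge sets E_S and E_T from a pose-estimated keypoint sequence. The array shapes, the 18-joint bone list, and all names are illustrative assumptions of this sketch, not details specified by the patent.

import numpy as np

# Assumed 18-joint topology in OpenPose-style order (illustrative only).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7), (1, 8),
         (8, 9), (9, 10), (1, 11), (11, 12), (12, 13), (0, 14), (14, 16),
         (0, 15), (15, 17)]

def build_spatiotemporal_graph(keypoints):
    """keypoints: array of shape (T, N, 3) holding (x, y, confidence) per joint.

    Returns the node triplets (x, y, t) and the undirected edge list
    E = E_S (intra-frame bones) + E_T (same joint across adjacent frames).
    """
    T, N, _ = keypoints.shape

    def node_id(t, i):
        return t * N + i

    # Node v_ti: spatial position of joint i in frame t plus the frame index t.
    V = [(keypoints[t, i, 0], keypoints[t, i, 1], t)
         for t in range(T) for i in range(N)]
    # E_S: physical/spatial connections between joints within a single frame.
    E_S = [(node_id(t, i), node_id(t, j)) for t in range(T) for i, j in BONES]
    # E_T: the same joint connected between adjacent frames.
    E_T = [(node_id(t, i), node_id(t + 1, i))
           for t in range(T - 1) for i in range(N)]
    return V, E_S + E_T

# Example: a random 30-frame sequence of 18 joints.
V, E = build_spatiotemporal_graph(np.random.rand(30, 18, 3))
print(len(V), len(E))  # 540 nodes, 1032 edges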
After all the motion skeleton diagrams in the video data set are processed, executing step 2 to train the backbone network so as to obtain a lightweight motion detection model, wherein the step specifically comprises the following steps:
Step S201, the obtained human skeleton information is sent to the multi-modal graph neural network ST-GCN for training, and learning is performed on the time domain and the space domain respectively;
step S202, after each convolutional network training is completed, a model is cut by a singular value decomposition-based method, and parameters and calculated amount of the model are reduced;
Referring to FIG. 5, in the compression process of the model, a convolution layer whose number of input channels is $c_{in}$, number of output channels is $c_{out}$, and convolution kernel size is $k_1 \times k_2$ can be interpreted as a linear layer $W \in \mathbb{R}^{c_{out} \times c_{in} k_1 k_2}$. The corresponding rank-$j$ approximation replaces it with two successive linear layers shaped $\mathbb{R}^{j \times c_{in} k_1 k_2}$ and $\mathbb{R}^{c_{out} \times j}$. Mapped back to convolutions, this corresponds to a $k_1 \times k_2$ convolution kernel with $j$ output channels followed by a $1 \times 1$ convolution kernel with $c_{out}$ output channels.
To obtain accuracy similar to that of the original model, a full-rank SVD is applied to $W$; adding orthogonal regularization to the training process enables the decomposition to be performed more easily. Decomposing $W$ yields $U$, $s$ and $V$ with $W = U\,\mathrm{diag}(s)\,V^{\top}$, and the weight matrix can be directly reconstructed from $U$, $s$ and $V$. The regularization loss in the training process is shown in the following formula:

$$\mathcal{L}_o(U, V) = \left\| U^{\top} U - I \right\|_F + \left\| V^{\top} V - I \right\|_F$$

wherein $\|\cdot\|_F$ represents the F-norm, $j$ represents the rank of the matrices $U$ and $V$, and $I$ represents the $j \times j$ identity matrix.
It should be noted that in the SVD training process, each hierarchy uses the decomposition variables $U$, $s$ and $V$ instead of the original convolution kernel or weight matrix. Forward transfer is accomplished by converting $U\,\mathrm{diag}(s)\,V^{\top}$ into two successive network layers, while reverse transfer and optimization are performed directly on $U$, $s$ and $V$. When the singular vector matrices $U$ and $V$ are orthogonal, reducing the rank of the decomposition network amounts to making the singular value vector $s$ of each network layer as sparse as possible. The sparsity loss during training is shown in the following formula:

$$\mathcal{L}_s(s) = \|s\|_1$$
based on the above analysis we propose a total objective function for the decomposition training:
wherein the method comprises the steps ofIs the training loss to resolve the network hierarchy, and B is the total number of network hierarchies. Lambda (lambda) 0 And lambda (lambda) h Is a decay parameter, a trade-off can be made between accuracy and parameter quantity,to obtain a low rank model.
Wherein,is to decompose training loss of network hierarchy, +.>Is the sparsity loss function of the vector s,representing an orthogonal regularization process on the matrix to be decomposed. B is the total number of network hierarchies, lambda 0 And lambda (lambda) h Is an attenuation parameter->s is a decomposition variable representing the original convolution kernel or weight matrix, where +.> The number of input channels of the convolution layer is +.>The number of output channels is +.>Convolution kernel size k 1 ×k 2 J represents the matrix +.>Andis a rank of (c).
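As an illustration of how this decomposition training could look in code, the following PyTorch sketch keeps one convolution layer in the factored form W = U diag(s) V^T, runs the forward pass as two successive convolutions, and combines the task loss with the orthogonality and sparsity terms above. The class, the L1 form of the sparsity loss, and all dimensions are assumptions of this sketch rather than details fixed by the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDConv2d(nn.Module):
    """Conv layer kept in decomposed form W = U diag(s) V^T during training."""
    def __init__(self, c_in, c_out, k, rank):
        super().__init__()
        d = c_in * k * k
        self.k, self.c_in = k, c_in
        self.U = nn.Parameter(torch.randn(c_out, rank) * 0.1)  # c_out x j
        self.s = nn.Parameter(torch.ones(rank))                # singular values
        self.V = nn.Parameter(torch.randn(d, rank) * 0.1)      # (c_in*k*k) x j

    def forward(self, x):
        # Two successive layers: a k x k conv (rows of diag(s) V^T), then a 1 x 1 conv (U).
        Vt = (self.V * self.s).t().reshape(-1, self.c_in, self.k, self.k)
        h = F.conv2d(x, Vt, padding=self.k // 2)          # j channels
        return F.conv2d(h, self.U[:, :, None, None])      # c_out channels

    def orth_loss(self):
        # ||U^T U - I||_F + ||V^T V - I||_F
        I = torch.eye(self.U.shape[1], device=self.U.device)
        return (torch.norm(self.U.t() @ self.U - I) +
                torch.norm(self.V.t() @ self.V - I))

layer = SVDConv2d(c_in=3, c_out=64, k=3, rank=16)
x = torch.randn(2, 3, 32, 32)
task_loss = layer(x).mean()             # stand-in for the training loss L_T
lam0, lamh = 1e-4, 1e-3                 # attenuation parameters (illustrative)
loss = task_loss + lam0 * layer.s.abs().sum() + lamh * layer.orth_loss()
loss.backward()                         # optimize U, s, V directly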
Repeating the steps S201 and S202, circularly training until the model learns general characteristics of skeleton actions, and obtaining a lightweight action recognition model by a singular value decomposition method.
Step S3 is performed to preserve some features that may be ignored by the neural network by a gating cell based approach. Wherein, step S3 includes:
step S301: the input skeleton graph is processed using a gating-based loop generation network GRU.
Step S302: and adding an attention mechanism in the GRU network, and distributing different weights to the input at each moment when the model processes the input sequence, so that the model pays attention to a specific part in the skeleton sequence, and an action recognition model with the important characteristics reserved is obtained.
It should be noted that, referring to fig. 6, in step S301, the GRU has two gating units: an update gate and a reset gate. The update gate controls the degree of update between the input features at the current time and the hidden states at the previous time, thereby helping the network to better capture the long-term dependencies of the nodes in the skeleton graph. The reset gate then controls whether the hidden state should be reset to the original state, thereby helping the network to better handle different skeleton sequences. The procedure of feature preservation obeys the following calculation formula:
$$z_t = \sigma\!\left(W_{xz} f + W_{hz} h_{t-1}\right)$$

$$r_t = \sigma\!\left(W_{xr} f + W_{hr} h_{t-1}\right)$$

$$h_t' = \tanh\!\left(W_{hx} f + r_t \odot W_{hh} h_{t-1}\right)$$

wherein the $W$ terms represent weight matrices, $z_t$ represents the update gate, $r_t$ represents the reset gate, and $\sigma$ represents the sigmoid function, by which data is transformed into a value in the range 0-1 that acts as a gating signal. $f$ is the input feature, and $h_{t-1}$ denotes the hidden layer state at time $t-1$, which includes the past information. $h_t'$ is the candidate hidden layer state: when $r_t$ approaches zero, the model discards the past hidden information and keeps only the currently entered information; when $r_t$ approaches 1, the past information is considered useful and is added to the current information.
Note that, in step S302, the calculation formula of the attention mechanism added for the GRU is as follows:
$$e_{ij} = w_i \tanh\!\left(W_i h_{i-1} + V_i x_j + b_i\right)$$

$$a_{ij} = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k=1}^{N} \exp\!\left(e_{ik}\right)}, \qquad S_i = \sum_{j=1}^{N} a_{ij}\, x_j$$

wherein $e_{ij}$ is the alignment score, i.e. the value of the attention probability distribution over each node determined by the hidden layer state vector at the $i$-th moment; $j$ represents the node sequence number, and $x_j$ represents the attention value of the $j$-th node. $a_{ij}$ is an intermediate quantity that normalizes the node values $x_j$ in a softmax-like manner, where $\sum_{k=1}^{N} \exp(e_{ik})$ represents the sum of all node distribution values at the $i$-th moment. $h_{i-1}$ represents the hidden state at the $(i-1)$-th moment; $w_i$, $W_i$, $V_i$ respectively represent the total weight coefficient matrix at the $i$-th moment and the weight coefficient matrices of the hidden state $h_{i-1}$ and the node value $x_j$; and $b_i$ indicates the offset at the corresponding moment. The important feature vector $S_i$ containing the node information at the $i$-th moment is calculated by the above formulas.
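The following PyTorch sketch illustrates the feature preservation module: additive attention over the skeleton nodes of one frame produces the important feature vector S_i, which then drives one GRU step. The dimensions and class names are illustrative assumptions, and the final update h_t = (1 - z_t) * h_{t-1} + z_t * h_t' is the standard GRU blend implied by the update gate rather than a formula given explicitly in the text.

import torch
import torch.nn as nn

class GRUAttention(nn.Module):
    """One GRU step plus additive attention over skeleton nodes (illustrative)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.zx, self.zh = nn.Linear(feat_dim, hidden_dim), nn.Linear(hidden_dim, hidden_dim)
        self.rx, self.rh = nn.Linear(feat_dim, hidden_dim), nn.Linear(hidden_dim, hidden_dim)
        self.hx, self.hh = nn.Linear(feat_dim, hidden_dim), nn.Linear(hidden_dim, hidden_dim)
        # Additive attention: e_ij = w^T tanh(W h_{i-1} + V x_j + b)
        self.W = nn.Linear(hidden_dim, hidden_dim)
        self.V = nn.Linear(feat_dim, hidden_dim)
        self.w = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, x, h_prev):
        # x: (N, feat_dim) node features of one frame; h_prev: (hidden_dim,)
        e = self.w(torch.tanh(self.W(h_prev) + self.V(x))).squeeze(-1)  # scores e_ij
        a = torch.softmax(e, dim=0)                  # attention weights a_ij
        S = (a.unsqueeze(-1) * x).sum(dim=0)         # important feature vector S_i
        z = torch.sigmoid(self.zx(S) + self.zh(h_prev))   # update gate z_t
        r = torch.sigmoid(self.rx(S) + self.rh(h_prev))   # reset gate r_t
        h_cand = torch.tanh(self.hx(S) + r * self.hh(h_prev))
        h = (1 - z) * h_prev + z * h_cand            # standard GRU blend
        return h, S

cell = GRUAttention(feat_dim=3, hidden_dim=64)
h = torch.zeros(64)
for frame in torch.randn(30, 18, 3):                 # T=30 frames, N=18 joints
    h, S = cell(frame, h)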
Finally, after the models learn the general features and the key retention features of the human actions in step S2 and step S3, the two models are fused in step S4, please refer to fig. 2, and step S4 includes the following steps:
in step S401, two feature vectors are respectively assigned with different weights lambda and 1-lambda by means of channel weight fusion, wherein lambda is a parameter automatically updated by a gradient descent algorithm.
Step S402, splicing the two feature vectors obtained in the step S401 in the channel dimension to obtain a new feature vector.
Step S403, the two input feature vectors are each convolved with a convolution kernel of preset dimensions to generate two new feature vectors, and the two new feature vectors are added.
And step S404, adding the feature vectors obtained in the step S402 and the step S403 to finally obtain a lightweight human motion detection model with the feature reserved.
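A minimal PyTorch sketch of this fusion is given below: the two feature maps are weighted by a learnable lambda and 1 - lambda, spliced along the channel dimension, convolved with kernels of preset size (assumed 1x1 here), and summed. The 1x1 projection after the splice is an added assumption of this sketch so that the two branches have matching shapes for the final addition.

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses the lightweight features with the preserved important features."""
    def __init__(self, c_light, c_keep, c_out):
        super().__init__()
        # lambda is updated by gradient descent like any other parameter (S401)
        self.lam = nn.Parameter(torch.tensor(0.5))
        # Preset convolution kernels (assumed 1x1) projecting to a common width
        self.conv_light = nn.Conv2d(c_light, c_out, kernel_size=1)
        self.conv_keep = nn.Conv2d(c_keep, c_out, kernel_size=1)
        self.conv_cat = nn.Conv2d(c_light + c_keep, c_out, kernel_size=1)

    def forward(self, f_light, f_keep):
        f_light = self.lam * f_light           # weight lambda
        f_keep = (1 - self.lam) * f_keep       # weight 1 - lambda
        cat = self.conv_cat(torch.cat([f_light, f_keep], dim=1))       # S402: splice
        conv_sum = self.conv_light(f_light) + self.conv_keep(f_keep)   # S403
        return cat + conv_sum                  # S404: final fused features

fuse = FeatureFusion(c_light=256, c_keep=64, c_out=128)
out = fuse(torch.randn(1, 256, 18, 30), torch.randn(1, 64, 18, 30))
print(out.shape)  # torch.Size([1, 128, 18, 30])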
In summary, the invention discloses a human motion detection method with lightweight feature preservation, comprising: preprocessing a human motion video data set to obtain a space-time skeleton diagram of the motion; clipping the model trained by the backbone network using a singular value decomposition-based method, obtaining a lightweight model on the premise of ensuring accuracy; preventing the loss of important features during training by using the feature preservation module with a gating unit and an attention mechanism; and splicing models with the same dimension but different channel numbers by a channel weight fusion method to fuse the models. The model is tested on several mainstream public human body action detection data sets and on a human body action recognition data set in a locomotive driving scenario, and obtains better results than current mainstream models; it can also be deployed on edge embedded devices and obtain good detection results, which demonstrates that the model has high performance and good applicability.
Example two
This embodiment provides a human body action detection method based on lightweight feature preservation of a graph neural network, comprising the following steps:
step S1, using a human behavior video data set with a category label, wherein the data set contains RGB images and depth images, basic information such as the height and age of a subject and key point labeling information, and the data set covers human daily activity related actions. Preprocessing the video data set to obtain a skeleton diagram.
And S2, inputting human skeleton information into a backbone space-time diagram convolutional neural network for training, extracting general characteristics of human motion, decomposing the vectors of the convolution kernels using a singular value decomposition-based method and compressing them, effectively reducing the complexity of the model on the premise of ensuring accuracy.
And S3, inputting the human skeleton information into a feature preservation module based on a gating unit for training. The feature preservation module automatically selects the feature vectors to be retained and the feature vectors to be discarded through the reset gate and the update gate in the gating unit, and preserves the important features in the skeleton data through continuous iteration.
And S4, fusing the obtained two feature vectors with different channel numbers by a novel feature fusion method, and distinguishing the importance of different features by using weight parameters to obtain a model with stronger expression capability.
Further optimizing the scheme, the step S1 comprises the following steps:
step S101, data preprocessing: processing human body action videos into a serialized picture set by taking frames as units, carrying out gesture estimation on each frame of picture by using a PP-TinyPose algorithm, extracting 18 joints of a human body, and constructing a non-directional space-time diagram G= (V, E) of a skeleton sequence, wherein V represents a joint set and V= { V ti T=1, …, T, i=1, …, N }. E denotes the set of edges connecting these nodes, each node being represented as a triplet having three dimensions (x, y, t), where x and y are the positions of the joint in space and t is the frame index of the joint on the time axis.
Further optimizing the scheme, the step S2 comprises the following steps:
step S201: and sending the obtained human skeleton information to a multi-modal-diagram neural network ST-GCN for training, and respectively learning in a time domain and a space domain.
Step S202: after each convolutional network training is completed, a singular value decomposition-based method is used for cutting the model, and parameters and calculated amount of the model are reduced.
Step S203: and repeating the steps S201 and S202, and circularly training until the model learns general characteristics of skeleton actions, thereby obtaining the lightweight action recognition model.
Further optimizing the scheme, the step S3 comprises the following steps:
step S301: the input skeleton graph is processed using a gating-based loop generation network GRU.
Step S302: and adding an attention mechanism in the GRU network, and distributing different weights to the input at each moment when the model processes the input sequence, so that the model pays attention to a specific part in the skeleton sequence, and an action recognition model with the important characteristics reserved is obtained.
Further optimizing the scheme, the step S4 comprises the following steps:
step S401: and respectively giving different weights lambda and 1-lambda to the two feature vectors in a channel weight fusion mode, wherein lambda is a parameter automatically updated by a gradient descent algorithm.
Step S402: and (3) splicing the two feature vectors obtained in the step (S401) in the channel dimension to obtain a new feature vector.
Step S403: the size of the used dimension isIs convolved with respect to the two eigenvectors of the input to generate two new eigenvectors, and adds the two eigenvectors.
Step S404: the feature vectors obtained in step S402 and step S403 are added.
The model compression method based on singular value decomposition is specifically as follows:
The weight matrix of the convolution kernel is decomposed into the product of three matrices, and gradient updates are carried out directly on these three matrices during fine-tuning. For a convolution layer whose number of input channels is $c_{in}$, number of output channels is $c_{out}$, and convolution kernel size is $k_1 \times k_2$, the layer can be interpreted as a linear layer $W \in \mathbb{R}^{c_{out} \times c_{in} k_1 k_2}$; the corresponding rank-$j$ approximation takes the shape of two linear layers $\mathbb{R}^{j \times c_{in} k_1 k_2}$ and $\mathbb{R}^{c_{out} \times j}$. Mapping back to convolutions, this corresponds to a $k_1 \times k_2$ convolution followed by a $1 \times 1$ convolution. To avoid repeating the SVD operation at every step, a full-rank SVD is carried out on $W$ to obtain $U$, $s$ and $V$ with $W = U\,\mathrm{diag}(s)\,V^{\top}$; at the same time, orthogonal regularization is introduced to address the problem that matrix orthogonality is not easily maintained in the decomposition, as shown in the formula:

$$\mathcal{L}_o(U, V) = \left\| U^{\top} U - I \right\|_F + \left\| V^{\top} V - I \right\|_F$$

wherein $\|\cdot\|_F$ represents the F-norm, $j$ represents the rank of the matrices $U$ and $V$, and $I$ represents the $j \times j$ identity matrix.
When the singular vector matrices $U$ and $V$ are orthogonal, reducing the rank of the decomposition network is equivalent to making the singular value vector $s$ of each network layer as sparse as possible, so the sparsity is represented by the following formula:

$$\mathcal{L}_s(s) = \|s\|_1$$
the resulting overall objective function of the decomposition training is:
wherein the method comprises the steps ofIs the training loss to resolve the network hierarchy, and B is the total number of network hierarchies. Lambda (lambda) 0 And lambda (lambda) h Is a fading parameter, and can be weighted between accuracy and parameter quantity to obtain low rankAnd (5) a model. Finally, a lightweight graph rolling network action recognition model is obtained after multiple times of cyclic training.
Further optimizing the scheme, the skeleton information processing network based on the gating unit includes:
the GRU has two gating units: an update gate (update gate) and a reset gate (reset gate). The update gate controls the degree of update between the input features at the current time and the hidden states at the previous time, thereby helping the network to better capture the long-term dependencies of the nodes in the skeleton graph. The reset gate then controls whether the hidden state should be reset to the original state, thereby helping the network to better handle different skeleton sequences. The specific calculation rules follow the following formula:
$$z_t = \sigma\!\left(W_{xz} f + W_{hz} h_{t-1}\right)$$

$$r_t = \sigma\!\left(W_{xr} f + W_{hr} h_{t-1}\right)$$

$$h_t' = \tanh\!\left(W_{hx} f + r_t \odot W_{hh} h_{t-1}\right)$$

wherein the $W$ terms represent weight matrices, $z_t$ represents the update gate, $r_t$ represents the reset gate, and $\sigma$ represents the sigmoid function, by which data is transformed into a value in the range 0-1 that acts as a gating signal. $f$ is the input feature, and $h_{t-1}$ denotes the hidden layer state at time $t-1$, which includes the past information. $h_t'$ is the candidate hidden layer state: when $r_t$ approaches zero, the model discards the past hidden information and keeps only the currently entered information; when $r_t$ approaches 1, the past information is considered useful and is added to the current information. By using the update and reset gates, the GRU can effectively retain the important features of the input, thereby improving the performance and applicability of the model.
Further optimizing the scheme, the method for introducing the attention mechanism into the GRU network is as follows:
the model assigns a different weight to each input at each time instant when processing the input sequence, which tells the model which parts of the input skeleton sequence should be focused on, thereby extracting useful feature information more efficiently. Wherein the calculation formula of the attention mechanism is as follows:
$$e_{ij} = w_i \tanh\!\left(W_i h_{i-1} + V_i x_j + b_i\right)$$

$$a_{ij} = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k=1}^{N} \exp\!\left(e_{ik}\right)}, \qquad S_i = \sum_{j=1}^{N} a_{ij}\, x_j$$

wherein $e_{ij}$ is the alignment score, i.e. the value of the attention probability distribution over each node determined by the hidden layer state vector at the $i$-th moment; $j$ represents the node sequence number, and $x_j$ represents the attention value of the $j$-th node. $a_{ij}$ is an intermediate quantity that normalizes the node values $x_j$ in a softmax-like manner, where $\sum_{k=1}^{N} \exp(e_{ik})$ represents the sum of all node distribution values at the $i$-th moment. $h_{i-1}$ represents the hidden state at the $(i-1)$-th moment; $w_i$, $W_i$, $V_i$ respectively represent the total weight coefficient matrix at the $i$-th moment and the weight coefficient matrices of the hidden state $h_{i-1}$ and the node value $x_j$; and $b_i$ indicates the offset at the corresponding moment. The important feature vector $S_i$ containing the node information at the $i$-th moment is calculated by the above formulas.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The human body action detection method based on the lightweight feature preservation of the graph neural network is characterized by comprising the following steps of:
acquiring a human behavior video data set, and processing the human behavior video data set to acquire a skeleton diagram data set;
constructing a backbone space-time diagram convolutional neural network, inputting the skeleton diagram data set into the backbone space-time diagram convolutional neural network for training and then optimizing to obtain a lightweight motion recognition model;
constructing a cyclic generation network, inputting the skeleton diagram data set into the cyclic generation network for training and then optimizing to obtain an action recognition model of important characteristics;
and fusing the lightweight motion recognition model and the motion recognition model of the important features to obtain a human motion detection model, and detecting human motion based on the human motion detection model.
2. The human action detection method based on lightweight feature preservation of a graph neural network of claim 1, wherein the process of obtaining a skeleton graph dataset comprises:
processing the human behavior data set into a serialized picture set;
carrying out gesture estimation on the serialized picture set to obtain a human body joint point set and a joint point edge set;
and constructing a non-directional space-time diagram of the human body joint point set and the joint point edge set to obtain the skeleton diagram data set.
3. The human motion detection method based on lightweight feature preservation of a graph neural network of claim 1, wherein the process of obtaining a lightweight motion recognition model comprises:
constructing the backbone space-time diagram convolutional neural network, and inputting the skeleton diagram data set to a time domain and a space domain of the backbone space-time diagram convolutional neural network for training;
clipping and compressing the trained backbone space-time diagram convolutional neural network based on a singular value decomposition method to obtain a lightweight model;
and performing cyclic training on the lightweight model to obtain the lightweight motion recognition model.
4. The human motion detection method based on the lightweight feature preservation of the graph neural network according to claim 3, wherein the expression of the total objective function for clipping and compressing the trained backbone space-time diagram convolutional neural network based on the singular value decomposition method is:

$$\mathcal{L} = \mathcal{L}_T + \sum_{b=1}^{B}\left(\lambda_0\,\mathcal{L}_s\!\left(s^{(b)}\right) + \lambda_h\,\mathcal{L}_o\!\left(U^{(b)},V^{(b)}\right)\right)$$

wherein $\mathcal{L}$ represents the overall objective function of clipping and compression; $\mathcal{L}_T$ is the training loss of the decomposed network hierarchy; $\mathcal{L}_s(s)$ is the sparsity loss function of the vector $s$; $\mathcal{L}_o(U,V)$ represents the orthogonal regularization applied to the matrix to be decomposed; $B$ is the total number of network hierarchies; $\lambda_0$ and $\lambda_h$ are attenuation parameters; $s$ is the decomposition variable replacing the original convolution kernel or weight matrix $W = U\,\mathrm{diag}(s)\,V^{\top}$, where $W \in \mathbb{R}^{c_{out}\times c_{in}k_1k_2}$ for a convolution layer whose number of input channels is $c_{in}$, number of output channels is $c_{out}$, and convolution kernel size is $k_1 \times k_2$; and $j$ represents the rank of the matrices $U$ and $V$.
5. The human motion detection method based on lightweight feature preservation of a graph neural network according to claim 1, wherein the process of obtaining the motion recognition model of important features comprises:
constructing a gating-based cyclic generation network, adding an attention mechanism to the cyclic generation network, and obtaining a GRU network with an attention mechanism;
and inputting the skeleton diagram data set into the GRU network with the added attention mechanism for training, and obtaining the action recognition model of the important features.
6. The human motion detection method based on lightweight feature preservation of a graph neural network of claim 5, wherein the expression of the attention mechanism is:
$$e_{ij} = w_i \tanh\!\left(W_i h_{i-1} + V_i x_j + b_i\right)$$

$$a_{ij} = \frac{\exp\!\left(e_{ij}\right)}{\sum_{k=1}^{N} \exp\!\left(e_{ik}\right)}, \qquad S_i = \sum_{j=1}^{N} a_{ij}\, x_j$$

wherein $e_{ij}$ is the alignment score, i.e. the value of the attention probability distribution over each node determined by the hidden layer state vector at the $i$-th moment; $j$ represents the node sequence number; $x_j$ represents the attention value of the $j$-th node; $a_{ij}$ is an intermediate quantity that normalizes the node values $x_j$ in a softmax-like manner, wherein $\sum_{k=1}^{N} \exp(e_{ik})$ represents the sum of all node distribution values at the $i$-th moment; $h_{i-1}$ represents the hidden state at the $(i-1)$-th moment; $w_i$, $W_i$, $V_i$ respectively represent the total weight coefficient matrix at the $i$-th moment and the weight coefficient matrices of the hidden state $h_{i-1}$ and the node value $x_j$; $b_i$ represents the offset of the corresponding moment; and the important feature vector $S_i$ containing the node information at the $i$-th moment is calculated by the above formulas.
7. The human motion detection method based on lightweight feature preservation of a graph neural network of claim 1, wherein the process of obtaining a human motion detection model comprises:
the weight calculation is carried out on the feature vectors in the lightweight motion recognition model and the motion recognition model of the important features based on a gradient descent algorithm, so that lightweight vectors and important feature vectors are obtained;
calculating the lightweight vector and the important feature vector to obtain a fusion vector;
and calculating the fusion vector to obtain the human motion detection model.
8. The human motion detection method based on lightweight feature preservation of a graph neural network of claim 7, wherein the process of obtaining a fusion vector comprises:
splicing the lightweight vector and the important feature vector to obtain a spliced feature vector;
convolving the lightweight vector and the important feature vector based on a preset convolution kernel to obtain two new feature vectors, and adding the two new feature vectors to obtain a convolution feature vector;
and adding the spliced characteristic vector and the convolution characteristic vector, and then calculating to obtain the human motion detection model.
CN202311302738.9A 2023-10-10 2023-10-10 Human body action detection method based on lightweight characteristic reservation of graph neural network Pending CN117373116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311302738.9A CN117373116A (en) 2023-10-10 2023-10-10 Human body action detection method based on lightweight characteristic reservation of graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311302738.9A CN117373116A (en) 2023-10-10 2023-10-10 Human body action detection method based on lightweight characteristic reservation of graph neural network

Publications (1)

Publication Number Publication Date
CN117373116A true CN117373116A (en) 2024-01-09

Family

ID=89397477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311302738.9A Pending CN117373116A (en) 2023-10-10 2023-10-10 Human body action detection method based on lightweight characteristic reservation of graph neural network

Country Status (1)

Country Link
CN (1) CN117373116A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894115A (en) * 2023-06-12 2023-10-17 国网湖北省电力有限公司经济技术研究院 Automatic archiving method for power grid infrastructure files
CN116894115B (en) * 2023-06-12 2024-05-24 国网湖北省电力有限公司经济技术研究院 Automatic archiving method for power grid infrastructure files

Similar Documents

Publication Publication Date Title
Zheng et al. Unsupervised representation learning with long-term dynamics for skeleton based action recognition
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
CN107492121B (en) Two-dimensional human body bone point positioning method of monocular depth video
CN112597883B (en) Human skeleton action recognition method based on generalized graph convolution and reinforcement learning
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN112395945A (en) Graph volume behavior identification method and device based on skeletal joint points
CN111814719A (en) Skeleton behavior identification method based on 3D space-time diagram convolution
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN113283298B (en) Real-time behavior identification method based on time attention mechanism and double-current network
CN111461063B (en) Behavior identification method based on graph convolution and capsule neural network
CN114529984A (en) Bone action recognition method based on learnable PL-GCN and ECLSTM
CN111881802B (en) Traffic police gesture recognition method based on double-branch space-time graph convolutional network
CN112036276A (en) Artificial intelligent video question-answering method
CN114821640A (en) Skeleton action identification method based on multi-stream multi-scale expansion space-time diagram convolution network
CN117373116A (en) Human body action detection method based on lightweight characteristic reservation of graph neural network
CN113688765B (en) Action recognition method of self-adaptive graph rolling network based on attention mechanism
CN114220154A (en) Micro-expression feature extraction and identification method based on deep learning
CN111723667A (en) Human body joint point coordinate-based intelligent lamp pole crowd behavior identification method and device
CN107341471B (en) A kind of Human bodys' response method based on Bilayer condition random field
Yin et al. Msa-gcn: Multiscale adaptive graph convolution network for gait emotion recognition
CN116246338B (en) Behavior recognition method based on graph convolution and transducer composite neural network
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN116453025A (en) Volleyball match group behavior identification method integrating space-time information in frame-missing environment
CN116148864A (en) Radar echo extrapolation method based on DyConvGRU and Unet prediction refinement structure
CN115798055A (en) Violent behavior detection method based on corersort tracking algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination