CN116386148B - Knowledge-graph-guided small sample action recognition method and system

Knowledge-graph-guided small sample action recognition method and system

Info

Publication number
CN116386148B
Authority
CN
China
Prior art keywords
video
graph
knowledge
action
knowledge graph
Prior art date
Legal status
Active
Application number
CN202310619753.XA
Other languages
Chinese (zh)
Other versions
CN116386148A (en)
Inventor
徐波
钟幼平
刘嘉
刘家豪
林谋
丁元
Current Assignee
Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd
State Grid Corp of China SGCC
Original Assignee
Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd
State Grid Corp of China SGCC
Priority date
Filing date
Publication date
Application filed by Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd and State Grid Corp of China SGCC
Priority to CN202310619753.XA
Publication of CN116386148A
Application granted
Publication of CN116386148B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/23: Recognition of whole body movements, e.g. for sport training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The application belongs to the technical field of action recognition, and relates to a knowledge-graph-guided small sample action recognition method and system. The system consists of a knowledge graph construction module for action recognition, an information propagation module based on a graph convolutional neural network, and an action information recognition module. In the method, a knowledge graph and a graph convolutional neural network are constructed and trained, and a video set of unknown action categories is divided into support samples and query samples; a video prototype feature vector is obtained from the support samples and the knowledge graph; the cosine similarity between each video prototype feature vector and the video feature vector of a query sample is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category. The application improves the accuracy of small sample action recognition.

Description

Knowledge-graph-guided small sample action recognition method and system
Technical Field
The application belongs to the technical field of action recognition, and particularly relates to a knowledge-graph-guided small sample action recognition method and system.
Background
In recent years, research on small sample learning has attracted the attention of many top scientific institutions at home and abroad, and even of national government bodies. Many real-world application scenarios face the problem that data are difficult to collect or that high labeling costs leave training data insufficient. For example, in medical imaging, data on rare diseases are often difficult to collect, and professionals who can label them effectively are hard to find; in autonomous driving, data samples of the various emergency situations are particularly rare; in financial investment, data typically follow a long-tailed distribution, and it is difficult to obtain enough training samples for tail scenarios. Developing the theory and technology of small sample learning can help deep learning be deployed in application scenarios that lack data, and has broad application prospects in many fields. To promote the development of small sample learning and to take a leading position in new-generation artificial intelligence research, institutions at home and abroad have issued research plans targeting small sample learning.
At present, some related research work on the small sample learning task has been carried out and progress has been made. Existing methods can be broadly divided into three categories according to their emphasis: small sample recognition based on meta-learning, which mainly studies how a model accumulates experience over a large number of learning tasks in order to recognize from few samples; small sample recognition based on data augmentation, which mainly studies how to expand a limited data set to improve the performance of the constructed model; and small sample recognition that introduces semantic relations, which mainly establishes relations among visual concepts with the help of relations among high-level semantic concepts. In small sample learning, prior knowledge can help the model make effective use of existing learning experience and learn rapidly from a small number of samples; the introduction of prior knowledge is therefore important for small sample learning. Only the last of the three approaches above exploits prior knowledge, and current research is limited to exploiting textual semantic concepts. Since semantic text relations do not adequately reflect visual relations, their assistance to small sample visual recognition tasks is often limited. Therefore, fully mining visual prior knowledge, exploring small sample recognition methods based on multi-modal knowledge, and developing knowledge-driven small sample learning theory and technology have important research significance and scientific value.
On this basis, and inspired by the learning process of the biological brain, the inventors study knowledge-driven small sample visual recognition theory and technology. Biological studies have shown that the learning process of the biological brain does not start from scratch; rather, important prior knowledge is present at the beginning of learning, including what the species learned during evolution (known in biology as phylogeny) and key knowledge about the real world that the individual acquires during its lifetime. This knowledge plays a very important role in the learning process of the biological brain, and forms the theoretical basis for knowledge-driven small sample visual recognition. However, how to construct, represent and utilize prior knowledge in visual recognition tasks, so that models can learn effectively from a small number of samples, remains a significant scientific problem.
Chinese patent publication CN112766354A discloses a knowledge-graph-based small sample picture recognition method, which constructs a knowledge graph containing all the category labels in a training picture set and extracts features from the knowledge graph through a graph convolutional neural network to obtain the category of the picture to be recognized as the recognition result.
That method uses only the label information of images: each image category corresponds to a single node in the knowledge graph, so fine-grained information in a video cannot be attended to, and the accuracy is therefore low. For example, for a basketball-playing video, the method of CN112766354A uses only the action category "playing basketball" as a knowledge-graph node, yet attributes such as the height, age and position of the players are likely to influence the result of action recognition, which can lead to inaccurate recognition.
Disclosure of Invention
The application aims to provide a knowledge-graph-guided small sample action recognition system and method that target videos, whose semantic information (content) is richer: a finer-grained knowledge graph is established for the various semantics in the videos, the various attributes are integrated into the construction of the knowledge graph, and the knowledge graph is used in subsequent action recognition, improving the accuracy of small sample action recognition.
The small sample action recognition method based on knowledge graph guidance comprises the following steps: constructing a knowledge graph for identifying the actions, wherein the knowledge graph comprises actions and attributes of the actions;
selecting a video set of known action categories as the training set, and in each action category of the training set taking one part of the videos as support samples and the other part as query samples; extracting the features of all relevant nodes in the knowledge graph through a graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by a feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample, with corresponding label y_q; calculating the cosine loss between the video prototype feature vector FC and the query video feature vector f_q; and training the graph convolutional neural network by backpropagation;
dividing a video set of unknown action categories into support samples and query samples; extracting the features of all relevant nodes in the knowledge graph through the trained graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample; calculating the cosine similarity between each video prototype feature vector FC and f_q, and taking the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
Further preferably, the steps of constructing the knowledge graph are as follows:
S11. Combining the structural characteristics of the action recognition corpus, a schema layer is designed for action category videos; the schema layer comprises the word categories that need to be extracted from the action recognition corpus and the connection relations between them; the word categories are all the action categories, the scenes where the actions occur, the subjects applying the actions, and the objects related to the actions; each word is taken as a node in the knowledge graph; with the action category as the center, the scenes, subjects and objects are connected to the action as its attributes; the knowledge graph is designed as an undirected graph to facilitate knowledge transfer;
S12. Candidate entities are extracted from the action recognition corpus using an entity extraction technique: an entity extraction algorithm extracts all entities from the introduction page of each action in the corpus, and a part-of-speech selection algorithm then screens the four categories of action, scene, subject and object entities from all the entities as candidate entities; words strongly related to the action categories are manually selected from the candidate entities;
S13. New entities similar to the manually selected entities are found in the action recognition corpus according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities;
S14. The discovered new entities are filtered with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening;
S15. The knowledge graph is obtained by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
Further preferably, the knowledge-graph features are extracted as follows: the knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise; the graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features.
Further preferably, the cosine loss is calculated as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
The application also provides a small sample action recognition system based on knowledge-graph guidance, comprising a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module. The knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes. A video set of unknown action categories is divided into support samples and query samples; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
The application also provides a nonvolatile computer storage medium storing computer-executable instructions for performing the knowledge-graph-guided small sample action recognition method described above.
The present application also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the knowledge-graph-guided small sample action recognition method described above.
The application also provides an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a knowledge-graph guided small sample action recognition method.
The application has the following beneficial effects. The corpus relevant to action attributes is first extracted from an action recognition corpus, and a knowledge graph is constructed. The information propagation module combines the constructed knowledge graph with a graph convolutional neural network so that action information can be propagated among different nodes; finally, the action information recognition module classifies the propagated information. After the knowledge graph and the graph convolutional neural network are trained, a video set of unknown action categories is divided into support samples and query samples; a video prototype feature vector is obtained from the support samples and the knowledge graph; the cosine similarity between each video prototype feature vector and the video feature vector of a query sample is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category, thereby improving the accuracy of small sample action recognition.
Drawings
Fig. 1 is a schematic diagram of the present application.
Detailed Description
The application is further elucidated in detail below in connection with the accompanying drawings.
Referring to fig. 1, the knowledge-graph-guided small sample action recognition method comprises the following steps:
S1. Constructing the knowledge graph in a semi-automatic manner:
S11. Combining the structural characteristics of the action recognition corpus, a schema layer is first designed for action category videos. The schema layer contains the word categories that need to be extracted from the action recognition corpus and the connection relations between them. The required word categories are all the action categories, the scenes in which the actions occur, the subjects applying the actions (e.g. basketball players), and the objects associated with the actions (e.g. basketballs, baskets). Each word is taken as a node in the knowledge graph; with the action category at the center, the scenes, subjects and objects are connected to the action as its attributes. The knowledge graph is designed as an undirected graph to facilitate knowledge transfer.
S12. Candidate entities are extracted from the action recognition corpus using an entity extraction technique: an entity extraction algorithm extracts all entities (all meaningful words) from the introduction page of each action in the corpus, and a part-of-speech selection algorithm then screens the four categories of action, scene, subject and object entities from all the entities as candidate entities; words strongly related to the action categories are then manually selected from the candidate entities.
S13. New entities similar to the manually selected entities are found in the action recognition corpus according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities.
S14. The discovered new entities are filtered with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening.
S15. The knowledge graph is obtained by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
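As an illustration of steps S13 and S15, the following Python sketch expands the entity set by word-vector cosine distance and assembles the undirected action-attribute graph. It is a minimal sketch, not the patented implementation: the word vectors, entity names and threshold are hypothetical placeholders, and the entity extraction (S12) and disambiguation (S14) stages are assumed to have already produced their outputs.

```python
import numpy as np

# Hypothetical pre-computed word vectors for candidate and selected entities.
rng = np.random.default_rng(0)
word_vec = {w: rng.standard_normal(300) for w in
            ["basketball", "hoop", "court", "player", "referee"]}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_entities(selected, candidates, thresh=0.6):
    """S13: keep candidates whose word vectors are close to a manually selected entity."""
    return {c for c in candidates
            if any(cosine(word_vec[c], word_vec[s]) >= thresh for s in selected)}

# S15: undirected graph -- action categories are central nodes, attributes are
# ordinary nodes, and each action-attribute relation is an (unordered) edge.
edges = set()
def connect(action, attribute):
    edges.add(frozenset((action, attribute)))   # undirected: no edge direction

for attr in ["hoop", "court", "player"]:
    connect("play basketball", attr)
```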
Step S2. Training the graph convolutional neural network:
S21. A video set of known action categories is divided into a training set and a test set; n(k+q) video clips are sampled from the training set, where n is the number of action categories; in each action category, k videos serve as support samples and q videos serve as query samples.
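A possible sketch of the episode sampling in S21 follows, assuming the videos are already grouped by class in a `videos_by_class` dict (a hypothetical structure; the patent does not prescribe one):

```python
import random

def sample_episode(videos_by_class, n, k, q):
    """Sample an n-way episode: k support and q query videos per sampled category."""
    classes = random.sample(sorted(videos_by_class), n)
    support, query = {}, {}
    for c in classes:
        clips = random.sample(videos_by_class[c], k + q)
        support[c], query[c] = clips[:k], clips[k:]
    return support, query
```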
S22. To obtain the video prototype vector of each action category, the features of all relevant nodes in the knowledge graph are extracted through the graph convolutional neural network and taken as the knowledge-graph features. (The relevant nodes are determined by the categories to be classified: for example, to distinguish the two actions of playing basketball and playing football, the basketball and football nodes together with all nodes connected to them are taken as relevant nodes; these relevant nodes are the attributes of the videos.) The specific method is as follows:
S221. The feature extraction network extracts the video features of the support samples of each action category and averages them within each category, yielding n averaged video features; these n features constitute the video feature vectors of the support samples.
S222. The knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise. The graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features. The specific mathematical expression is as follows:
X_{t+1} = A·X_t·W_t
where t is the layer index of the graph convolutional neural network, W_t are the parameters of the t-th layer, A is the adjacency matrix, and X_t is the input of the t-th layer; when t is 1, X_1 is the randomly initialized input of the first layer. X_{t+1} is the output of the t-th layer and likewise the input of the (t+1)-th layer. The graph convolutional neural network of this embodiment has 3 layers, and the output X_4 of the third layer serves as the output of the whole network. The knowledge-graph features have dimension m×d, where m is the number of nodes related to the action categories and d is the dimension of the output features, set to 2048 to match the dimension of the video features.
S223. The knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video: K = P·z, where K denotes the attribute features of the video, P the knowledge-graph features, and z the video feature vector of the support samples extracted by the feature extraction network. The attribute features of the video are concatenated with the support video feature vectors to obtain the video prototype feature vector: FC = cat(z, K), where cat is the concatenation operation and FC denotes the video prototype feature vector.
S224. The feature extraction network extracts the video feature vector f_q of each query sample; the label corresponding to f_q is y_q.
S225. The cosine loss between the video prototype feature vector FC and the query video feature vector f_q is calculated, and the graph convolutional neural network is trained by backpropagation. The concrete mathematical expression of the cosine loss is as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
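To make S221-S225 concrete, here is a minimal NumPy sketch of the forward computation, under stated assumptions: the feature extractor is abstracted as pre-computed 2048-dimensional features, the weight shapes and the 1 - cosine form of the loss are illustrative choices (the patent fixes only d = 2048 and that the loss is built from the cosine similarity), and backpropagation through the three GCN layers would be left to an autograd framework in practice. Note the patent compares the (2048+m)-dimensional prototype FC with the 2048-dimensional query feature without specifying how the dimensions are matched, so the loss helper below simply assumes same-length vectors.

```python
import numpy as np

def gcn_features(A, X1, weights):
    """Three-layer knowledge diffusion: X_{t+1} = A @ X_t @ W_t (S222).
    A: (m, m) adjacency matrix (1 = connected, 0 = not); X1: (m, d_in) node word vectors."""
    X = X1
    for W in weights:           # W_1, W_2, W_3
        X = A @ X @ W
    return X                    # X_4: (m, 2048) knowledge-graph features P

def video_prototype(support_feats, P):
    """S221 + S223: average the k support features of one class, interact with P,
    and concatenate: z = mean(support), K = P . z, FC = cat(z, K)."""
    z = support_feats.mean(axis=0)       # (2048,) averaged support feature
    K = P @ z                            # (m,) attribute features of the video
    return np.concatenate([z, K])        # FC: (2048 + m,) prototype

def l2norm(x):
    return x / (np.linalg.norm(x) + 1e-12)   # "Norm": L2 normalization

def cosine_loss(fc_of_label, f_q):
    """S225 (assumed form): 1 - cos(FC_{y_q}, f_q); 0 when prototype and query align."""
    return 1.0 - float(l2norm(fc_of_label) @ l2norm(f_q))

# Illustrative shapes: m = 20 relevant nodes, 300-d word vectors, 3 GCN layers.
rng = np.random.default_rng(0)
m = 20
A = (rng.random((m, m)) < 0.2).astype(float)
A = np.maximum(A, A.T)                        # undirected graph: symmetric adjacency
weights = [rng.standard_normal(s) * 0.01 for s in [(300, 512), (512, 1024), (1024, 2048)]]
P = gcn_features(A, rng.standard_normal((m, 300)), weights)
FC = video_prototype(rng.standard_normal((5, 2048)), P)   # k = 5 support videos
```

In training (S225), the loss would be backpropagated into the W_t (and, depending on the setup, the feature extractor); NumPy is used here only to keep the sketch dependency-free.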
S3. Action recognition:
S31. The test set contains n(k+q) video clips, where n is the number of action categories; in each action category, k videos serve as support samples and q videos serve as query samples.
S32. To obtain the video prototype vector of each action category, the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features (the relevant nodes are determined by the categories to be classified, as in S22). The specific method is as follows:
S321. The feature extraction network extracts the video features of the support samples of each action category and averages them within each category, yielding n averaged video features; these n features constitute the video feature vectors of the support samples.
S322. The knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise. The graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features. The specific mathematical expression is as follows:
X_{t+1} = A·X_t·W_t
where t is the layer index of the graph convolutional neural network, W_t are the parameters of the t-th layer, A is the adjacency matrix, and X_t is the input of the t-th layer; when t is 1, X_1 is the randomly initialized input of the first layer. X_{t+1} is the output of the t-th layer and likewise the input of the (t+1)-th layer. The graph convolutional neural network of this embodiment has 3 layers, and the output X_4 of the third layer serves as the output of the whole network. The knowledge-graph features have dimension m×d, where m is the number of nodes related to the action categories and d is the dimension of the output features, set to 2048 to match the dimension of the video features.
S323. The knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video: K = P·z, where K denotes the attribute features of the video, P the knowledge-graph features, and z the video feature vector of the support samples. The attribute features of the video are concatenated with the support video feature vectors to obtain the video prototype feature vector: FC = cat(z, K), where cat is the concatenation operation and FC denotes the video prototype feature vector.
S324. The feature extraction network extracts the video feature vector f_q of each query sample.
S325. The cosine similarity between each video prototype feature vector FC and the query video feature vector f_q is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category. The cosine similarity formula is as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q), where Norm is L2 normalization.
the unlabeled query sample during testing requires the graph convolutional neural network to give out label prediction of the query sample as a result of action recognition. In actual use, the video set of unknown action categories is divided into a support sample and a query sample, and the processing is similar to that of the test set.
Referring to fig. 1, a small sample action recognition system based on knowledge-graph guidance comprises a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module. The knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes. A video set of unknown action categories is divided into support samples and query samples; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
In another embodiment, a non-volatile computer storage medium is provided, the computer storage medium storing computer executable instructions that are capable of performing the knowledge-graph-guided small sample motion recognition method of any of the above embodiments.
The present embodiment also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the knowledge-graph guided small sample action recognition method of the above embodiments.
The present embodiment provides an electronic device including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a knowledge-graph guided small sample action recognition method.
The above-described specific embodiments further illustrate the objects, technical solutions and technical effects of the present application in detail. It should be understood that the foregoing is only illustrative of the present application and is not intended to limit the scope of the application, and that all equivalent changes and modifications that may be made by those skilled in the art without departing from the spirit and principles of the application shall fall within the scope of the application.

Claims (7)

1. A knowledge-graph-guided small sample action recognition method, characterized by constructing a knowledge graph for action recognition, wherein the knowledge graph comprises actions and the attributes of the actions;
selecting a video set of known action categories as the training set, and in each action category of the training set taking one part of the videos as support samples and the other part as query samples; extracting the features of all relevant nodes in the knowledge graph through a graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by a feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample, with corresponding label y_q; calculating the cosine loss between the video prototype feature vector FC and the query video feature vector f_q; and training the graph convolutional neural network by backpropagation;
when recognizing actions, dividing the video set into support samples and query samples, the query samples being of unknown action categories; extracting the features of all relevant nodes in the knowledge graph through the trained graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample; calculating the cosine similarity between each video prototype feature vector FC and f_q, and taking the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
2. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the step of constructing the knowledge graph comprises:
S11. combining the structural characteristics of the action recognition corpus, designing a schema layer for action category videos; the schema layer comprises the word categories extracted from the action recognition corpus and the connection relations between them; the word categories are all the action categories, the scenes where the actions occur, the subjects applying the actions, and the objects related to the actions; each word is taken as a node in the knowledge graph; with the action category as the center, the scenes, subjects and objects are connected to the action as its attributes; the knowledge graph is designed as an undirected graph to facilitate knowledge transfer;
S12. extracting candidate entities from the action recognition corpus using an entity extraction technique: extracting all entities from the introduction page of each action in the corpus with an entity extraction algorithm, then screening the four categories of action, scene, subject and object entities from all the entities with a part-of-speech selection algorithm as candidate entities; and manually selecting, from the candidate entities, words strongly related to the action categories;
S13. finding, in the action recognition corpus, new entities similar to the manually selected entities according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities;
S14. filtering the discovered new entities with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening;
S15. obtaining the knowledge graph by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
3. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the knowledge-graph features are extracted as follows: the knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise; the graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features.
4. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the cosine loss is calculated as:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
5. A system for implementing the knowledge-graph-guided small sample action recognition method of claim 1, comprising a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module, wherein the knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes; when recognizing actions, the video set is divided into support samples and query samples, the query samples being of unknown action categories; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; and the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
6. A non-transitory computer storage medium storing computer-executable instructions for performing the knowledge-graph-guided small sample action recognition method of any one of claims 1-4.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the knowledge-graph-guided small sample action recognition method of any one of claims 1-4.
CN202310619753.XA 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system Active CN116386148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619753.XA CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310619753.XA CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Publications (2)

Publication Number Publication Date
CN116386148A CN116386148A (en) 2023-07-04
CN116386148B (en) 2023-08-11

Family

ID=86980937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619753.XA Active CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Country Status (1)

Country Link
CN (1) CN116386148B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252262A (en) * 2023-09-28 2023-12-19 四川大学 Knowledge graph construction and patent information retrieval method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766354A (en) * 2021-01-13 2021-05-07 中国科学院计算技术研究所 Knowledge graph-based small sample picture identification method and system
CN113641797A (en) * 2021-08-30 2021-11-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and computer program product
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114898466A (en) * 2022-05-13 2022-08-12 埃夫特智能装备股份有限公司 Video motion recognition method and system for smart factory
CN115129884A (en) * 2022-05-31 2022-09-30 国家计算机网络与信息安全管理中心 Knowledge graph completion method and system based on semantic interaction matching network
CN115294648A (en) * 2022-08-01 2022-11-04 中国农业银行股份有限公司 Man-machine gesture interaction method and device, mobile terminal and storage medium
CN115761576A (en) * 2022-11-04 2023-03-07 国家电网有限公司信息通信分公司 Video motion recognition method and device and storage medium
CN115965968A (en) * 2022-12-01 2023-04-14 西安电子科技大学 Small sample target detection and identification method based on knowledge guidance
CN116152554A (en) * 2023-01-16 2023-05-23 复旦大学 Knowledge-guided small sample image recognition system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yanfei Qin, Baolin Liu. "KDM: A knowledge-guided and data-driven method for few-shot video action recognition." Neurocomputing, pp. 69-78. *

Also Published As

Publication number Publication date
CN116386148A (en) 2023-07-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant