CN116386148B - Knowledge-graph-guided small sample action recognition method and system

Knowledge-graph-guided small sample action recognition method and system

Info

Publication number
CN116386148B
Authority
CN
China
Prior art keywords
video
graph
knowledge
action
knowledge graph
Prior art date
Legal status
Active
Application number
CN202310619753.XA
Other languages
Chinese (zh)
Other versions
CN116386148A (en)
Inventor
徐波
钟幼平
刘嘉
刘家豪
林谋
丁元
Current Assignee
Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd
State Grid Corp of China SGCC
Original Assignee
Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd
State Grid Corp of China SGCC
Priority date
Filing date
Publication date
Application filed by Super High Voltage Branch Of State Grid Jiangxi Electric Power Co ltd and State Grid Corp of China SGCC
Priority to CN202310619753.XA
Publication of CN116386148A
Application granted
Publication of CN116386148B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/23: Recognition of whole body movements, e.g. for sport training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The application belongs to the technical field of action recognition, and relates to a knowledge-graph-guided small sample action recognition method and system. The system consists of a knowledge graph construction module for action recognition, an information propagation module based on a graph convolutional neural network, and an action information recognition module. In the method, a knowledge graph and a graph convolutional neural network are constructed and trained, and a video set of unknown action categories is divided into support samples and query samples; a video prototype feature vector is obtained from the support samples and the knowledge graph; the cosine similarity between each video prototype feature vector and the video feature vector of a query sample is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category. The application improves the accuracy of small sample action recognition.

Description

Knowledge-graph-guided small sample action recognition method and system
Technical Field
The application belongs to the technical field of action recognition, and particularly relates to a knowledge-graph-guided small sample action recognition method and system.
Background
In recent years, research on small sample learning has attracted the attention of many top scientific institutions at home and abroad, and even of national government bodies. Many real-world application scenarios face the problem that data are difficult to collect or that high labeling costs leave training data insufficient. For example, in medical imaging, data on rare diseases are often difficult to collect, and professionals who can label them effectively are hard to find; in autonomous driving, data samples of the various emergency situations are particularly rare; in financial investment, data typically follow a long-tailed distribution, and it is difficult to obtain enough training samples for tail scenarios. Developing the theory and technology of small sample learning can help deep learning be deployed in application scenarios that lack data, and has broad application prospects in many fields. To promote the development of small sample learning and to take a leading position in new-generation artificial intelligence research, institutions at home and abroad have issued research plans targeting small sample learning.
At present, some related research work on the small sample learning task has been carried out and progress has been made. Existing methods can be broadly divided into three categories according to their emphasis: small sample recognition based on meta-learning, which mainly studies how a model accumulates experience over a large number of learning tasks in order to recognize from few samples; small sample recognition based on data augmentation, which mainly studies how to expand a limited data set to improve the performance of the constructed model; and small sample recognition that introduces semantic relations, which mainly establishes relations among visual concepts with the help of relations among high-level semantic concepts. In small sample learning, prior knowledge can help the model make effective use of existing learning experience and learn rapidly from a small number of samples; the introduction of prior knowledge is therefore important for small sample learning. Only the last of the three approaches above exploits prior knowledge, and current research is limited to exploiting textual semantic concepts. Since semantic text relations do not adequately reflect visual relations, their assistance to small sample visual recognition tasks is often limited. Therefore, fully mining visual prior knowledge, exploring small sample recognition methods based on multi-modal knowledge, and developing knowledge-driven small sample learning theory and technology have important research significance and scientific value.
On this basis, and inspired by the learning process of the biological brain, the inventors study knowledge-driven small sample visual recognition theory and technology. Biological studies have shown that the learning process of the biological brain does not start from scratch; rather, important prior knowledge is present at the beginning of learning, including what the species learned during evolution (known in biology as phylogeny) and key knowledge about the real world that the individual acquires during its lifetime. This knowledge plays a very important role in the learning process of the biological brain, and forms the theoretical basis for knowledge-driven small sample visual recognition. However, how to construct, represent and utilize prior knowledge in visual recognition tasks, so that models can learn effectively from a small number of samples, remains a significant scientific problem.
Chinese patent publication CN112766354A discloses a knowledge-graph-based small sample picture recognition method, which constructs a knowledge graph containing all the category labels in a training picture set and extracts features from the knowledge graph through a graph convolutional neural network to obtain the category of the picture to be recognized as the recognition result.
That method uses only the label information of images: each image category corresponds to a single node in the knowledge graph, so fine-grained information in a video cannot be attended to, and the accuracy is therefore low. For example, for a basketball-playing video, the method of CN112766354A uses only the action category "playing basketball" as a knowledge-graph node, yet attributes such as the height, age and position of the players are likely to influence the result of action recognition, which can lead to inaccurate recognition.
Disclosure of Invention
The application aims to provide a knowledge-graph-guided small sample action recognition system and method that target videos, whose semantic information (content) is richer: a finer-grained knowledge graph is established for the various semantics in the videos, the various attributes are integrated into the construction of the knowledge graph, and the knowledge graph is used in subsequent action recognition, improving the accuracy of small sample action recognition.
The small sample action recognition method based on knowledge graph guidance comprises the following steps: constructing a knowledge graph for identifying the actions, wherein the knowledge graph comprises actions and attributes of the actions;
selecting a video set of known action categories as the training set, and in each action category of the training set taking one part of the videos as support samples and the other part as query samples; extracting the features of all relevant nodes in the knowledge graph through a graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by a feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample, with corresponding label y_q; calculating the cosine loss between the video prototype feature vector FC and the query video feature vector f_q; and training the graph convolutional neural network by backpropagation;
dividing a video set of unknown action categories into support samples and query samples; extracting the features of all relevant nodes in the knowledge graph through the trained graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample; calculating the cosine similarity between each video prototype feature vector FC and f_q, and taking the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
Further preferably, the steps of constructing the knowledge graph are as follows:
S11. Combining the structural characteristics of the action recognition corpus, a schema layer is designed for action category videos; the schema layer comprises the word categories that need to be extracted from the action recognition corpus and the connection relations between them; the word categories are all the action categories, the scenes where the actions occur, the subjects applying the actions, and the objects related to the actions; each word is taken as a node in the knowledge graph; with the action category as the center, the scenes, subjects and objects are connected to the action as its attributes; the knowledge graph is designed as an undirected graph to facilitate knowledge transfer;
S12. Candidate entities are extracted from the action recognition corpus using an entity extraction technique: an entity extraction algorithm extracts all entities from the introduction page of each action in the corpus, and a part-of-speech selection algorithm then screens the four categories of action, scene, subject and object entities from all the entities as candidate entities; words strongly related to the action categories are manually selected from the candidate entities;
S13. New entities similar to the manually selected entities are found in the action recognition corpus according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities;
S14. The discovered new entities are filtered with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening;
S15. The knowledge graph is obtained by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
Further preferably, the knowledge-graph features are extracted as follows: the knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise; the graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features.
Further preferably, the cosine loss is calculated as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
The application also provides a small sample action recognition system based on knowledge-graph guidance, comprising a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module. The knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes. A video set of unknown action categories is divided into support samples and query samples; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
The application also provides a nonvolatile computer storage medium storing computer-executable instructions for performing the knowledge-graph-guided small sample action recognition method described above.
The present application also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the knowledge-graph-guided small sample action recognition method described above.
The application also provides an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a knowledge-graph guided small sample action recognition method.
The application has the following beneficial effects. The corpus relevant to action attributes is first extracted from an action recognition corpus, and a knowledge graph is constructed. The information propagation module combines the constructed knowledge graph with a graph convolutional neural network so that action information can be propagated among different nodes; finally, the action information recognition module classifies the propagated information. After the knowledge graph and the graph convolutional neural network are trained, a video set of unknown action categories is divided into support samples and query samples; a video prototype feature vector is obtained from the support samples and the knowledge graph; the cosine similarity between each video prototype feature vector and the video feature vector of a query sample is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category, thereby improving the accuracy of small sample action recognition.
Drawings
Fig. 1 is a schematic diagram of the present application.
Detailed Description
The application is further elucidated in detail below in connection with the accompanying drawings.
Referring to fig. 1, the knowledge-graph-guided small sample action recognition method comprises the following steps:
S1. Constructing the knowledge graph in a semi-automatic manner:
S11. Combining the structural characteristics of the action recognition corpus, a schema layer is first designed for action category videos. The schema layer contains the word categories that need to be extracted from the action recognition corpus and the connection relations between them. The required word categories are all the action categories, the scenes in which the actions occur, the subjects applying the actions (e.g. basketball players), and the objects associated with the actions (e.g. basketballs, baskets). Each word is taken as a node in the knowledge graph; with the action category at the center, the scenes, subjects and objects are connected to the action as its attributes. The knowledge graph is designed as an undirected graph to facilitate knowledge transfer.
S12. Candidate entities are extracted from the action recognition corpus using an entity extraction technique: an entity extraction algorithm extracts all entities (all meaningful words) from the introduction page of each action in the corpus, and a part-of-speech selection algorithm then screens the four categories of action, scene, subject and object entities from all the entities as candidate entities; words strongly related to the action categories are then manually selected from the candidate entities.
S13. New entities similar to the manually selected entities are found in the action recognition corpus according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities.
S14. The discovered new entities are filtered with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening.
S15. The knowledge graph is obtained by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
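As an illustration of steps S13 and S15, the following Python sketch expands the entity set by word-vector cosine distance and assembles the undirected action-attribute graph. It is a minimal sketch, not the patented implementation: the word vectors, entity names and threshold are hypothetical placeholders, and the entity extraction (S12) and disambiguation (S14) stages are assumed to have already produced their outputs.

```python
import numpy as np

# Hypothetical pre-computed word vectors for candidate and selected entities.
rng = np.random.default_rng(0)
word_vec = {w: rng.standard_normal(300) for w in
            ["basketball", "hoop", "court", "player", "referee"]}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_entities(selected, candidates, thresh=0.6):
    """S13: keep candidates whose word vectors are close to a manually selected entity."""
    return {c for c in candidates
            if any(cosine(word_vec[c], word_vec[s]) >= thresh for s in selected)}

# S15: undirected graph -- action categories are central nodes, attributes are
# ordinary nodes, and each action-attribute relation is an (unordered) edge.
edges = set()
def connect(action, attribute):
    edges.add(frozenset((action, attribute)))   # undirected: no edge direction

for attr in ["hoop", "court", "player"]:
    connect("play basketball", attr)
```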
Step S2. Training the graph convolutional neural network:
S21. A video set of known action categories is divided into a training set and a test set; n(k+q) video clips are sampled from the training set, where n is the number of action categories; in each action category, k videos serve as support samples and q videos serve as query samples.
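A possible sketch of the episode sampling in S21 follows, assuming the videos are already grouped by class in a `videos_by_class` dict (a hypothetical structure; the patent does not prescribe one):

```python
import random

def sample_episode(videos_by_class, n, k, q):
    """Sample an n-way episode: k support and q query videos per sampled category."""
    classes = random.sample(sorted(videos_by_class), n)
    support, query = {}, {}
    for c in classes:
        clips = random.sample(videos_by_class[c], k + q)
        support[c], query[c] = clips[:k], clips[k:]
    return support, query
```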
S22. To obtain the video prototype vector of each action category, the features of all relevant nodes in the knowledge graph are extracted through the graph convolutional neural network and taken as the knowledge-graph features. (The relevant nodes are determined by the categories to be classified: for example, to distinguish the two actions of playing basketball and playing football, the basketball and football nodes together with all nodes connected to them are taken as relevant nodes; these relevant nodes are the attributes of the videos.) The specific method is as follows:
S221. The feature extraction network extracts the video features of the support samples of each action category and averages them within each category, yielding n averaged video features; these n features constitute the video feature vectors of the support samples.
S222. The knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise. The graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features. The specific mathematical expression is as follows:
X_{t+1} = A·X_t·W_t
where t is the layer index of the graph convolutional neural network, W_t are the parameters of the t-th layer, A is the adjacency matrix, and X_t is the input of the t-th layer; when t is 1, X_1 is the randomly initialized input of the first layer. X_{t+1} is the output of the t-th layer and likewise the input of the (t+1)-th layer. The graph convolutional neural network of this embodiment has 3 layers, and the output X_4 of the third layer serves as the output of the whole network. The knowledge-graph features have dimension m×d, where m is the number of nodes related to the action categories and d is the dimension of the output features, set to 2048 to match the dimension of the video features.
S223. The knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video: K = P·z, where K denotes the attribute features of the video, P the knowledge-graph features, and z the video feature vector of the support samples extracted by the feature extraction network. The attribute features of the video are concatenated with the support video feature vectors to obtain the video prototype feature vector: FC = cat(z, K), where cat is the concatenation operation and FC denotes the video prototype feature vector.
S224. The feature extraction network extracts the video feature vector f_q of each query sample; the label corresponding to f_q is y_q.
S225. The cosine loss between the video prototype feature vector FC and the query video feature vector f_q is calculated, and the graph convolutional neural network is trained by backpropagation. The concrete mathematical expression of the cosine loss is as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
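To make S221-S225 concrete, here is a minimal NumPy sketch of the forward computation, under stated assumptions: the feature extractor is abstracted as pre-computed 2048-dimensional features, the weight shapes and the 1 - cosine form of the loss are illustrative choices (the patent fixes only d = 2048 and that the loss is built from the cosine similarity), and backpropagation through the three GCN layers would be left to an autograd framework in practice. Note the patent compares the (2048+m)-dimensional prototype FC with the 2048-dimensional query feature without specifying how the dimensions are matched, so the loss helper below simply assumes same-length vectors.

```python
import numpy as np

def gcn_features(A, X1, weights):
    """Three-layer knowledge diffusion: X_{t+1} = A @ X_t @ W_t (S222).
    A: (m, m) adjacency matrix (1 = connected, 0 = not); X1: (m, d_in) node word vectors."""
    X = X1
    for W in weights:           # W_1, W_2, W_3
        X = A @ X @ W
    return X                    # X_4: (m, 2048) knowledge-graph features P

def video_prototype(support_feats, P):
    """S221 + S223: average the k support features of one class, interact with P,
    and concatenate: z = mean(support), K = P . z, FC = cat(z, K)."""
    z = support_feats.mean(axis=0)       # (2048,) averaged support feature
    K = P @ z                            # (m,) attribute features of the video
    return np.concatenate([z, K])        # FC: (2048 + m,) prototype

def l2norm(x):
    return x / (np.linalg.norm(x) + 1e-12)   # "Norm": L2 normalization

def cosine_loss(fc_of_label, f_q):
    """S225 (assumed form): 1 - cos(FC_{y_q}, f_q); 0 when prototype and query align."""
    return 1.0 - float(l2norm(fc_of_label) @ l2norm(f_q))

# Illustrative shapes: m = 20 relevant nodes, 300-d word vectors, 3 GCN layers.
rng = np.random.default_rng(0)
m = 20
A = (rng.random((m, m)) < 0.2).astype(float)
A = np.maximum(A, A.T)                        # undirected graph: symmetric adjacency
weights = [rng.standard_normal(s) * 0.01 for s in [(300, 512), (512, 1024), (1024, 2048)]]
P = gcn_features(A, rng.standard_normal((m, 300)), weights)
FC = video_prototype(rng.standard_normal((5, 2048)), P)   # k = 5 support videos
```

In training (S225), the loss would be backpropagated into the W_t (and, depending on the setup, the feature extractor); NumPy is used here only to keep the sketch dependency-free.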
S3. Action recognition:
S31. The test set contains n(k+q) video clips, where n is the number of action categories; in each action category, k videos serve as support samples and q videos serve as query samples.
S32. To obtain the video prototype vector of each action category, the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features (the relevant nodes are determined by the categories to be classified, as in S22). The specific method is as follows:
S321. The feature extraction network extracts the video features of the support samples of each action category and averages them within each category, yielding n averaged video features; these n features constitute the video feature vectors of the support samples.
S322. The knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise. The graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features. The specific mathematical expression is as follows:
X_{t+1} = A·X_t·W_t
where t is the layer index of the graph convolutional neural network, W_t are the parameters of the t-th layer, A is the adjacency matrix, and X_t is the input of the t-th layer; when t is 1, X_1 is the randomly initialized input of the first layer. X_{t+1} is the output of the t-th layer and likewise the input of the (t+1)-th layer. The graph convolutional neural network of this embodiment has 3 layers, and the output X_4 of the third layer serves as the output of the whole network. The knowledge-graph features have dimension m×d, where m is the number of nodes related to the action categories and d is the dimension of the output features, set to 2048 to match the dimension of the video features.
S323. The knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video: K = P·z, where K denotes the attribute features of the video, P the knowledge-graph features, and z the video feature vector of the support samples. The attribute features of the video are concatenated with the support video feature vectors to obtain the video prototype feature vector: FC = cat(z, K), where cat is the concatenation operation and FC denotes the video prototype feature vector.
S324. The feature extraction network extracts the video feature vector f_q of each query sample.
S325. The cosine similarity between each video prototype feature vector FC and the query video feature vector f_q is calculated, and the label corresponding to the video prototype feature vector with the maximum cosine similarity is taken as the predicted action category. The cosine similarity formula is as follows:
cos(FC, f_q) = Norm(FC) · Norm(f_q), where Norm is L2 normalization.
the unlabeled query sample during testing requires the graph convolutional neural network to give out label prediction of the query sample as a result of action recognition. In actual use, the video set of unknown action categories is divided into a support sample and a query sample, and the processing is similar to that of the test set.
Referring to fig. 1, a small sample action recognition system based on knowledge-graph guidance comprises a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module. The knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes. A video set of unknown action categories is divided into support samples and query samples; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
In another embodiment, a non-volatile computer storage medium is provided, the computer storage medium storing computer executable instructions that are capable of performing the knowledge-graph-guided small sample motion recognition method of any of the above embodiments.
The present embodiment also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the knowledge-graph guided small sample action recognition method of the above embodiments.
The present embodiment provides an electronic device including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a knowledge-graph guided small sample action recognition method.
The above-described specific embodiments further illustrate the objects, technical solutions and technical effects of the present application in detail. It should be understood that the foregoing is only illustrative of the present application and is not intended to limit the scope of the application, and that all equivalent changes and modifications that may be made by those skilled in the art without departing from the spirit and principles of the application shall fall within the scope of the application.

Claims (7)

1. A knowledge-graph-guided small sample action recognition method, characterized by constructing a knowledge graph for action recognition, wherein the knowledge graph comprises actions and the attributes of the actions;
selecting a video set of known action categories as the training set, and in each action category of the training set taking one part of the videos as support samples and the other part as query samples; extracting the features of all relevant nodes in the knowledge graph through a graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by a feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample, with corresponding label y_q; calculating the cosine loss between the video prototype feature vector FC and the query video feature vector f_q; and training the graph convolutional neural network by backpropagation;
when recognizing actions, dividing the video set into support samples and query samples, the query samples being of unknown action categories; extracting the features of all relevant nodes in the knowledge graph through the trained graph convolutional neural network and taking them as the knowledge-graph features; dot-multiplying the knowledge-graph features with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; concatenating the attribute features of the video with the video feature vectors of the support samples to obtain the video prototype feature vector FC; extracting, by the feature extraction network, the video feature vector f_q of each query sample; calculating the cosine similarity between each video prototype feature vector FC and f_q, and taking the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
2. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the step of constructing the knowledge graph comprises:
S11. combining the structural characteristics of the action recognition corpus, designing a schema layer for action category videos; the schema layer comprises the word categories extracted from the action recognition corpus and the connection relations between them; the word categories are all the action categories, the scenes where the actions occur, the subjects applying the actions, and the objects related to the actions; each word is taken as a node in the knowledge graph; with the action category as the center, the scenes, subjects and objects are connected to the action as its attributes; the knowledge graph is designed as an undirected graph to facilitate knowledge transfer;
S12. extracting candidate entities from the action recognition corpus using an entity extraction technique: extracting all entities from the introduction page of each action in the corpus with an entity extraction algorithm, then screening the four categories of action, scene, subject and object entities from all the entities with a part-of-speech selection algorithm as candidate entities; and manually selecting, from the candidate entities, words strongly related to the action categories;
S13. finding, in the action recognition corpus, new entities similar to the manually selected entities according to the cosine distances between the word vectors of the candidate entities and those of the manually selected entities;
S14. filtering the discovered new entities with an entity disambiguation technique to screen out ambiguous words, followed by another round of manual screening;
S15. obtaining the knowledge graph by taking the video actions as central nodes, the important action attributes as ordinary nodes, and the relations between actions and attributes as edges.
3. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the knowledge-graph features are extracted as follows: the knowledge graph is modeled by an adjacency matrix whose rows and columns both index the relevant nodes; an entry is 1 if the two nodes are connected in the knowledge graph and 0 otherwise; the graph convolutional neural network takes the word vectors of all relevant nodes as input, propagates information to every node through its knowledge-diffusion property, and outputs the final feature of each node; the final features of all nodes constitute the knowledge-graph features.
4. The knowledge-graph-guided small sample action recognition method of claim 1, wherein the cosine loss is calculated as:
cos(FC, f_q) = Norm(FC) · Norm(f_q)
wherein cos(FC, f_q) represents the cosine similarity between the video prototype feature vector FC and the query video feature vector f_q, Norm is L2 normalization, and the cosine loss is computed from this similarity against the query's label y_q.
5. A system for implementing the knowledge-graph-guided small sample action recognition method of claim 1, comprising a knowledge graph construction module, an information propagation module based on a graph convolutional neural network, and an action information recognition module, wherein the knowledge graph construction module extracts the corpus relevant to action attributes from an action recognition corpus and constructs the knowledge graph; the information propagation module uses the knowledge graph in combination with the graph convolutional neural network so that action-related information is propagated among different nodes; when recognizing actions, the video set is divided into support samples and query samples, the query samples being of unknown action categories; the features of all relevant nodes in the knowledge graph are extracted through the trained graph convolutional neural network and taken as the knowledge-graph features; the knowledge-graph features are dot-multiplied with the video feature vectors of the support samples extracted by the feature extraction network, so that the video features and the knowledge-graph features interact to yield the various attribute features of the video; the attribute features of the video are concatenated with the video feature vectors of the support samples to obtain the video prototype feature vector FC; the feature extraction network extracts the video feature vector f_q of each query sample; and the action information recognition module calculates the cosine similarity between each video prototype feature vector FC and f_q, and takes the label corresponding to the video prototype feature vector with the maximum cosine similarity as the predicted action category.
6. A non-transitory computer storage medium storing computer-executable instructions for performing the knowledge-graph-guided small sample action recognition method of any one of claims 1-4.
7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the knowledge-graph-guided small sample action recognition method of any one of claims 1-4.
CN202310619753.XA 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system Active CN116386148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619753.XA CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310619753.XA CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Publications (2)

Publication Number Publication Date
CN116386148A CN116386148A (en) 2023-07-04
CN116386148B (en) 2023-08-11

Family

ID=86980937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619753.XA Active CN116386148B (en) 2023-05-30 2023-05-30 Knowledge-graph-guided small sample action recognition method and system

Country Status (1)

Country Link
CN (1) CN116386148B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252262A (en) * 2023-09-28 2023-12-19 四川大学 Knowledge graph construction and patent information retrieval method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766354A (en) * 2021-01-13 2021-05-07 中国科学院计算技术研究所 Knowledge graph-based small sample picture identification method and system
CN113641797A (en) * 2021-08-30 2021-11-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and computer program product
CN114333064A (en) * 2021-12-31 2022-04-12 江南大学 Small sample behavior identification method and system based on multidimensional prototype reconstruction reinforcement learning
CN114898466A (en) * 2022-05-13 2022-08-12 埃夫特智能装备股份有限公司 Video motion recognition method and system for smart factory
CN115129884A (en) * 2022-05-31 2022-09-30 国家计算机网络与信息安全管理中心 Knowledge graph completion method and system based on semantic interaction matching network
CN115294648A (en) * 2022-08-01 2022-11-04 中国农业银行股份有限公司 Man-machine gesture interaction method and device, mobile terminal and storage medium
CN115761576A (en) * 2022-11-04 2023-03-07 国家电网有限公司信息通信分公司 Video motion recognition method and device and storage medium
CN115965968A (en) * 2022-12-01 2023-04-14 西安电子科技大学 Small sample target detection and identification method based on knowledge guidance
CN116152554A (en) * 2023-01-16 2023-05-23 复旦大学 Knowledge-guided small sample image recognition system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yanfei Qin, Baolin Liu. "KDM: A knowledge-guided and data-driven method for few-shot video action recognition." Neurocomputing, pp. 69-78. *

Also Published As

Publication number Publication date
CN116386148A (en) 2023-07-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant