CN111401106B - Behavior identification method, device and equipment

Behavior identification method, device and equipment

Info

Publication number: CN111401106B
Application number: CN201910000953.0A
Authority: CN (China)
Prior art keywords: network, identified, behavior, training, frame
Legal status: Active (the status listed is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN111401106A
Inventor: 丁晓璐
Assignees: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Application granted; published as CN111401106A (application) and CN111401106B (grant)

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/20: Movements or behaviour, e.g. gesture recognition
          • G06V20/00: Scenes; Scene-specific elements
            • G06V20/40: Scenes; Scene-specific elements in video content
              • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
              • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
                • G06N3/045: Combinations of networks
              • G06N3/08: Learning methods


Abstract

The invention provides a behavior identification method, a behavior identification device and behavior identification equipment, wherein the behavior identification method comprises the following steps: acquiring key frame sequence data to be identified from skeleton sequence data to be identified; and identifying, by using a space-time graph convolutional network, the behavior action corresponding to the key frame sequence data to be identified. By acquiring the key frame sequence data from the skeleton sequence data, the scheme reduces interference from redundant information and the workload of behavior identification; by mining the time-domain and space-domain characteristics of the skeleton sequence simultaneously, it reduces model design complexity and improves identification efficiency and accuracy.

Description

Behavior identification method, device and equipment
Technical Field
The invention relates to the technical field of human behavior recognition, in particular to a behavior recognition method, a behavior recognition device and behavior recognition equipment.
Background
The following behavior recognition schemes mainly exist in the prior art:
In the first scheme, video behavior recognition: video image data are taken directly as input, and behavior recognition is performed with a deep learning method. This can be roughly regarded as classifying each frame of a multi-frame image sequence and deriving a behavior recognition result from the classification results of all frames.
The second scheme, skeleton sequence behavior recognition: human skeleton nodes are robust to illumination and viewing-angle changes, involve less data and consume fewer computing resources. As device precision improves, more accurate skeleton-node coordinates can be obtained through a depth camera or a motion capture system, and using the skeleton sequence as the input of a deep network can improve the recognition effect.
The third scheme, behavior recognition based on an RNN (Recurrent Neural Network): owing to the properties of the RNN itself, behavior recognition using an RNN does not require separately modeling the time dimension. Existing schemes are mainly based on the LSTM (Long Short-Term Memory) network, and recognition accuracy has been continuously improved by introducing trust gate mechanisms, adding attention models, and the like.
The fourth scheme, behavior recognition based on a CNN (Convolutional Neural Network): the input of a CNN is generally Euclidean-structured data arranged in matrix form. In the existing literature, the 2D CNN used for image classification has been extended to a 3D CNN for video recognition, and methods such as video segmentation and multitask parallel computation are adopted to achieve good effects.
However, the above four behavior recognition schemes have the following disadvantages:
for the first scheme, video behavior recognition: the calculation amount is huge, about 5M of an image classification (101-type) network is obtained, and the number of network parameters reaches about 33M after the image classification (101-type) network is expanded to video classification; the global long-distance context information is difficult to extract, the video classification result not only depends on the identification result of a single picture but also depends on the dynamic change among picture sequences, but the limited storage capacity and computing capacity are difficult to capture the global context dynamic information at long distance; sensitive to illumination and visual angle change.
For the second scheme, skeleton sequence behavior recognition: existing skeleton behavior recognition methods fall mainly into two types. One captures joint dynamics through manually extracted features, which requires ingenious feature design and a large amount of manpower; the other uses deep learning, but the existing literature mostly starts from dividing the body into parts, which limits the extraction of deeper-level feature associations to a certain extent.
For the third scheme, behavior recognition based on RNN networks: an RNN can acquire time-dimension information well but has limited ability to extract data features; existing methods are mainly based on multilayer RNN stacking (stacked RNNs), which are difficult to train in practical applications.
For the fourth scheme, behavior recognition based on CNN networks: existing algorithms generally process every input frame indiscriminately, which wastes computing capacity and introduces noise interference; the CNN must model the time dimension separately, which increases the difficulty of model design; and the input of a CNN must have Euclidean structure, so skeleton-based recognition loses the natural connectivity properties between skeleton nodes.
Therefore, the existing human behavior recognition scheme has the disadvantages of large calculation amount, low recognition efficiency, complex network and a plurality of difficulties in practical application.
Disclosure of Invention
The invention aims to provide a behavior recognition method, a behavior recognition device and behavior recognition equipment, and solves the problems of large calculation amount, low recognition efficiency and complex network of a human behavior recognition scheme in the prior art.
In order to solve the above technical problem, an embodiment of the present invention provides a behavior identification method, including:
acquiring key frame sequence data to be identified from bone sequence data to be identified;
and identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network.
Optionally, the identifying, by using a space-time graph convolutional network, a behavior action corresponding to the sequence data of the key frames to be identified includes:
constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
extracting a spatiotemporal feature to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network;
and identifying the behavior action corresponding to the space-time characteristic to be identified.
Optionally, the identifying a behavior action corresponding to the to-be-identified spatiotemporal feature includes:
and obtaining the behavior action corresponding to the space-time characteristic to be identified by utilizing the normalized exponential function.
Optionally, the acquiring the sequence data of the key frame to be identified from the bone sequence data to be identified includes:
and acquiring the sequence data of the key frame to be identified from the bone sequence data to be identified by utilizing a frame rectification network.
Optionally, before acquiring the sequence data of the key frame to be identified from the bone sequence data to be identified by using the frame rectification network, the method further includes:
and training the frame rectification network and the space-time graph convolution network.
Optionally, the training the frame rectification network and the space-time graph convolution network includes:
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior and action by using the frame rectification network;
training the space-time graph convolutional network by using the key frame sequence training data, and identifying training behavior actions corresponding to the key frame sequence training data by using the trained space-time graph convolutional network;
adjusting the frame rectification network reversely according to the training behavior action;
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network;
and training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Optionally, after the space-time graph convolutional network is trained again by using the key frame sequence training data acquired by the adjusted frame rectification network, the method further includes:
identifying, by using the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network;
if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold value, storing parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Optionally, the reversely adjusting the frame rectification network according to the training behavior action includes:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Optionally, the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning speed, r_t denotes the return function value when the frame rectification network is trained for the t-th time, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action value function, s_{t+1} denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weight of the frame rectification network.
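For illustration only, a minimal Python (PyTorch) sketch of this update step is given below; the network wrapper q_net, its optimizer, and the tensor shapes are assumptions, not the patent's implementation. The patent writes the loss directly as the α-scaled temporal-difference term; the sketch descends on a squared surrogate of that same term, which yields the classic Q-learning weight update.

```python
import torch

def fdnet_update(q_net, optimizer, s_t, a_t, r_t, s_next, alpha=0.5, gamma=0.9):
    """One reverse-adjustment step of the frame rectification network.

    q_net is assumed to map a state tensor to a vector of action values
    Q(s, omega); a_t is the index of the action taken, r_t the return value.
    """
    q_sa = q_net(s_t)[a_t]                # Q(s_t, omega) for the action taken
    with torch.no_grad():
        q_next = q_net(s_next).max()      # max Q(s_{t+1}, omega), held fixed as the target
    delta = r_t + gamma * q_next - q_sa   # the term inside the patent's loss l = alpha * delta
    loss = 0.5 * alpha * delta.pow(2)     # squared surrogate; its gradient step matches Q-learning
    optimizer.zero_grad()
    loss.backward()                       # gradient descent on the loss ...
    optimizer.step()                      # ... updates the network weight omega
    return (alpha * delta).item()         # the patent's loss value l
```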
An embodiment of the present invention further provides a behavior recognition apparatus, including:
the first acquisition module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified;
and the first identification module is used for identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a space-time graph convolutional network.
Optionally, the first identification module includes:
the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
the first extraction submodule is used for extracting the spatiotemporal features to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network;
and the first identification submodule is used for identifying the behavior action corresponding to the space-time characteristic to be identified.
Optionally, the first identification submodule includes:
and the first processing unit is used for obtaining the behavior action corresponding to the space-time feature to be identified by utilizing the normalized exponential function.
Optionally, the first obtaining module includes:
and the first acquisition sub-module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Optionally, the method further includes:
the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Optionally, the first training module includes:
the second acquisition submodule is used for acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action by using the frame rectification network;
the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network;
the first adjusting submodule is used for adjusting the frame rectification network reversely according to the training behavior action;
the third acquisition sub-module is used for acquiring the key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network;
and the first training submodule is used for training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Optionally, the method further includes:
the second identification module is used for identifying, by using the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, after the space-time graph convolutional network has been retrained with that data;
the first processing module is used for storing the parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold value, and otherwise for returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Optionally, the reversely adjusting the frame rectification network according to the training behavior action includes:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Optionally, the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning speed, r_t denotes the return function value when the frame rectification network is trained for the t-th time, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action value function, s_{t+1} denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weight of the frame rectification network.
The embodiment of the invention also provides behavior recognition equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor; the processor implements the above-described behavior recognition method when executing the program.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the behavior recognition method.
The technical scheme of the invention has the following beneficial effects:
in the scheme, the behavior recognition method obtains the key frame sequence data to be recognized from the bone sequence data to be recognized; identifying behavior actions corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network; redundant information interference can be reduced, and the workload of behavior identification is reduced; the time domain and space domain characteristics of the skeleton sequence are simultaneously excavated, the model design complexity is reduced, and the identification efficiency and accuracy are improved.
Drawings
FIG. 1 is a flow chart of a behavior recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a modeling method of a time-space domain diagram of a bone sequence according to an embodiment of the present invention;
fig. 3 is a first schematic diagram illustrating a specific implementation of the behavior recognition method according to the embodiment of the present invention;
fig. 4 is a second schematic diagram illustrating a specific implementation of the behavior recognition method according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a behavior recognition device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a behavior recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a behavior recognition method aiming at the problems of large calculation amount, low recognition efficiency and complex network of a human behavior recognition scheme in the prior art, as shown in figure 1, the method comprises the following steps:
step 11: acquiring key frame sequence data to be identified from bone sequence data to be identified;
step 12: and identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network.
The behavior recognition method provided by the embodiment of the invention obtains the key frame sequence data to be recognized from the bone sequence data to be recognized; identifying behavior actions corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network; redundant information interference can be reduced, and the workload of behavior identification is reduced; the time domain and space domain characteristics of the skeleton sequence are simultaneously excavated, the model design complexity is reduced, and the identification efficiency and accuracy are improved.
The method for identifying the behavior action corresponding to the to-be-identified key frame sequence data by utilizing the space-time graph convolutional network comprises the following steps of: constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified; extracting a spatiotemporal feature to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network; and identifying the behavior action corresponding to the space-time characteristic to be identified.
Specifically, the identifying a behavior corresponding to the to-be-identified spatiotemporal feature includes: and obtaining the behavior action corresponding to the space-time characteristic to be identified by utilizing the normalized exponential function.
In order to extract frames that are information-rich, more discriminative and representative, in an embodiment of the present invention, the acquiring the key frame sequence data to be identified from the bone sequence data to be identified includes: acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using a frame rectification network.
Further, before obtaining the sequence data of the key frames to be identified from the bone sequence data to be identified by using the frame rectification network, the method further comprises the following steps: and training the frame rectification network and the time-space diagram convolution network.
Wherein training the frame rectification network and the space-time graph convolution network comprises: acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior and action by using the frame rectification network; training the space-time graph convolutional network by using the key frame sequence training data, and identifying training behavior actions corresponding to the key frame sequence training data by using the trained space-time graph convolutional network; adjusting the frame rectification network reversely according to the training behavior action;
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network; and training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Further, after the space-time graph convolutional network is trained again by using the key frame sequence training data acquired by the adjusted frame rectification network, the method further includes: identifying, by using the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network; if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold value, storing parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Wherein the adjusting the frame rectification network in a reverse direction according to the training behavior comprises: acquiring a return function value according to the training behavior action; obtaining a loss function according to the return function value; and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Specifically, the loss function is: l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω)); where l denotes the loss function, α denotes a preset learning speed, r_t denotes the return function value when the frame rectification network is trained for the t-th time, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) both denote action value functions, s_{t+1} denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weight of the frame rectification network.
The behavior recognition method provided by the embodiment of the invention is further described below.
In view of the above technical problems, embodiments of the present invention provide a behavior recognition method, which may specifically be a human skeleton sequence behavior recognition method based on a space-time graph convolutional network. For skeleton sequence data, the method performs space-time graph network modeling by using the natural spatial connections of nodes within a single frame (spatial skeleton-node connections) and the temporal connections of the same node across multiple frames; proposes a frame distillation algorithm to extract frames that are information-rich, highly discriminative and strongly correlated with the overall video behavior; and designs a graph convolution kernel and a space-time graph convolutional network algorithm, which together complete human behavior recognition.
Human behavior recognition is regarded as a basic technology in many application fields such as human-computer interaction, intelligent monitoring and robotics. Taking the monitoring of elderly people living alone as an example, an intelligent behavior recognition system detects their daily activities to judge whether they eat normally, sleep on time, take medicine as prescribed, and whether abnormal situations such as falls, myocardial infarction or coma occur, so that family members can be informed in time and medical help delivered promptly, allowing the elderly to enjoy a higher quality of life while living alone. As another example, a fitness evaluation and medical rehabilitation system gives motion-improvement suggestions by recognizing motions and comparing them with correct postures, improving fitness efficiency and rehabilitation effect. However, the traditional video-based human behavior recognition method suffers from a large amount of computation, a complex network and high image-quality requirements, and faces many difficulties in practical application. For behavior recognition based on skeleton data, one may choose an RNN, a CNN or a graph convolutional network: the RNN has limited data feature extraction capability, the matrix-form data structure required by the CNN loses some good attributes of skeleton data, while the graph convolutional network makes full use of the natural connections among skeleton nodes to effectively extract richer features. Therefore, the embodiment of the invention proposes skeleton sequence behavior recognition based on space-time graph convolution, adding a frame distillation algorithm to improve recognition efficiency and accuracy. The scheme provided by the embodiment of the invention mainly involves the following three parts:
the first part, a skeleton sequence time-space domain graph modeling method;
The modeling method of the skeleton sequence time-space domain graph in the embodiment of the invention is shown in fig. 2. The time-space domain structure of the skeleton sequence is divided into an intra-frame structure and an inter-frame structure: the intra-frame structure mainly describes the spatial-domain structure of the skeleton, and the inter-frame structure mainly describes its time-domain structure. Specifically, the skeleton nodes form the node set V of the graph; the edges connecting different nodes within the same frame and the edges connecting the same node across different frames together form the edge set E; the edge set and the node set form the space-time graph G = (V, E) of the skeleton. The node set is V = {v_ti | t = 1..T, i = 1..N}, where T is the number of sequence frames and N is the number of skeleton nodes. The edge set is E = E_S ∪ E_T, where E_S = {v_ti v_tj | (i, j) ∈ S}, S being the body joints naturally connected within a frame, and E_T = {v_ti v_pi | (t, p) ∈ T′}, T′ being the set of extracted key frames.
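As a concrete illustration of this graph model, the following Python (NumPy) sketch builds the adjacency matrix of G = (V, E) from a key-frame sequence; the intra-frame bone list skeleton_edges and the toy sizes are assumptions, since each real skeleton format defines its own joint connections.

```python
import numpy as np

def build_st_graph(T, N, skeleton_edges):
    """Adjacency matrix of the skeleton space-time graph G = (V, E).

    V = {v_ti | t = 1..T, i = 1..N} is flattened to T*N node indices;
    E_S links naturally connected joints within each frame, and
    E_T links the same joint across consecutive key frames.
    """
    A = np.zeros((T * N, T * N), dtype=np.float32)
    idx = lambda t, i: t * N + i
    for t in range(T):
        for i, j in skeleton_edges:                      # intra-frame edges E_S
            A[idx(t, i), idx(t, j)] = A[idx(t, j), idx(t, i)] = 1.0
    for t in range(T - 1):
        for i in range(N):                               # inter-frame edges E_T
            A[idx(t, i), idx(t + 1, i)] = A[idx(t + 1, i), idx(t, i)] = 1.0
    return A

# toy example: a 3-joint chain 0-1-2 over 4 key frames
adjacency = build_st_graph(T=4, N=3, skeleton_edges=[(0, 1), (1, 2)])
```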
A second part, a key frame preprocessing network of the space-time graph convolution network;
the Frame Distillation Network (FDNet) in fig. 3 is a preprocessing Network for extracting a key Frame in the space-time convolutional Network according to an embodiment of the present invention.
The key frame extraction process is a Markov Decision Process (MDP) defined by a triplet M = (S, A, R), where S = {s_i} is the state set (whose data are frame sequence data); A = {a_i} is the action set, comprising keeping the current frame, selecting the previous frame and selecting the next frame; and R is the set of return function values obtained after (s, a) transitions to the next state. The MDP is initialized by uniformly sampling the input skeleton sequence to obtain the initial state s_1; a random state-transition action a_1 converts it to state s_2, and the return function value r_1 is calculated. Calculating the return function value requires the ST-GCN (space-time graph convolutional network): the skeleton frame sequence corresponding to state s_i is input into the pre-trained ST-GCN to obtain a recognition result, which is compared with the behavior label (such as walking or sitting); if the result is correct, the return function value r_i is positive, and otherwise it is negative.
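A minimal sketch of this decision process follows; the state encoding (the indices of the currently selected key frames), the per-slot action decoding, and the ±1 reward values are illustrative assumptions, and st_gcn stands for the pre-trained recognizer, assumed to return a class label.

```python
import numpy as np

ACTIONS = ("keep", "prev", "next")   # A = {a_i}: keep the frame, take the previous, take the next

def initial_state(seq_len, num_key):
    """s_1: uniform sampling of the input skeleton sequence."""
    return np.linspace(0, seq_len - 1, num_key).astype(int)

def step(state, slot, action, seq_len):
    """(s, a) -> next state: shift one selected key-frame index by one frame."""
    s = state.copy()
    if action == "prev":
        s[slot] = max(0, s[slot] - 1)
    elif action == "next":
        s[slot] = min(seq_len - 1, s[slot] + 1)
    return s

def reward(state, skeleton_seq, label, st_gcn):
    """r_i is positive iff the pre-trained ST-GCN recognizes the behavior label."""
    pred = st_gcn(skeleton_seq[state])   # classify the frames selected by the state
    return 1.0 if pred == label else -1.0
```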
The training procedure for the frame rectification network may be specifically as follows:
initialize FDNet, randomly generating the network weight ω;
initialize the skeleton sequence state s_1;
loop over t:
select the action a_t = max Q(s_t, ω);
execute a_t, generate the new state s_{t+1}, and calculate the reward r_t using the ST-GCN;
if (r_t > 0) && (r_i > 0 for i = (t-N)..t): end;
otherwise: calculate the loss function l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω)); perform gradient descent on the loss function and update ω;
return to the loop over t.
Here t denotes the number of times the sample has been trained (i.e., the number of times the skeleton sequence has been trained, which equals the number of times the ST-GCN has been trained); r_t denotes the return function value during the t-th training, and r_i the return function value during the i-th training; (r_t > 0) && (r_i > 0 for i = (t-N)..t) means that the return function value is positive and has been positive for N consecutive times (that is, the behavior recognition result is consistent with the behavior label corresponding to this skeleton sequence, and the consistent count has reached N consecutive times); N denotes the system threshold (customizable); Q(s_t, ω) denotes the action value function; α in the loss function l denotes the learning speed (learning rate); γ denotes the attenuation value; 0 < α < 1 and 0 ≤ γ ≤ 1.
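The sketch below assembles the procedure above, reusing initial_state, step, reward and ACTIONS from the earlier MDP sketch and fdnet_update from the loss sketch; it assumes q_net outputs one value per (key-frame slot, action) pair, a flattening chosen here for illustration rather than specified by the patent.

```python
import torch

def train_fdnet(q_net, optimizer, skeleton_seq, label, st_gcn,
                num_key=16, N=5, max_iters=1000):
    """FDNet training loop: random omega (q_net's initialization), s_1 by
    uniform sampling, then a_t = max Q(s_t, omega) -> s_{t+1} -> reward,
    stopping once the reward has been positive N consecutive times."""
    encode = lambda s: torch.tensor(s, dtype=torch.float32)   # state -> network input
    s = initial_state(len(skeleton_seq), num_key)             # s_1
    streak = 0
    for t in range(max_iters):
        with torch.no_grad():
            a = int(q_net(encode(s)).argmax())                # a_t = max Q(s_t, omega)
        slot, act = divmod(a, len(ACTIONS))                   # decode the flat action index
        s_next = step(s, slot, ACTIONS[act], len(skeleton_seq))
        r = reward(s_next, skeleton_seq, label, st_gcn)       # r_t from the ST-GCN
        streak = streak + 1 if r > 0 else 0
        if streak >= N:                                       # r_i > 0 for i = (t-N)..t: end
            break
        fdnet_update(q_net, optimizer, encode(s), a, r, encode(s_next))
        s = s_next
    return s                                                  # distilled key-frame indices
```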
According to the embodiment of the invention, the key frame can be acquired by utilizing the FDNet network after determining omega.
The FDNet, used as a preprocessing network for the ST-GCN, plays the role of distilling the most information-rich and representative frames, reduces the amount of calculation of the ST-GCN network, and effectively reduces the noise interference brought by informationally redundant frames.
A third part, a time-space diagram convolution network based on the skeleton sequence;
Fig. 4 shows the graph convolutional network constructed on the time-space domain graph of the skeleton sequence, which is called the space-time graph convolutional network (ST-GCN) in the embodiment of the present invention.
Similar to the 2D CNN network, the essence of the skeleton graph convolution network is to use a convolution kernel sharing parameters to realize weighted summation between the central node and the neighboring nodes to achieve the purpose of extracting features, so the skeleton graph convolution network focuses on the design of the sampling function and convolution kernel function of the neighboring nodes.
The picture sampling function is defined as collecting the other pixels within a certain range around the center pixel; similarly, the sampling function of the skeleton graph can be defined as collecting the other nodes connected to the center node within a certain distance, namely the neighbor node set of v_ti: B(v_ti) = {v_qj | d(v_ti, v_tj) ≤ K, |q - t| < D}, where d(v_ti, v_tj) denotes the length of the shortest path from v_ti to v_tj within the same frame, K is a distance selection criterion (predefinable), |q - t| denotes the frame time span before and after the center node, and D denotes a time selection criterion (predefinable). For example, with K = 1 and D = 40, the sampling function selects for weighting the nodes v_tj at most one unit length from the center node v_ti and the nodes v_qi within 40 frames before and after the current frame. Expressed as a formula, the sampling function of the skeleton graph is p(v_ti) = v_qj. The design of the neighbor node set fully embodies the space-time characteristics of the skeleton graph.
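To make the sampling function concrete, the following sketch computes B(v_ti) from one frame's intra-frame adjacency matrix for chosen K and D; the helper names are assumptions.

```python
import numpy as np
from collections import deque

def hop_distances(adj_intra, src):
    """BFS shortest-path lengths d(v_ti, v_tj) within one frame's skeleton."""
    dist = np.full(len(adj_intra), np.inf)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in range(len(adj_intra)):
            if adj_intra[u, v] and np.isinf(dist[v]):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighbor_set(t, i, adj_intra, T, K=1, D=40):
    """B(v_ti) = {v_qj : d(v_ti, v_tj) <= K, |q - t| < D} as (frame, joint) pairs."""
    dist = hop_distances(adj_intra, i)
    joints = [j for j in range(len(adj_intra)) if dist[j] <= K]
    frames = [q for q in range(T) if abs(q - t) < D]
    return [(q, j) for q in frames for j in joints]
```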
The convolution kernel function mainly involves the convolution kernel size (equal to K × K) and the convolution kernel weight function w(v_ti). A convolution kernel on a picture is generally a square of fixed size with weights to be optimized; the convolution kernel of the skeleton graph is instead designed as a mapping from the neighbor node set B(v_ti) to K labels, l_ti: B(v_ti) → {0, ..., K-1}, meaning that the neighbor nodes of the center node are divided into K labels (subsets) according to a preset rule (e.g., the distance of the center node and its neighbors relative to the center of gravity of the skeleton), and the weight values corresponding to the labels are combined to form the weight function w(v_ti) to be optimized. The weight function w(v_ti) can be optimized by back propagation or similar means. This design of the convolution kernel solves the problem that the input of graph convolution is non-Euclidean structured data (not in matrix form).
The preset rule may be specifically any one of the following rules, but is not limited to the following rules:
unified label partition rule: the central node and its neighbor nodes all belong to one subset (label);
distance-based partition rule: the central node forms one subset, and the other neighbor nodes form another subset;
spatial-position partition rule: based on the distance from a node to the center of gravity of the whole skeleton, taking the central node's distance as the reference, nodes are divided into three subsets: greater than, equal to, and less than the reference (a sketch of this rule follows the list).
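As an illustration, here is a sketch of the third (spatial-position) rule under the assumption of 2D or 3D joint coordinates; taking the central node's own distance to the center of gravity as the reference is the reading adopted here.

```python
import numpy as np

def spatial_partition(joint_coords, neighbor_ids, center_id):
    """Label map l_ti: B(v_ti) -> {0, 1, 2} under the spatial-position rule.

    Each neighbor's distance to the skeleton's center of gravity is compared
    with the center node's distance (the reference): equal -> 0, smaller -> 1,
    greater -> 2. Each label later receives its own weight parameter.
    """
    cog = joint_coords.mean(axis=0)                        # center of gravity of the skeleton
    ref = np.linalg.norm(joint_coords[center_id] - cog)    # the reference distance
    labels = {}
    for j in neighbor_ids:
        d = np.linalg.norm(joint_coords[j] - cog)
        labels[j] = 0 if np.isclose(d, ref) else (1 if d < ref else 2)
    return labels
```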
The following illustrates a skeleton sequence behavior identification method based on a space-time graph convolutional network according to an embodiment of the present invention, as shown in fig. 4, the flow includes:
(1) The skeleton sequence is uniformly sampled as the initialization input key frame sequence of FDNet (i.e., the initialization data).
(2) Determine the network weight ω of FDNet, and obtain the states s from the actions a through the Markov decision process.
At initialization, the initial state s_1 is determined by the uniform sampling of the skeleton sequence (i.e., the initial key frames are determined), the initial value of the network weight ω is determined randomly, and the initial action a_1 is determined according to a_t = max Q(s_t, ω).
Subsequently, the current state is determined from the previous action (e.g., the state s_2 is determined from the initial action a_1), and the action corresponding to the current moment is determined from the current state (e.g., a_2 = max Q(s_2, ω)), corresponding to the formula a_t = max Q(s_t, ω).
In the FDNet training process, ω in the FDNet training process is updated after each use (after an action is determined according to the updated ω, a state is obtained correspondingly, and a key frame sequence is obtained), which is specifically referred to the content in the second section above.
(3) And constructing a corresponding key frame in the state s into a bone space-time diagram according to the content (a bone sequence space-time diagram modeling mode) of the first part.
(4) Train the ST-GCN (through operations such as calculating the loss and/or error and back propagation) using the skeleton space-time graph in (3) as input, where the training content includes the weight function w(v_ti); extract the space-time features using the algorithm of the third part (the space-time graph convolutional network based on the skeleton sequence), and then obtain the behavior recognition result using a SoftMax function (normalized exponential function).
(5) Reversely adjust the FDNet (including updating the network weight ω) according to the behavior recognition result of (4), using the algorithm of the second part (the key frame preprocessing network of the space-time graph convolutional network), and optimize the key frame sequence selection result.
(6) Cyclically execute (2) to (5), cross-adjusting the parameters of the two networks (FDNet and ST-GCN), until the behavior recognition result no longer changes significantly (specifically, the recognition result no longer changes) and is consistent with the skeleton sequence label (of the sequence in (1)), thus obtaining the final FDNet and ST-GCN.
Specifically, the obtained behavior recognition result may be compared with the label of the skeleton sequence in (1) before the reverse adjustment of FDNet is performed; if the comparison results are consistent and the number of consecutive consistent comparisons reaches the preset threshold N, the reverse adjustment is no longer carried out; otherwise, the adjustment continues.
The timing of this comparison operation may be after FDNet has been reversely adjusted a preset number of times during training, or after (4) is executed for the first time during training; this is not limited herein.
(7) Carry out human behavior recognition using the final FDNet and ST-GCN; the sketch below ties the above steps into one alternating loop.
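The following sketch reuses initial_state and train_fdnet from the earlier sketches; st_gcn_trainer (one ST-GCN training pass on the selected key frames) and the stability test are assumptions consistent with the description above, not the patent's exact procedure.

```python
def train_pipeline(skeleton_seq, label, q_net, optimizer, st_gcn, st_gcn_trainer,
                   num_key=16, N=5, max_rounds=50):
    """Alternating optimization of FDNet and ST-GCN, steps (1)-(6)."""
    keys = initial_state(len(skeleton_seq), num_key)       # (1) uniform sampling
    prev_pred, stable = None, 0
    for _ in range(max_rounds):
        st_gcn_trainer(skeleton_seq[keys], label)          # (3)-(4) train ST-GCN on the graph
        pred = st_gcn(skeleton_seq[keys])                  # (4) SoftMax recognition result
        if pred == label and pred == prev_pred:
            stable += 1
            if stable >= N:                                # (6) stable and consistent: stop
                break
        else:
            stable = 0
        keys = train_fdnet(q_net, optimizer, skeleton_seq, # (2)+(5) reversely adjust FDNet
                           label, st_gcn, num_key, N)
        prev_pred = pred
    return q_net, st_gcn                                   # (7) final networks for recognition
```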
In the embodiment of the invention, the two networks (FDNet and ST-GCN) mutually optimize and promote each other: the preprocessing network FDNet provides key frame training data for the space-time graph convolutional network ST-GCN, and the more representative and information-rich the extracted key frames, the more accurate the trained ST-GCN parameters; likewise, the more accurate the ST-GCN recognition result, the more accurate the data used for reversely adjusting FDNet, further optimizing the network to obtain a higher-quality key frame sequence.
In the embodiment of the present invention, the depth camera may be used to directly acquire the bone sequence, but not limited thereto.
Therefore, the bone sequence behavior identification method based on the space-time graph convolutional network in the embodiment of the invention comprises the following steps: aiming at the problems that the video behavior identification computation amount is huge, redundant frame information containing noise is contained, and the modeling difficulty is high when a skeleton sequence is used for deep learning behavior identification, the method provides graph modeling of a time-space domain aiming at the skeleton sequence, and provides a time-space graph convolution network containing a key frame preprocessing network, wherein the two networks (the key frame preprocessing network and the time-space graph convolution network) are mutually matched and optimized to identify human behaviors;
the key frame preprocessing network of the time-space graph convolution network comprises the following steps:
the preprocessing network FDNet for extracting the key frames adopts a Markov decision process, performs state conversion by executing different actions to obtain a return function, guides the execution of the next action according to the return function, and performs cyclic operation to determine the key frame sequence with most representativeness and most abundant information content.
The time-space diagram convolution network based on the skeleton sequence comprises the following steps:
and constructing a space-time-space graph convolution network based on a space-time domain graph model of the skeleton sequence. The sampling function of the graph convolution network is essentially the structure of a neighbor node set of a central node and is divided into an intra-frame subset and an inter-frame subset, wherein the intra-frame subset mainly comprises other naturally connected nodes which are within a specified range of the distance from the central node, and the inter-frame subset mainly comprises other nodes which are within a certain range before and after the frame where the central node is located and correspond to the same position. The convolution kernel function designs a weight function in a key mode, nodes in the neighbor nodes are divided into different subsets according to a certain rule, and each subset corresponds to different weight parameters. And finally, carrying out multilayer weighted summation on the time-space domain graph model of the bone sequence according to the sampling function and the convolution kernel function, and extracting time domain and space domain characteristics.
In summary, the embodiment of the invention provides a human body bone sequence behavior identification method based on a space-time diagram convolutional network. The method fully utilizes the nature and the characteristics of natural connection of skeleton nodes to establish a time-space domain graph model, so that the model has stronger generalization capability without artificially defining body parts; the original skeleton sequence is processed by adopting the preprocessing network for extracting the key frames, so that frames with rich information content, more identification degree and representativeness are extracted, the calculated amount of the graph convolution network is reduced, the interference of redundant information is reduced, and the model training efficiency is improved; a time-space graph convolution network based on a skeleton key frame sequence is adopted, and time domain and space domain characteristics of the skeleton sequence are excavated simultaneously, so that the model design complexity is reduced; by adopting an organization method that the pre-training network and the graph convolution network are mutually matched and optimized, the overall efficiency and accuracy of behavior recognition are improved.
The scheme adopts a deep learning method, and solves the problem of model training of large-scale data; the skeleton sequence is described by adopting a time-space domain graph model, the natural connection characteristic of the skeleton sequence is reserved, and the characteristics of richer expressive force can be extracted;
the skeleton sequence is directly obtained by using the depth camera for behavior recognition, so that the calculated amount of deep network skeleton extraction is reduced; the graph convolution network and the SoftMax function are used for identification and classification, the result is more accurate, and the model is more generalized;
the method does not need to manually extract various features, but uses a graph modeling method to describe the whole sequence, and uses a graph convolution method to extract time domain and space domain features, so that the method has fewer links needing manual participation and is more intelligent in model; and a preprocessing network for extracting key frames is added, so that the workload of behavior identification is further reduced.
An embodiment of the present invention further provides a behavior recognition apparatus, as shown in fig. 5, including:
a first obtaining module 51, configured to obtain the key frame sequence data to be identified from the bone sequence data to be identified;
and the first identification module 52 is configured to identify a behavior action corresponding to the to-be-identified key frame sequence data by using a space-time graph convolutional network.
The behavior recognition device provided by the embodiment of the invention acquires the sequence data of the key frames to be recognized from the bone sequence data to be recognized; identifying behavior actions corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolution network; redundant information interference can be reduced, and the workload of behavior identification is reduced; the time domain and space domain characteristics of the skeleton sequence are simultaneously excavated, the model design complexity is reduced, and the identification efficiency and accuracy are improved.
Wherein the first identification module comprises: the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified; the first extraction submodule is used for extracting the spatiotemporal features to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network; and the first identification submodule is used for identifying the behavior action corresponding to the space-time characteristic to be identified.
Specifically, the first identification submodule includes: and the first processing unit is used for obtaining the behavior action corresponding to the space-time characteristic to be identified by utilizing the normalized exponential function.
In order to extract frames that are information-rich, more discriminative and representative, in an embodiment of the present invention, the first obtaining module includes: the first acquisition sub-module, which is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Further, the behavior recognition device further includes: the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Wherein the first training module comprises: the second acquisition submodule is used for acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action by using the frame rectification network; the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network; the first adjusting submodule is used for adjusting the frame rectification network reversely according to the training behavior action;
the third acquisition submodule is used for acquiring the key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network; and the first training submodule is used for training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Further, the behavior recognition device further includes: the second identification module, used for identifying, by means of the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, after the space-time graph convolutional network has been retrained with that data; and the first processing module, used for storing the parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold value, and otherwise for returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Wherein the adjusting the frame rectification network in a reverse direction according to the training behavior comprises: acquiring a return function value according to the training behavior action; obtaining a loss function according to the return function value; and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Specifically, the loss function is: l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω)); where l denotes the loss function, α denotes a preset learning speed, r_t denotes the return function value when the frame rectification network is trained for the t-th time, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) both denote action value functions, s_{t+1} denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the skeleton sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weight of the frame rectification network.
The implementation embodiments of the behavior recognition method are all suitable for the embodiment of the behavior recognition device, and the same technical effect can be achieved.
An embodiment of the present invention further provides a behavior recognition device, as shown in fig. 6, including a memory 61, a processor 62, and a computer program 63 stored in the memory 61 and capable of running on the processor; the processor 62, when executing the program, implements the behavior recognition method described above.
The implementation embodiments of the behavior recognition method are all suitable for the embodiment of the behavior recognition device, and the same technical effect can be achieved.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the behavior recognition method.
The implementation embodiments of the behavior recognition method are all applicable to the embodiment of the computer-readable storage medium, and the same technical effect can be achieved.
It should be noted that many of the functional components described in this specification are referred to as modules/sub-modules/units in order to more particularly emphasize their implementation independence.
In embodiments of the present invention, the modules/sub-modules/units may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be constructed as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different bits which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified within the modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
When a module can be implemented by software, considering the level of existing hardware technology, a module implemented by software may build a corresponding hardware circuit to implement a corresponding function, without considering cost, and the hardware circuit may include a conventional Very Large Scale Integration (VLSI) circuit or a gate array and an existing semiconductor such as a logic chip, a transistor, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be appreciated by those skilled in the art that various changes and modifications may be made therein without departing from the principles of the invention as set forth in the appended claims.

Claims (12)

1. A method of behavior recognition, comprising:
acquiring key frame sequence data to be identified from the bone sequence data to be identified;
identifying a behavior action corresponding to the key frame sequence data to be identified by utilizing a space-time graph convolutional network;
wherein the identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing the space-time graph convolutional network comprises:
constructing the key frame sequence data to be identified into a bone sequence space-time graph to be identified;
extracting a spatiotemporal feature to be identified from the bone sequence space-time graph to be identified by utilizing the space-time graph convolutional network;
identifying the behavior action corresponding to the spatiotemporal feature to be identified;
wherein the acquiring of the sequence data of the key frame to be identified from the bone sequence data to be identified comprises:
acquiring key frame sequence data to be identified from the bone sequence data to be identified by using a frame rectification network;
before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network, the method further comprises:
training the frame rectification network and the space-time graph convolutional network;
wherein the training of the frame rectification network and the space-time graph convolutional network comprises:
acquiring key frame sequence training data from bone sequence data corresponding to a preset behavior action by using the frame rectification network;
training the space-time graph convolutional network by using the key frame sequence training data, and identifying a training behavior action corresponding to the key frame sequence training data by using the trained space-time graph convolutional network;
reversely adjusting the frame rectification network according to the training behavior action;
acquiring key frame sequence training data again from the bone sequence data corresponding to the preset behavior action by using the adjusted frame rectification network;
and training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
2. The behavior recognition method according to claim 1, wherein the identifying the behavior action corresponding to the spatiotemporal feature to be identified comprises:
obtaining the behavior action corresponding to the spatiotemporal feature to be identified by utilizing a normalized exponential function.
3. The behavior recognition method according to claim 1, further comprising, after training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network:
identifying the training behavior action corresponding to the key frame sequence training data acquired by the adjusted frame rectification network by using the retrained space-time graph convolutional network;
if the number of consecutive times that the identified training behavior action is consistent with the preset behavior action is larger than a first threshold, storing parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
4. The behavior recognition method according to claim 1 or 3, wherein the reversely adjusting the frame rectification network according to the training behavior action comprises:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
5. The behavior recognition method according to claim 4, wherein the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the return function value at the t-th training of the frame rectification network, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) both denote action-value functions, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
6. A behavior recognition apparatus, comprising:
the first acquisition module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified;
the first identification module is used for identifying a behavior action corresponding to the key frame sequence data to be identified by utilizing a space-time graph convolutional network;
wherein the first identification module comprises:
the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time graph to be identified;
the first extraction submodule is used for extracting a spatiotemporal feature to be identified from the bone sequence space-time graph to be identified by utilizing the space-time graph convolutional network;
the first identification submodule is used for identifying the behavior action corresponding to the spatiotemporal feature to be identified;
wherein, the first obtaining module comprises:
the first acquisition submodule is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by utilizing the frame rectification network;
the behavior recognition device further comprises:
the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network;
the first training module comprising:
the second acquisition submodule is used for acquiring key frame sequence training data from bone sequence data corresponding to a preset behavior action by using the frame rectification network;
the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network;
the first adjusting submodule is used for reversely adjusting the frame rectification network according to the training behavior action;
the third acquisition submodule is used for acquiring the key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network;
and the first training submodule is used for training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
7. The behavior recognition device according to claim 6, wherein the first recognition submodule includes:
and the first processing unit is used for obtaining the behavior action corresponding to the space-time feature to be identified by utilizing the normalized exponential function.
8. The behavior recognition device according to claim 6, further comprising:
the second identification module is used for, after the space-time graph convolutional network is trained again by using the key frame sequence training data acquired by the adjusted frame rectification network, identifying the training behavior action corresponding to the key frame sequence training data acquired by the adjusted frame rectification network by using the retrained space-time graph convolutional network;
the first processing module is used for storing parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the identified training behavior action is consistent with the preset behavior action is larger than a first threshold; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
9. The behavior recognition device according to claim 7 or 8, wherein the reversely adjusting the frame rectification network according to the training behavior action comprises:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
10. The behavior recognition device according to claim 9, wherein the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the return function value at the t-th training of the frame rectification network, γ denotes a preset attenuation value, Q(s_{t+1}, ω) and Q(s_t, ω) both denote action-value functions, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
11. A behavior recognition device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; characterized in that the processor, when executing the program, implements the behavior recognition method according to any one of claims 1 to 5.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for behavior recognition according to any one of claims 1 to 5.
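Read together, claims 1 through 5 describe an alternating loop: the frame rectification network selects key frames, the space-time graph convolutional network is trained on them and classifies them, and the recognition outcome is fed back to reversely adjust the frame selection until the recognized action matches the preset one for more than a first threshold of consecutive rounds. The following is a toy, self-contained sketch of that control flow only; every name in it (select_key_frames, recognize, train_loop) is a hypothetical illustration, and both networks are reduced to trivial numeric stand-ins.

```python
import random

# Toy stand-ins: the patent uses a frame rectification network and a
# space-time graph convolutional network; here both collapse to a few
# numbers so the alternating control flow of claims 1 and 3 can run.

def select_key_frames(weights, seq, k=4):
    """Frame rectification step: keep the k highest-scoring frame indices."""
    ranked = sorted(range(len(seq)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])

def recognize(bias, key_idx, seq):
    """Stand-in 'space-time graph convolutional network' classifier."""
    return int(sum(seq[i] for i in key_idx) + bias > 0)

def train_loop(sequences, labels, first_threshold=3, max_rounds=1000):
    n = len(sequences[0])
    weights = [random.random() for _ in range(n)]    # rectification weights ω
    bias, consecutive = 0.0, 0
    for _ in range(max_rounds):                      # safety cap for the toy
        hits = 0
        for seq, y in zip(sequences, labels):
            key_idx = select_key_frames(weights, seq)
            bias += 0.05 * (y - recognize(bias, key_idx, seq))  # "train" the classifier
            if recognize(bias, key_idx, seq) == y:
                hits += 1
            else:                                    # reverse adjustment step
                for i in key_idx:
                    weights[i] -= 0.1 * random.random()
        consecutive = consecutive + 1 if hits == len(labels) else 0
        if consecutive > first_threshold:            # stop condition of claim 3
            break
    return weights, bias                             # parameters to be stored

w, b = train_loop([[0.9, -1.0, 0.8, 0.2, -0.4, 1.1, 0.0, -2.0],
                   [-0.5, -0.9, -1.2, 0.3, -0.1, -0.8, -1.5, 0.4]], [1, 0])
```

In the patent itself, the reverse adjustment is the gradient step on the loss of claim 5; the heuristic decrement above merely stands in for it so the sketch stays self-contained.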
CN201910000953.0A 2019-01-02 2019-01-02 Behavior identification method, device and equipment Active CN111401106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910000953.0A CN111401106B (en) 2019-01-02 2019-01-02 Behavior identification method, device and equipment

Publications (2)

Publication Number Publication Date
CN111401106A CN111401106A (en) 2020-07-10
CN111401106B 2023-03-31

Family

ID=71430152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910000953.0A Active CN111401106B (en) 2019-01-02 2019-01-02 Behavior identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN111401106B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069979B (en) * 2020-09-03 2024-02-02 浙江大学 Real-time action recognition man-machine interaction system
CN112070027B (en) * 2020-09-09 2022-08-26 腾讯科技(深圳)有限公司 Network training and action recognition method, device, equipment and storage medium
CN112380955B (en) * 2020-11-10 2023-06-16 浙江大华技术股份有限公司 Action recognition method and device
CN112528811A (en) * 2020-12-02 2021-03-19 建信金融科技有限责任公司 Behavior recognition method and device
CN113128424B (en) * 2021-04-23 2024-05-03 浙江理工大学 Method for identifying action of graph convolution neural network based on attention mechanism
CN113139469B (en) * 2021-04-25 2022-04-29 武汉理工大学 Driver road stress adjusting method and system based on micro-expression recognition
CN113378656B (en) * 2021-05-24 2023-07-25 南京信息工程大学 Action recognition method and device based on self-adaptive graph convolution neural network
CN113989927B (en) * 2021-10-27 2024-04-26 东北大学 Method and system for identifying violent behaviors of video group based on bone data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN107169415A (en) * 2017-04-13 2017-09-15 西安电子科技大学 Human motion recognition method based on convolutional neural networks feature coding
CN108229355A (en) * 2017-12-22 2018-06-29 北京市商汤科技开发有限公司 Activity recognition method and apparatus, electronic equipment, computer storage media, program
WO2018119807A1 (en) * 2016-12-29 2018-07-05 浙江工商大学 Depth image sequence generation method based on convolutional neural network and spatiotemporal coherence
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A CNNs action recognition method fusing global spatiotemporal features; Wang Ke et al.; Journal of Huazhong University of Science and Technology (Natural Science Edition); 2018-12-20 (No. 12); full text *
View-invariant skeleton behavior recognition based on a spatiotemporal attention deep network; Feng Yan et al.; Journal of Computer-Aided Design & Computer Graphics; 2018-12-15 (No. 12); full text *

Also Published As

Publication number Publication date
CN111401106A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401106B (en) Behavior identification method, device and equipment
CN111310707B (en) Bone-based graph annotation meaning network action recognition method and system
CN107492121B (en) Two-dimensional human body bone point positioning method of monocular depth video
CN111291809B (en) Processing device, method and storage medium
Yang et al. SiamAtt: Siamese attention network for visual tracking
CN109685037B (en) Real-time action recognition method and device and electronic equipment
CN109902565B (en) Multi-feature fusion human behavior recognition method
Filtjens et al. Skeleton-based action segmentation with multi-stage spatial-temporal graph convolutional neural networks
CN110477907B (en) Modeling method for intelligently assisting in recognizing epileptic seizures
Liu et al. Joint dynamic pose image and space time reversal for human action recognition from videos
CN113963304B (en) Cross-modal video time sequence action positioning method and system based on time sequence-space diagram
CN114613013A (en) End-to-end human behavior recognition method and model based on skeleton nodes
CN106203255A (en) A kind of pedestrian based on time unifying heavily recognition methods and system
CN115527269B (en) Intelligent human body posture image recognition method and system
Jin et al. Cvt-assd: convolutional vision-transformer based attentive single shot multibox detector
CN109858351B (en) Gait recognition method based on hierarchy real-time memory
CN110852214A (en) Light-weight face recognition method facing edge calculation
Liu et al. Key algorithm for human motion recognition in virtual reality video sequences based on hidden markov model
CN112199994B (en) Method and device for detecting interaction of3D hand and unknown object in RGB video in real time
Zhu et al. Dance Action Recognition and Pose Estimation Based on Deep Convolutional Neural Network.
CN114882595A (en) Armed personnel behavior identification method and armed personnel behavior identification system
Benhamida et al. Theater Aid System for the Visually Impaired Through Transfer Learning of Spatio-Temporal Graph Convolution Networks
Jia et al. Lightweight CNN-Based Image Recognition with Ecological IoT Framework for Management of Marine Fishes
Hedegaard et al. Human activity recognition
Tyagi et al. Hybrid classifier model with tuned weights for human activity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant