CN111401106A - Behavior identification method, device and equipment - Google Patents

Behavior identification method, device and equipment

Info

Publication number
CN111401106A
CN111401106A (application number CN201910000953.0A)
Authority
CN
China
Prior art keywords
network
behavior
training
identified
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910000953.0A
Other languages
Chinese (zh)
Other versions
CN111401106B (en)
Inventor
丁晓璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN201910000953.0A
Publication of CN111401106A
Application granted
Publication of CN111401106B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a behavior identification method, device, and equipment, wherein the behavior identification method comprises the following steps: acquiring key frame sequence data to be identified from bone sequence data to be identified; and identifying the behavior action corresponding to the key frame sequence data to be identified using a space-time graph convolutional network. By acquiring the key frame sequence data from the bone sequence data and identifying the corresponding behavior action with a space-time graph convolutional network, the scheme reduces redundant-information interference and the workload of behavior identification; the time-domain and space-domain features of the skeleton sequence are mined simultaneously, which reduces model design complexity and improves identification efficiency and accuracy.

Description

Behavior identification method, device and equipment
Technical Field
The invention relates to the technical field of human behavior recognition, in particular to a behavior recognition method, a behavior recognition device and behavior recognition equipment.
Background
The following behavior recognition schemes mainly exist in the prior art:
In the first scheme, video behavior recognition: video image data are taken directly as input, and behavior recognition is performed with a deep learning method. The approach can roughly be regarded as image classification applied to each frame of a multi-frame sequence, with a behavior recognition result given according to the classification results of all the images.
Second scheme, bone sequence behavior recognition: human skeleton nodes are highly robust to illumination and viewing-angle changes, the data volume is small, and the computing-resource consumption is low. As equipment precision improves, skeleton nodes can be located more accurately with a depth camera or a motion capture system, and using a skeleton sequence as the input of a deep network can improve the recognition effect.
Third scheme, behavior recognition based on RNNs (Recurrent Neural Networks): owing to the favorable properties of RNNs, using an RNN for behavior recognition removes the need to model the time dimension separately. Existing schemes are mostly based on LSTM (Long Short-Term Memory) networks, and recognition accuracy has been improved continuously by introducing trust-gate mechanisms, adding attention models, and other methods.
Fourth scheme, behavior recognition based on CNNs (Convolutional Neural Networks): the input of a CNN is generally Euclidean-structured data arranged regularly in matrix form. Existing literature mostly extends the 2D CNNs of image classification into 3D CNNs for video recognition, adding video segmentation, multi-task parallel computation, and the like to achieve good results.
However, the above four behavior recognition schemes have the following disadvantages, respectively:
for the first scheme, video behavior recognition: the calculation amount is huge, about 5M of an image classification (101-type) network is obtained, and the number of network parameters reaches about 33M after the image classification (101-type) network is expanded to video classification; the global long-distance context information is difficult to extract, the video classification result not only depends on the identification result of a single picture but also depends on the dynamic change among picture sequences, but the long-distance global context dynamic information is difficult to capture due to limited storage capacity and computing capacity; sensitive to illumination and visual angle change.
For the second scheme, bone sequence behavior recognition: existing bone behavior identification methods fall mainly into two types. One captures joint dynamics by manually extracting features, which requires ingenious feature design and a large amount of manual effort; the other uses deep learning, but the existing literature mostly starts from dividing the body into parts, which limits the extraction of deeper-level feature associations to some extent.
For the third scheme, behavior recognition based on the RNN network: an RNN can capture time-dimension information well but has limited capacity for extracting data features; existing methods are mainly based on multilayer RNN stacking (stacked RNNs), which is difficult to train in practical applications.
For the fourth scheme, behavior recognition based on the CNN network: existing algorithms generally feed every input frame to the CNN indiscriminately, wasting computing capacity and introducing noise interference; the CNN network needs to model the time dimension separately, increasing the difficulty of model design; and the CNN requires Euclidean-structured input, so skeleton-based recognition loses the natural connectivity between skeleton nodes.
Therefore, existing human behavior recognition schemes suffer from heavy computation, low recognition efficiency, and complex networks, and face numerous difficulties in practical application.
Disclosure of Invention
The invention aims to provide a behavior recognition method, device, and equipment, so as to solve the problems of heavy computation, low recognition efficiency, and complex networks in prior-art human behavior recognition schemes.
In order to solve the above technical problem, an embodiment of the present invention provides a behavior identification method, including:
acquiring key frame sequence data to be identified from bone sequence data to be identified;
and identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network.
Optionally, the identifying, by using a space-time graph convolutional network, a behavior action corresponding to the sequence data of the key frames to be identified includes:
constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
extracting a spatiotemporal feature to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network;
and identifying the behavior action corresponding to the space-time characteristic to be identified.
Optionally, the identifying the behavior action corresponding to the to-be-identified spatio-temporal feature includes:
and obtaining the behavior action corresponding to the space-time characteristic to be identified by utilizing the normalized exponential function.
Optionally, the acquiring the sequence data of the key frame to be identified from the bone sequence data to be identified includes:
and acquiring the key frame sequence data to be identified from the bone sequence data to be identified by utilizing a frame rectification network.
Optionally, before acquiring the sequence data of the key frame to be identified from the bone sequence data to be identified by using the frame rectification network, the method further includes:
and training the frame rectification network and the space-time graph convolution network.
Optionally, the training the frame rectification network and the space-time graph convolution network includes:
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior and action by using the frame rectification network;
training the space-time graph convolutional network by using the key frame sequence training data, and identifying training behavior actions corresponding to the key frame sequence training data by using the trained space-time graph convolutional network;
adjusting the frame rectification network reversely according to the training behavior action;
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network;
and training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Optionally, after the space-time graph convolutional network is trained again using the key frame sequence training data acquired by the adjusted frame rectification network, the method further includes:
identifying the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, using the retrained space-time graph convolutional network;
if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold, storing the parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Optionally, the reversely adjusting the frame rectification network according to the training behavior action includes:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Optionally, the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward-function value for the t-th training of the frame rectification network, γ denotes a preset decay value,
Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
An embodiment of the present invention further provides a behavior recognition apparatus, including:
the first acquisition module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified;
and the first identification module is used for identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a space-time graph convolutional network.
Optionally, the first identification module includes:
the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
the first extraction submodule is used for extracting the spatiotemporal features to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network;
and the first identification submodule is used for identifying the behavior action corresponding to the space-time characteristic to be identified.
Optionally, the first identification submodule includes:
and the first processing unit is used for obtaining the behavior action corresponding to the space-time feature to be identified by utilizing the normalized exponential function.
Optionally, the first obtaining module includes:
and the first acquisition sub-module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Optionally, the method further includes:
the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Optionally, the first training module includes:
the second acquisition submodule is used for acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action by using the frame rectification network;
the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network;
the first adjusting submodule is used for adjusting the frame rectification network reversely according to the training behavior action;
the third acquisition submodule is used for acquiring the key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network;
and the first training submodule is used for training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Optionally, the method further includes:
the second identification module is used for identifying, by using the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, after the space-time graph convolutional network has been retrained with the key frame sequence training data obtained by the adjusted frame rectification network;
the first processing module is used for storing the parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Optionally, the reversely adjusting the frame rectification network according to the training behavior action includes:
acquiring a return function value according to the training behavior action;
obtaining a loss function according to the return function value;
and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
Optionally, the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward-function value for the t-th training of the frame rectification network, γ denotes a preset decay value,
Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
The embodiment of the invention also provides behavior recognition equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor; the processor implements the above-described behavior recognition method when executing the program.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the behavior recognition method.
The technical scheme of the invention has the following beneficial effects:
In the scheme, the behavior recognition method acquires the key frame sequence data to be identified from the bone sequence data to be identified and identifies the behavior action corresponding to the key frame sequence data using a space-time graph convolutional network; this reduces redundant-information interference and the workload of behavior identification; the time-domain and space-domain features of the skeleton sequence are mined simultaneously, which reduces model design complexity and improves identification efficiency and accuracy.
Drawings
FIG. 1 is a flow chart of a behavior recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a modeling method of a time-space domain diagram of a bone sequence according to an embodiment of the present invention;
fig. 3 is a first diagram illustrating a specific implementation of the behavior recognition method according to the embodiment of the present invention;
fig. 4 is a second diagram illustrating a specific implementation of the behavior recognition method according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a behavior recognition device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of behavior recognition equipment according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a behavior recognition method aiming at the problems of large calculation amount, low recognition efficiency and complex network of a human behavior recognition scheme in the prior art, as shown in figure 1, the method comprises the following steps:
step 11: acquiring key frame sequence data to be identified from bone sequence data to be identified;
step 12: and identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a time-space graph convolutional network.
The behavior recognition method provided by the embodiment of the invention acquires the key frame sequence data to be identified from the bone sequence data to be identified and identifies the behavior action corresponding to the key frame sequence data using a space-time graph convolutional network; this reduces redundant-information interference and the workload of behavior recognition; the time-domain and space-domain features of the skeleton sequence are mined simultaneously, which reduces model design complexity and improves recognition efficiency and accuracy.
The method for identifying the behavior action corresponding to the to-be-identified key frame sequence data by utilizing the space-time graph convolutional network comprises the following steps of: constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified; extracting a spatiotemporal feature to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network; and identifying the behavior action corresponding to the space-time characteristic to be identified.
Specifically, the identifying the behavior action corresponding to the to-be-identified spatio-temporal feature includes: and obtaining the behavior action corresponding to the space-time characteristic to be identified by utilizing the normalized exponential function.
In order to extract frames that are information-rich, more discriminative, and representative, in an embodiment of the present invention the acquiring of the key frame sequence data to be identified from the bone sequence data to be identified includes: acquiring the key frame sequence data to be identified from the bone sequence data to be identified using a frame rectification network.
Further, before obtaining the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network, the method further comprises the following steps: and training the frame rectification network and the space-time graph convolution network.
Wherein training the frame rectification network and the space-time graph convolution network comprises: acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior and action by using the frame rectification network; training the space-time graph convolutional network by using the key frame sequence training data, and identifying training behavior actions corresponding to the key frame sequence training data by using the trained space-time graph convolutional network; adjusting the frame rectification network reversely according to the training behavior action;
acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network; and training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Further, after the space-time graph convolutional network is trained again using the key frame sequence training data acquired by the adjusted frame rectification network, the method further includes: identifying the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, using the retrained space-time graph convolutional network; if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold, storing the parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Wherein the adjusting the frame rectification network in reverse direction according to the training behavior action comprises: acquiring a return function value according to the training behavior action; obtaining a loss function according to the return function value; and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
In particular, the loss function is l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω)), where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward-function value for the t-th training of the frame rectification network, γ denotes a preset decay value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
The behavior recognition method provided by the embodiment of the invention is further described below.
In view of the above technical problems, embodiments of the present invention provide a behavior recognition method, which may specifically be a human skeleton sequence behavior recognition method based on a space-time graph convolutional network. For skeleton sequence data, the method performs space-time graph network modeling using the natural spatial connections of nodes within a single frame (spatial skeleton node connections) and the temporal connections of the same node across multiple frames, proposes a frame rectification algorithm to extract frames that are information-rich, highly discriminative, and strongly correlated with the overall video behavior, and designs a graph convolution kernel and a space-time graph convolutional network algorithm, which together complete human behavior recognition.
Human behavior recognition is regarded as a fundamental technology in application fields such as human-computer interaction, intelligent monitoring, and robotics. Taking the monitoring of an elderly person living alone as an example, an intelligent behavior recognition system detects the person's daily activities to judge whether he or she eats normally, sleeps on time, and takes medication as prescribed, and whether abnormal situations such as falls, myocardial infarction, or coma occur, so that family members can be notified and medical assistance delivered in time, allowing a higher quality of life while living alone. In fitness evaluation and medical rehabilitation systems, recognizing movements and comparing them with correct postures yields improvement suggestions, raising fitness efficiency and rehabilitation outcomes. However, the traditional video-based human behavior recognition method involves heavy computation, complex networks, and high image-quality requirements, and faces many difficulties in practical application. For behavior recognition based on bone data, one may choose an RNN, a CNN, or a graph convolutional network: the RNN has limited data feature extraction capability, the matrix-form data structure required by the CNN loses some of the good properties of bone data, and the graph convolutional network makes full use of the natural connections between bone nodes to extract richer features effectively. The embodiment of the invention therefore provides skeleton sequence behavior recognition based on space-time graph convolution, with a frame rectification algorithm added to improve recognition efficiency and accuracy. The scheme provided by the embodiment of the invention mainly involves the following three parts:
the first part, a skeleton sequence time-space domain graph modeling method;
The modeling method of the skeleton sequence time-space domain graph in the embodiment of the invention is shown in Fig. 2. The time-space domain structure of the skeleton sequence is divided into an intra-frame structure and an inter-frame structure: the intra-frame structure mainly describes the space-domain structure of the skeleton, and the inter-frame structure mainly describes its time-domain structure. Specifically, the skeleton nodes form the node set V of the graph; the edges connecting different nodes within the same frame and the edges connecting the same node across different frames together form the edge set E; the edge set and node set form the space-time graph G of the skeleton. The node set is V = {v_ti | t = 1, ..., T; i = 1, ..., N}, where T is the number of frames in the sequence and N is the number of bone nodes. The edge set is E = E_S ∪ E_T, where E_S = {v_ti v_tj | (i, j) ∈ S}, S being the set of body joints naturally connected within a frame, and E_T = {v_ti v_pi | t, p ∈ T′}, T′ being the set of extracted key frames.
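For concreteness, the following is a minimal Python sketch of how the node set V and edge set E = E_S ∪ E_T defined above could be assembled; the toy joint pairs in SKELETON_EDGES and the key frame indices are illustrative assumptions, and E_T here links consecutive key frames, one common reading of the inter-frame connection.

```python
# Hypothetical intra-frame joint connections S for a 5-joint toy skeleton.
SKELETON_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]

def build_spacetime_graph(key_frames, num_joints):
    """Assemble V and E = E_S (intra-frame) + E_T (inter-frame) over the
    extracted key frames T'."""
    V = [(t, i) for t in key_frames for i in range(num_joints)]
    # E_S: edges between naturally connected joints within the same frame.
    E_S = [((t, i), (t, j)) for t in key_frames for (i, j) in SKELETON_EDGES]
    # E_T: edges connecting the same joint across consecutive key frames.
    E_T = [((t, i), (p, i))
           for t, p in zip(key_frames, key_frames[1:])
           for i in range(num_joints)]
    return V, E_S + E_T

V, E = build_spacetime_graph(key_frames=[0, 4, 8], num_joints=5)
```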
A second part, a key frame preprocessing network of the space-time graph convolution network;
the Frame Distillation Network (FDNet) in fig. 3 is a preprocessing Network for extracting a key Frame in the space-time convolutional Network according to an embodiment of the present invention.
The key frame extraction process is a Markov Decision Process (MDP) defined by the triple M = (S, A, R), where S = {s_i} is the state set (each state's data is a frame sequence); A = {a_i} is the action set, comprising keeping the current frame unchanged, selecting the previous frame, and selecting the next frame; and R is the set of reward-function values obtained after (s, a) transitions to the next state. The MDP is initialized by uniformly sampling the input skeleton sequence to obtain the initial state s_1; a random state-transition action a_1 transitions it to state s_2, and the reward-function value r_1 is computed. Computing the reward-function value requires ST-GCN (the space-time graph convolutional network): the skeleton frame sequence corresponding to state s_i is input into a pre-trained ST-GCN to obtain a recognition result, which is compared with the behavior label (such as walking or sitting); the reward value r_i is positive if they match and negative otherwise.
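As a sketch of the reward computation described above, the function below assumes a pre-trained classifier callable st_gcn that returns per-class scores and an integer behavior label; both names and the ±1 reward values are placeholders, since the patent only fixes the sign convention.

```python
def reward(state_frames, label, st_gcn):
    """Positive reward if the pre-trained ST-GCN recognizes the key frame
    sequence of this state as the labeled behavior, negative otherwise."""
    scores = st_gcn(state_frames)                         # per-class scores
    predicted = max(range(len(scores)), key=lambda c: scores[c])
    return 1.0 if predicted == label else -1.0
```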
The training procedure for the frame rectification network may be specifically as follows:
initialize FDNet and randomly generate the network weight ω;
initialize the bone sequence state s_1;
loop over t:
    select the action a_t = max Q(s_t, ω);
    execute a_t to generate a new state s_{t+1}, and compute the reward r_t using ST-GCN;
    if (r_t > 0) && (r_i > 0 for i = (t − N) ... t):
        finish;
    otherwise:
        compute the loss function l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω));
        perform gradient descent on the loss function and update ω;
    return to the loop over t.
where t denotes the number of sample training iterations (i.e., the number of training passes over the bone sequence, equal to the number of ST-GCN training passes); r_t denotes the reward-function value during the t-th training pass and r_i that during the i-th; (r_t > 0) && (r_i > 0 for i = (t − N) ... t) means the reward-function value is positive and has remained positive for N consecutive passes (i.e., the behavior recognition result is consistent with the behavior label corresponding to this skeleton sequence N consecutive times); N denotes a customizable system threshold; Q(s_t, ω) denotes the action-value function; and in the loss function l, α denotes the learning rate and γ the decay value, with 0 < α < 1 and 0 ≤ γ ≤ 1.
In the embodiment of the invention, once ω has been determined, the key frames can be acquired using the FDNet network.
Serving as the preprocessing network for ST-GCN, FDNet distills the most information-rich and representative frames, which reduces the computation of the ST-GCN network and effectively reduces the noise interference caused by redundant frames.
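Read as code, the training procedure above is a Q-learning-style loop. The sketch below is one hedged reading of it: fdnet, step_fn, and reward_fn are assumed interfaces, the patent's learning speed α is folded into the optimizer's learning rate, and the squared temporal-difference error is minimized (the conventional surrogate) so that gradient descent reproduces the usual Q-learning update rather than the patent's literal signed expression.

```python
import torch

def train_fdnet(fdnet, optimizer, init_state, step_fn, reward_fn,
                gamma=0.9, N=10):
    """Train the frame rectification network until the reward stays
    positive for N consecutive iterations."""
    s_t, streak = init_state, 0
    while streak < N:
        q_t = fdnet(s_t)                      # Q(s_t, w) over the three actions
        a_t = int(torch.argmax(q_t))          # greedy action a_t = max Q(s_t, w)
        s_next = step_fn(s_t, a_t)            # execute a_t, obtain s_{t+1}
        r_t = reward_fn(s_next)               # reward computed via pre-trained ST-GCN
        streak = streak + 1 if r_t > 0 else 0
        # TD target r_t + gamma * max Q(s_{t+1}, w); detached as in standard Q-learning.
        target = r_t + gamma * fdnet(s_next).max().detach()
        loss = (target - q_t[a_t]) ** 2       # squared TD error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                      # gradient descent, update w
        s_t = s_next
    return fdnet
```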
A third part, a time-space diagram convolution network based on the skeleton sequence;
Fig. 4 shows a graph convolutional network constructed on the time-space domain graph of a skeleton sequence, referred to in the embodiment of the present invention as the space-time graph convolutional network (ST-GCN).
Similar to a 2D CNN, the essence of the skeleton graph convolutional network is to use a parameter-sharing convolution kernel to perform weighted summation over a central node and its neighboring nodes so as to extract features; the design therefore focuses on the sampling function over neighboring nodes and the convolution kernel function.
An image sampling function collects the other pixels within a certain range around a central pixel; analogously, the sampling function of the skeleton graph can be defined to collect the other nodes connected to a central node within a certain distance, i.e., the neighbor node set of v_ti is B(v_ti) = {v_qj | d(v_ti, v_tj) ≤ K, |q − t| ≤ D}, where d(v_ti, v_tj) denotes the length of the shortest path from v_ti to v_tj within the same frame, K is a distance selection criterion (predefinable), |q − t| denotes the time span of frames before and after the central node, and D is a time selection criterion (predefinable). For example, with K = 1 and D = 40, the sampling function selects the nodes v_tj at most one unit of length from the central node v_ti, together with the nodes v_qi at the same position in the 40 frames before and after the current frame, for the weighted computation. The sampling function of the bone graph is formulated as p(v_ti) = v_qj. This design of the neighbor node set fully embodies the spatio-temporal characteristics of the skeleton graph.
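A minimal sketch of the neighbor set B(v_ti) defined above, assuming the intra-frame joint graph is given as an adjacency list and shortest path lengths are obtained by breadth-first search; the helper names are assumptions.

```python
from collections import deque

def intra_frame_hops(adj, src):
    """Shortest path length d(., .) in hops from joint src to every joint,
    over the natural intra-frame connections."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighbor_set(t, i, adj, frames, K=1, D=40):
    """B(v_ti) = {v_qj | d(v_ti, v_tj) <= K, |q - t| <= D}, with the patent's
    example criteria K = 1 (spatial) and D = 40 (temporal)."""
    hops = intra_frame_hops(adj, i)
    spatial = [(t, j) for j, h in hops.items() if h <= K]        # same frame
    temporal = [(q, i) for q in frames if 0 < abs(q - t) <= D]   # same joint
    return spatial + temporal
```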
The convolution kernel function mainly comprises the size of the convolution kernel and the weight function w(v_ti) of the convolution kernel. A convolution kernel on an image is generally a square of fixed size whose weights are to be optimized; the convolution kernel of the skeleton graph is instead designed as a mapping from the neighbor node set B(v_ti) of v_ti to K labels, l_ti: B(v_ti) → {0, ..., K − 1}. That is, the neighbor nodes of the central node are divided into K labels (subsets) according to a preset rule (e.g., whether a neighbor is nearer to or farther from the skeleton's center of gravity than the central node), and the weight values corresponding to the labels together form the weight function w(v_ti) to be optimized. w(v_ti) can be optimized by back propagation and the like. This design of the convolution kernel solves the problem that the input of graph convolution is non-Euclidean structured data (not in matrix form).
The preset rule may specifically be any one of the following rules, but is not limited to them (a sketch of the spatial-position rule follows the list):
unified label partition rule: the central node and its neighbor nodes all belong to one subset (label);
distance partition rule: the central node forms one subset, and the other neighbor nodes form another subset;
spatial-position partition rule: taking the central node's distance to the center of gravity of the whole skeleton as the reference, the nodes are divided into three subsets whose distances are greater than, equal to, or less than the reference.
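The sketch below implements the spatial-position rule as one plausible reading: each neighbor is assigned one of three labels by comparing its distance to the skeleton's center of gravity against the central node's distance; the coordinate layout (one row per joint) is an assumption.

```python
import numpy as np

def spatial_position_labels(coords, center_idx, neighbor_idx):
    """Assign each neighbor to one of 3 subsets (labels 0/1/2): distance to
    the skeleton's center of gravity equal to, less than, or greater than
    that of the central node."""
    gravity = coords.mean(axis=0)                # center of gravity of all joints
    ref = np.linalg.norm(coords[center_idx] - gravity)
    labels = {}
    for j in neighbor_idx:
        d = np.linalg.norm(coords[j] - gravity)
        labels[j] = 0 if np.isclose(d, ref) else (1 if d < ref else 2)
    return labels
```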
The following illustrates a skeleton sequence behavior identification method based on a space-time graph convolutional network according to an embodiment of the present invention, as shown in fig. 4, the flow includes:
(1) the bone sequence is uniformly sampled as the FDNet initialization input key frame sequence (i.e., initialization data).
(2) Determine the network weight ω of FDNet, and obtain the states s from the actions a through the Markov decision process.
At initialization, the initial state s_1 is determined by uniform sampling of the bone sequence (i.e., the initial key frames are determined), the initial value of the network weight ω is determined randomly, and the initial action a_1 is determined according to a_t = max Q(s_t, ω).
Subsequently, the current state is determined from the previous action (e.g., the state s_2 is determined from the initial action a_1), and the action for the current step is determined from the current state (e.g., a_2 = max Q(s_2, ω)), corresponding to the formula a_t = max Q(s_t, ω).
In the FDNet training process, ω is updated after each use (after an action is determined according to the updated ω, a state is obtained accordingly and a key frame sequence is obtained); see the second part above for details.
(3) Construct the key frames corresponding to the state s into a bone space-time graph according to the first part above (the bone sequence space-time graph modeling method).
(4) Train ST-GCN (through operations such as computing the loss and/or error and back propagation) using the skeleton space-time graph from (3) as input; the training content includes the weight function w(v_ti). Extract the space-time features using the algorithm of the third part above (the space-time graph convolutional network based on the skeleton sequence), and then obtain the behavior recognition result using a SoftMax function (normalized exponential function); a sketch of this convolution-and-SoftMax step is given after this list.
(5) Reversely adjust FDNet (including updating the network weight ω) according to the behavior recognition result from (4), using the algorithm of the second part above (the key frame preprocessing network of the space-time graph convolutional network), and optimize the key frame sequence selection result.
(6) Execute (2) through (5) cyclically, cross-adjusting the parameters of the two networks (FDNet and ST-GCN), until the behavior recognition result no longer changes appreciably (specifically, until the recognition result no longer changes) and is consistent with the skeleton sequence label (for the skeleton sequence in (1)), obtaining the final FDNet and ST-GCN.
Specifically, before reversely adjusting FDNet, the obtained behavior recognition result may be compared with the label of the bone sequence in (1); if they are consistent and the number of consecutive matches reaches a preset threshold N, no reverse adjustment is performed; otherwise, adjustment continues.
The comparison may be performed after FDNet has been reversely adjusted a preset number of times during training, or starting from the first execution of (4) during training; this is not limited herein.
(7) Perform human behavior recognition using the final FDNet and ST-GCN.
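As referenced in step (4) above, the following sketch shows how the labeled subsets, the weight function w(v_ti) (represented here by one matrix per label), and the SoftMax classifier fit together; the dictionary-based interfaces and array shapes are assumptions for illustration.

```python
import numpy as np

def graph_conv(features, neighbors, labels, W):
    """One weighted summation at a central node: each neighbor j contributes
    W[labels[j]] @ features[j], realizing the shared convolution kernel."""
    out = np.zeros(W[0].shape[0])
    for j in neighbors:
        out = out + W[labels[j]] @ features[j]
    return out

def softmax(z):
    """SoftMax (normalized exponential function) over class scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()
```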
In the embodiment of the invention, the two networks (FDNet and ST-GCN) optimize and promote each other: the preprocessing network FDNet provides key frame training data for the space-time graph convolutional network ST-GCN, and the more representative and information-rich the extracted key frames, the more accurate the trained ST-GCN parameters; likewise, the more accurate the ST-GCN recognition result, the more accurate the data used to reversely adjust FDNet, further optimizing the network to obtain higher-quality key frame sequences.
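The mutual optimization of the two networks can be pictured as the alternating loop below; select_key_frames, fit, predict, reverse_adjust, and the convergence test are hypothetical stand-ins for steps (2) through (6), not APIs defined by the patent.

```python
def co_train(fdnet, st_gcn, skeleton_data, labels, max_rounds=100):
    """Alternate until the recognition result stops changing and matches the
    labels: FDNet distills key frames, ST-GCN trains on them, and the
    recognition result reversely adjusts FDNet."""
    prev = None
    for _ in range(max_rounds):
        key_frames = fdnet.select_key_frames(skeleton_data)   # assumed API
        st_gcn.fit(key_frames, labels)                        # assumed API
        result = st_gcn.predict(key_frames)                   # assumed API
        if result == prev and result == labels:
            break
        fdnet.reverse_adjust(result, labels)                  # assumed API
        prev = result
    return fdnet, st_gcn
```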
In the embodiment of the present invention, the depth camera may be used to directly acquire the bone sequence, but not limited thereto.
Thus, the skeleton sequence behavior identification method based on the space-time graph convolutional network in the embodiment of the invention proceeds as follows: aiming at the problems that video behavior recognition involves huge computation and redundant, noisy frame information, and that modeling is difficult when a skeleton sequence is used for deep-learning behavior recognition, the method proposes time-space domain graph modeling of the skeleton sequence and a space-time graph convolutional network with a key frame preprocessing network, where the two networks (the key frame preprocessing network and the space-time graph convolutional network) cooperate and are mutually optimized to recognize human behavior;
the key frame preprocessing network of the time-space graph convolution network comprises the following steps:
the preprocessing network FDNet for extracting the key frames adopts a Markov decision process, performs state conversion by executing different actions to obtain a return function, guides the execution of the next action according to the return function, and performs cyclic operation to determine the key frame sequence with most representativeness and most abundant information content.
The time-space diagram convolution network based on the skeleton sequence comprises the following steps:
and constructing a space-time-space graph convolution network based on a space-time domain graph model of the skeleton sequence. The sampling function of the graph convolution network is essentially the structure of a neighbor node set of a central node and is divided into an intra-frame subset and an inter-frame subset, wherein the intra-frame subset mainly comprises other naturally connected nodes which are within a specified range of the distance from the central node, and the inter-frame subset mainly comprises other nodes which are within a certain range before and after the frame where the central node is located and correspond to the same position. The convolution kernel function designs a weight function in a key mode, nodes in the neighbor nodes are divided into different subsets according to a certain rule, and each subset corresponds to different weight parameters. And finally, carrying out multilayer weighted summation on the time-space domain graph model of the bone sequence according to the sampling function and the convolution kernel function, and extracting time domain and space domain characteristics.
In summary, the embodiment of the invention provides a human bone sequence behavior identification method based on a space-time graph convolutional network. The method makes full use of the natural connections between skeleton nodes to establish a time-space domain graph model, giving the model stronger generalization ability without artificially defining body parts; the original skeleton sequence is processed by the preprocessing network that extracts key frames, so that information-rich, discriminative, and representative frames are extracted, reducing the computation of the graph convolutional network, reducing the interference of redundant information, and improving model training efficiency; a space-time graph convolutional network based on the skeleton key frame sequence mines the time-domain and space-domain features of the skeleton sequence simultaneously, reducing model design complexity; and the organization method in which the preprocessing network and the graph convolutional network cooperate and are mutually optimized improves the overall efficiency and accuracy of behavior recognition.
The scheme adopts a deep learning method, and solves the problem of model training of large-scale data; the skeleton sequence is described by adopting a time-space domain graph model, the natural connection characteristic of the skeleton sequence is reserved, and the characteristics of richer expressive force can be extracted;
The skeleton sequence is obtained directly with a depth camera for behavior recognition, reducing the computation of deep-network skeleton extraction; recognition and classification with the graph convolutional network and the SoftMax function make the result more accurate and the model more generalizable;
the method does not require manual extraction of various features: it describes the whole sequence with graph modeling and extracts time-domain and space-domain features with graph convolution, so fewer steps require manual participation and the model is more intelligent; the added preprocessing network for extracting key frames further reduces the workload of behavior identification.
An embodiment of the present invention further provides a behavior recognition apparatus, as shown in fig. 5, including:
a first obtaining module 51, configured to obtain the key frame sequence data to be identified from the bone sequence data to be identified;
and the first identification module 52 is configured to identify a behavior action corresponding to the to-be-identified key frame sequence data by using a space-time graph convolutional network.
The behavior recognition device provided by the embodiment of the invention acquires the key frame sequence data to be identified from the bone sequence data to be identified and identifies the behavior action corresponding to the key frame sequence data using a space-time graph convolutional network; this reduces redundant-information interference and the workload of behavior recognition; the time-domain and space-domain features of the skeleton sequence are mined simultaneously, which reduces model design complexity and improves recognition efficiency and accuracy.
Wherein the first identification module comprises: the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified; the first extraction submodule is used for extracting the spatiotemporal features to be identified from the bone sequence spatiotemporal image to be identified by utilizing a spatiotemporal image convolution network; and the first identification submodule is used for identifying the behavior action corresponding to the space-time characteristic to be identified.
Specifically, the first identification submodule includes: and the first processing unit is used for obtaining the behavior action corresponding to the space-time feature to be identified by utilizing the normalized exponential function.
In order to extract frames that are information-rich, more discriminative, and representative, in an embodiment of the present invention the first obtaining module includes: the first acquisition sub-module, used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Further, the behavior recognition device further includes: the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
Wherein the first training module comprises: the second acquisition submodule is used for acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action by using the frame rectification network; the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network; the first adjusting submodule is used for adjusting the frame rectification network reversely according to the training behavior action;
the third acquisition submodule is used for acquiring the key frame sequence training data from the bone sequence data corresponding to the preset behavior action again by using the adjusted frame rectification network; and the first training submodule is used for training the space-time graph convolutional network again by using the key frame sequence training data acquired by the adjusted frame rectification network.
Further, the behavior recognition device further includes: the second identification module, used for identifying, by using the retrained space-time graph convolutional network, the training behavior action corresponding to the key frame sequence training data obtained by the adjusted frame rectification network, after the space-time graph convolutional network has been retrained with the key frame sequence training data obtained by the adjusted frame rectification network; and the first processing module, used for storing the parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the obtained training behavior action is consistent with the preset behavior action is greater than a first threshold; otherwise, returning to the operation of reversely adjusting the frame rectification network according to the training behavior action.
Wherein the adjusting the frame rectification network in reverse direction according to the training behavior action comprises: acquiring a return function value according to the training behavior action; obtaining a loss function according to the return function value; and carrying out gradient descent on the loss function, and updating the network weight of the frame rectification network.
In particular, the loss function is l = α(r_t + γ × max Q(s_{t+1}, ω) − Q(s_t, ω)), where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward-function value for the t-th training of the frame rectification network, γ denotes a preset decay value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data at the (t+1)-th training of the frame rectification network, s_t denotes the bone sequence state corresponding to the key frame sequence training data at the t-th training of the frame rectification network, and ω denotes the network weight of the frame rectification network.
The implementation embodiments of the behavior recognition method are all suitable for the embodiment of the behavior recognition device, and the same technical effects can be achieved.
An embodiment of the present invention further provides a behavior recognition apparatus, as shown in fig. 6, including a memory 61, a processor 62, and a computer program 63 stored on the memory 61 and executable on the processor; the processor 62, when executing the program, implements the behavior recognition method described above.
The implementation embodiments of the behavior recognition method are all suitable for the embodiment of the behavior recognition device, and the same technical effect can be achieved.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the behavior recognition method.
The implementation embodiments of the behavior recognition method are all applicable to the embodiment of the computer-readable storage medium, and the same technical effects can be achieved.
It should be noted that many of the functional components described in this specification are referred to as modules/sub-modules/units in order to more particularly emphasize their implementation independence.
In embodiments of the present invention, the modules/sub-modules/units may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be constructed as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different bits which, when joined logically together, comprise the module and achieve the stated purpose for the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified within the modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
When a module can be implemented by software, then, considering the level of existing hardware technology and setting cost aside, it can equally be implemented by building corresponding hardware circuits to realize the same functions, including conventional very-large-scale integration (VLSI) circuits or gate arrays and existing semiconductors such as logic chips, transistors, or other discrete components.
While the preferred embodiments of the present invention have been described, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (20)

1. A method of behavior recognition, comprising:
acquiring key frame sequence data to be identified from bone sequence data to be identified;
and identifying the behavior action corresponding to the key frame sequence data to be identified by using a space-time graph convolutional network.
2. The behavior recognition method according to claim 1, wherein the identifying the behavior action corresponding to the key frame sequence data to be identified by using the space-time graph convolutional network comprises:
constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
extracting a space-time feature to be identified from the bone sequence space-time diagram to be identified by using the space-time graph convolutional network;
and identifying the behavior action corresponding to the space-time feature to be identified.
3. The behavior recognition method according to claim 2, wherein the identifying the behavior action corresponding to the space-time feature to be identified comprises:
obtaining the behavior action corresponding to the space-time feature to be identified by using a normalized exponential (softmax) function.
4. The behavior recognition method according to claim 1, wherein the acquiring key frame sequence data to be identified from the bone sequence data to be identified comprises:
acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using a frame rectification network.
5. The behavior recognition method according to claim 4, wherein before the acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network, the method further comprises:
training the frame rectification network and the space-time graph convolutional network.
6. The behavior recognition method of claim 5, wherein the training the frame rectification network and the space-time graph convolutional network comprises:
acquiring key frame sequence training data from bone sequence data corresponding to a preset behavior action by using the frame rectification network;
training the space-time graph convolutional network by using the key frame sequence training data, and identifying a training behavior action corresponding to the key frame sequence training data by using the trained space-time graph convolutional network;
adjusting the frame rectification network in reverse according to the training behavior action;
acquiring the key frame sequence training data again from the bone sequence data corresponding to the preset behavior action by using the adjusted frame rectification network;
and retraining the space-time graph convolutional network by using the key frame sequence training data acquired by the adjusted frame rectification network.
7. The behavior recognition method according to claim 6, further comprising, after the retraining of the space-time graph convolutional network by using the key frame sequence training data acquired by the adjusted frame rectification network:
identifying, by using the retrained space-time graph convolutional network, a training behavior action corresponding to the key frame sequence training data acquired by the adjusted frame rectification network;
if the number of consecutive times that the identified training behavior action is consistent with the preset behavior action is greater than a first threshold, storing parameter information of the frame rectification network and the space-time graph convolutional network; otherwise, returning to the operation of adjusting the frame rectification network in reverse according to the training behavior action.
8. The behavior recognition method according to claim 6 or 7, wherein the adjusting the frame rectification network in reverse according to the training behavior action comprises:
acquiring a reward function value according to the training behavior action;
obtaining a loss function according to the reward function value;
and performing gradient descent on the loss function to update the network weights of the frame rectification network.
9. The behavior recognition method according to claim 8, wherein the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward function value when the frame rectification network is trained for the t-th time, γ denotes a preset decay value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the bone sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weights of the frame rectification network. An illustrative code sketch of the overall training loop appears after the claims.
10. A behavior recognition apparatus, comprising:
the first acquisition module is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified;
and the first identification module is used for identifying the behavior action corresponding to the key frame sequence data to be identified by utilizing a space-time graph convolutional network.
11. The behavior recognition device according to claim 10, wherein the first identification module comprises:
the first construction submodule is used for constructing the key frame sequence data to be identified into a bone sequence space-time diagram to be identified;
the first extraction submodule is used for extracting the space-time feature to be identified from the bone sequence space-time diagram to be identified by using the space-time graph convolutional network;
and the first identification submodule is used for identifying the behavior action corresponding to the space-time feature to be identified.
12. The behavior recognition device according to claim 11, wherein the first identification submodule comprises:
the first processing unit is used for obtaining the behavior action corresponding to the space-time feature to be identified by using a normalized exponential (softmax) function.
13. The behavior recognition device according to claim 10, wherein the first acquisition module comprises:
the first acquisition submodule is used for acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using a frame rectification network.
14. The behavior recognition device according to claim 13, further comprising:
the first training module is used for training the frame rectification network and the space-time graph convolution network before acquiring the key frame sequence data to be identified from the bone sequence data to be identified by using the frame rectification network.
15. The behavior recognition device of claim 14, wherein the first training module comprises:
the second acquisition submodule is used for acquiring key frame sequence training data from the bone sequence data corresponding to the preset behavior action by using the frame rectification network;
the first processing submodule is used for training the space-time graph convolutional network by utilizing the key frame sequence training data and identifying a training behavior action corresponding to the key frame sequence training data by utilizing the trained space-time graph convolutional network;
the first adjusting submodule is used for adjusting the frame rectification network in reverse according to the training behavior action;
the third acquisition submodule is used for acquiring the key frame sequence training data again from the bone sequence data corresponding to the preset behavior action by using the adjusted frame rectification network;
and the first training submodule is used for retraining the space-time graph convolutional network by using the key frame sequence training data acquired by the adjusted frame rectification network.
16. The behavior recognition device according to claim 15, further comprising:
the second identification module is used for, after the space-time graph convolutional network is retrained by using the key frame sequence training data acquired by the adjusted frame rectification network, identifying, by using the retrained space-time graph convolutional network, a training behavior action corresponding to the key frame sequence training data acquired by the adjusted frame rectification network;
the first processing module is used for storing parameter information of the frame rectification network and the space-time graph convolutional network if the number of consecutive times that the identified training behavior action is consistent with the preset behavior action is greater than a first threshold; otherwise, returning to the operation of adjusting the frame rectification network in reverse according to the training behavior action.
17. The behavior recognition device according to claim 15 or 16, wherein the adjusting the frame rectification network in reverse according to the training behavior action comprises:
acquiring a reward function value according to the training behavior action;
obtaining a loss function according to the reward function value;
and performing gradient descent on the loss function to update the network weights of the frame rectification network.
18. The behavior recognition device according to claim 17, wherein the loss function is specifically:
l = α(r_t + γ × max Q(s_{t+1}, ω) - Q(s_t, ω));
where l denotes the loss function, α denotes a preset learning rate, r_t denotes the reward function value when the frame rectification network is trained for the t-th time, γ denotes a preset decay value, Q(s_{t+1}, ω) and Q(s_t, ω) each denote an action-value function, s_{t+1} denotes the bone sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the (t+1)-th time, s_t denotes the bone sequence state corresponding to the key frame sequence training data when the frame rectification network is trained for the t-th time, and ω denotes the network weights of the frame rectification network.
19. A behavior recognition device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; characterized in that the processor implements the behavior recognition method according to any one of claims 1 to 9 when executing the program.
20. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the behavior recognition method according to any one of claims 1 to 9.
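As a reading aid for claims 1 to 9, the following minimal sketch traces the claimed control flow: a frame rectification network selects key frames from bone sequence data (claims 4 to 6), a space-time graph convolutional network classifies them and a normalized exponential (softmax) function yields the behavior action (claims 1 to 3), and the rectification network is adjusted in reverse from a reward signal until the identified action matches the preset action more than a first-threshold number of consecutive times (claims 6 to 9). Both networks are reduced to toy stand-ins, and every name, shape, and update rule here is a hypothetical illustration rather than the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_rectification_net(bone_seq, omega):
    # Hypothetical stand-in: score each frame with a learned per-frame bias
    # and keep the top half as key frames, preserving temporal order.
    scores = bone_seq.mean(axis=(1, 2)) + omega
    keep = np.sort(np.argsort(scores)[-len(scores) // 2:])
    return keep, bone_seq[keep]

def st_gcn(key_frames, theta):
    # Hypothetical stand-in for the space-time graph convolutional network:
    # pool over frames and joints, then map pooled coordinates to class logits.
    feat = key_frames.mean(axis=(0, 1))  # shape: (coords,)
    return feat @ theta                  # shape: (num_classes,)

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

bone_seq = rng.normal(size=(32, 18, 3))  # frames x joints x coordinates
theta = rng.normal(size=(3, 4))          # toy ST-GCN weights, 4 behavior actions
omega = np.zeros(32)                     # per-frame selection bias (rectification net)
alpha, preset_action, first_threshold = 0.1, 2, 5
streak = 0

for step in range(200):                  # iteration cap keeps the sketch terminating
    keep, key_frames = frame_rectification_net(bone_seq, omega)
    action = int(np.argmax(softmax(st_gcn(key_frames, theta))))
    if action == preset_action:
        streak += 1
        if streak > first_threshold:     # claim 7: enough consecutive matches
            print(f"step {step}: saving parameter information of both networks")
            break
    else:
        streak = 0
    # claims 8-9 in miniature: reward -> loss -> update of the rectification net
    r_t = 1.0 if action == preset_action else -1.0
    omega[keep] += alpha * r_t           # reinforce or penalize the chosen frames
else:
    print("no stable match within the iteration cap (toy data)")
```

In the claimed loop, the reverse adjustment would instead perform gradient descent on the loss of claim 9 over the network weights ω, and the ST-GCN itself would be retrained between selections; both are collapsed into a single bias update here so the sketch stays self-contained.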
CN201910000953.0A 2019-01-02 2019-01-02 Behavior identification method, device and equipment Active CN111401106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910000953.0A CN111401106B (en) 2019-01-02 2019-01-02 Behavior identification method, device and equipment

Publications (2)

Publication Number Publication Date
CN111401106A 2020-07-10
CN111401106B CN111401106B (en) 2023-03-31

Family

ID=71430152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910000953.0A Active CN111401106B (en) 2019-01-02 2019-01-02 Behavior identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN111401106B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
WO2018119807A1 (en) * 2016-12-29 2018-07-05 浙江工商大学 Depth image sequence generation method based on convolutional neural network and spatiotemporal coherence
CN107169415A (en) * 2017-04-13 2017-09-15 西安电子科技大学 Human motion recognition method based on convolutional neural networks feature coding
CN108229355A (en) * 2017-12-22 2018-06-29 北京市商汤科技开发有限公司 Activity recognition method and apparatus, electronic equipment, computer storage media, program
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
丰艳 et al., "View-independent skeleton behavior recognition based on a spatio-temporal attention deep network", Journal of Computer-Aided Design & Computer Graphics *
王珂 et al., "A CNNs action recognition method fusing global spatio-temporal features", Journal of Huazhong University of Science and Technology (Natural Science Edition) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069979A (en) * 2020-09-03 2020-12-11 浙江大学 Real-time action recognition man-machine interaction system
CN112069979B (en) * 2020-09-03 2024-02-02 浙江大学 Real-time action recognition man-machine interaction system
CN112070027B (en) * 2020-09-09 2022-08-26 腾讯科技(深圳)有限公司 Network training and action recognition method, device, equipment and storage medium
CN112070027A (en) * 2020-09-09 2020-12-11 腾讯科技(深圳)有限公司 Network training and action recognition method, device, equipment and storage medium
CN112380955A (en) * 2020-11-10 2021-02-19 浙江大华技术股份有限公司 Action recognition method and device
CN112380955B (en) * 2020-11-10 2023-06-16 浙江大华技术股份有限公司 Action recognition method and device
CN112528811A (en) * 2020-12-02 2021-03-19 建信金融科技有限责任公司 Behavior recognition method and device
CN113128424A (en) * 2021-04-23 2021-07-16 浙江理工大学 Attention mechanism-based graph convolution neural network action identification method
CN113128424B (en) * 2021-04-23 2024-05-03 浙江理工大学 Method for identifying action of graph convolution neural network based on attention mechanism
CN113139469B (en) * 2021-04-25 2022-04-29 武汉理工大学 Driver road stress adjusting method and system based on micro-expression recognition
CN113139469A (en) * 2021-04-25 2021-07-20 武汉理工大学 Driver road stress adjusting method and system based on micro-expression recognition
CN113378656A (en) * 2021-05-24 2021-09-10 南京信息工程大学 Action identification method and device based on self-adaptive graph convolution neural network
CN113378656B (en) * 2021-05-24 2023-07-25 南京信息工程大学 Action recognition method and device based on self-adaptive graph convolution neural network
CN113989927A (en) * 2021-10-27 2022-01-28 东北大学 Video group violent behavior identification method and system based on skeleton data
CN113989927B (en) * 2021-10-27 2024-04-26 东北大学 Method and system for identifying violent behaviors of video group based on bone data

Also Published As

Publication number Publication date
CN111401106B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN111401106B (en) Behavior identification method, device and equipment
Liu et al. Recognizing human actions as the evolution of pose estimation maps
CN109685037B (en) Real-time action recognition method and device and electronic equipment
CN111291809B (en) Processing device, method and storage medium
CN109948741A (en) A kind of transfer learning method and device
CN108399435B (en) Video classification method based on dynamic and static characteristics
CN109902565B (en) Multi-feature fusion human behavior recognition method
CN111539941B (en) Parkinson&#39;s disease leg flexibility task evaluation method and system, storage medium and terminal
CN111199207B (en) Two-dimensional multi-human body posture estimation method based on depth residual error neural network
CN110477907B (en) Modeling method for intelligently assisting in recognizing epileptic seizures
CN111105439A (en) Synchronous positioning and mapping method using residual attention mechanism network
CN114419732A (en) HRNet human body posture identification method based on attention mechanism optimization
CN114241270A (en) Intelligent monitoring method, system and device for home care
CN114743273A (en) Human skeleton behavior identification method and system based on multi-scale residual error map convolutional network
CN113205043B (en) Video sequence two-dimensional attitude estimation method based on reinforcement learning
CN118135660A (en) Cross-view gait recognition method for joint multi-view information bottleneck under view-angle deficiency condition
CN112199994B (en) Method and device for detecting interaction of3D hand and unknown object in RGB video in real time
CN117935362A (en) Human behavior recognition method and system based on heterogeneous skeleton diagram
Zhu et al. Dance Action Recognition and Pose Estimation Based on Deep Convolutional Neural Network.
CN116665300A (en) Skeleton action recognition method based on space-time self-adaptive feature fusion graph convolution network
CN116824689A (en) Bone sequence behavior recognition method, device, equipment and storage medium
CN113158870B (en) Antagonistic training method, system and medium of 2D multi-person gesture estimation network
CN114882595A (en) Armed personnel behavior identification method and armed personnel behavior identification system
Tsai et al. Temporal-variation skeleton point correction algorithm for improved accuracy of human action recognition
Benhamida et al. Theater Aid System for the Visually Impaired Through Transfer Learning of Spatio-Temporal Graph Convolution Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant