CN113239819A - Visual angle normalization-based skeleton behavior identification method, device and equipment - Google Patents

Visual angle normalization-based skeleton behavior identification method, device and equipment

Info

Publication number
CN113239819A
CN113239819A (application CN202110538744.9A)
Authority
CN
China
Prior art keywords
network
skeleton
sub
data
trained
Prior art date
Legal status
Granted
Application number
CN202110538744.9A
Other languages
Chinese (zh)
Other versions
CN113239819B (en)
Inventor
谢雪梅
赵至夫
潘庆哲
李佳楠
曹玉晗
石光明
Current Assignee
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority to CN202110538744.9A
Publication of CN113239819A
Application granted
Publication of CN113239819B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton behavior identification method based on visual angle normalization, which comprises the following steps: training a pre-constructed generation countermeasure network, wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network; training the trained generation sub-network together with a pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network; acquiring human skeleton sequence data to be identified; and inputting the human skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result. Correspondingly, the invention also discloses a visual angle normalization-based skeleton behavior identification device, a computer-readable storage medium and an electronic device. By adopting the technical scheme of the invention, the problem of intra-class differences in skeleton representation caused by visual angle diversity can be solved, and the accuracy of behavior identification can be improved.

Description

Visual angle normalization-based skeleton behavior identification method, device and equipment
Technical Field
The invention relates to the technical field of computer vision, in particular to a skeleton behavior identification method and device based on visual angle normalization, a computer readable storage medium and electronic equipment.
Background
Behavior recognition is an important and challenging task in computer vision tasks, and has wide application in the fields of security monitoring, intelligent video analysis, human-computer interaction and the like. With the development of human posture estimation technology, behavior recognition based on human skeleton is more and more popular.
The behavior recognition scheme based on the human skeleton disclosed in the prior art specifically comprises the following steps: firstly, skeleton data to be recognized are obtained; then the skeleton data are normalized to unify the data distribution; finally, behavior recognition is performed according to the normalized skeleton data and a preset network model. When the skeleton data are normalized, the coordinates of a certain skeleton point are generally selected as a reference. For example, the coordinates of the spine joint are selected as the reference and are subtracted from the coordinates of each remaining skeleton point, so that the skeleton data are converted from the camera coordinate system into a human body coordinate system. A skeleton rotation operation is then performed so that the three-dimensional vector from the right shoulder joint to the left shoulder joint is parallel to the X axis of the coordinate space, and the three-dimensional vector from the spine base joint to the spine joint is parallel to the Y axis of the coordinate space.
Although the technical scheme can reduce the intra-class difference of the skeleton data, the method also reduces the inter-class difference of the skeleton data and destroys the dynamic characteristics of actions related to the right shoulder joint, the left shoulder joint, the spine base joint and the spine joint, so that the visual angle normalization is incomplete, the skeleton expression difference cannot be well reflected, and the accuracy of behavior identification is low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a skeleton behavior identification method and apparatus based on view angle normalization, a computer-readable storage medium, and an electronic device, so as to solve the problem of intra-class differences in skeleton representation caused by view angle diversity and to improve the accuracy of behavior identification.
In order to solve the above technical problem, an embodiment of the present invention provides a skeleton behavior identification method based on view angle normalization, including:
training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
acquiring human skeleton sequence data to be identified;
and inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Further, the generation sub-network comprises first to sixth graph convolution layers, a first global average pooling layer, a first fully-connected layer and a skeleton rotation layer; the discrimination sub-network comprises seventh to twelfth graph convolution layers, a second global average pooling layer and a second fully-connected layer.
Further, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
Further, the training of the pre-constructed generated countermeasure network specifically includes:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Further, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Further, the behavior classification sub-network includes thirteenth through twenty-second graph convolutional layers, a third global average pooling layer, and a third fully-connected layer.
Further, the training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically includes:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
In order to solve the above technical problem, an embodiment of the present invention further provides a skeleton behavior recognition apparatus based on view angle normalization, including:
the generation confrontation network training module is used for training a pre-constructed generation confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
the behavior recognition network training module is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
the framework sequence data acquisition module is used for acquiring human framework sequence data to be identified;
and the human body skeleton behavior recognition module is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; when the computer program runs, the device where the computer-readable storage medium is located is controlled by the computer program to execute any one of the above skeleton behavior identification methods based on perspective normalization.
An embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor, when executing the computer program, implements any one of the above skeleton behavior identification methods based on perspective normalization.
Compared with the prior art, the embodiments of the present invention provide a visual angle normalization-based skeleton behavior recognition method and device, a computer-readable storage medium and an electronic device, which train a pre-constructed generation countermeasure network composed of a generation sub-network and a discrimination sub-network, train the trained generation sub-network together with a pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network, and input the acquired human skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result, so that the problem of intra-class differences in skeleton representation caused by visual angle diversity can be solved and the accuracy of behavior recognition improved.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for identifying skeleton behavior based on view normalization according to the present invention;
fig. 2 is a block diagram of a framework behavior recognition apparatus based on perspective normalization according to a preferred embodiment of the present invention;
fig. 3 is a block diagram of an electronic device according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
The embodiment of the present invention provides a skeleton behavior identification method based on perspective normalization. Fig. 1 is a flowchart of a preferred embodiment of the skeleton behavior identification method based on perspective normalization provided by the present invention; as shown in Fig. 1, the method includes steps S11 to S14:
step S11, training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
step S12, training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
step S13, obtaining human body skeleton sequence data to be identified;
and step S14, inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Specifically, the embodiment of the invention builds a generation sub-network, a judgment sub-network and a behavior classification sub-network in advance, the generation sub-network and the judgment sub-network jointly form a generation countermeasure network, and the generation sub-network and the behavior classification sub-network jointly form a skeleton behavior identification network; in order to obtain a trained framework behavior recognition network, training a pre-constructed generation countermeasure network to correspondingly obtain the trained generation countermeasure network (including a trained generation sub-network and a trained discrimination sub-network), and then training the trained generation sub-network and a pre-constructed behavior classification sub-network to correspondingly obtain the trained framework behavior recognition network; when the actual behavior recognition is carried out according to the human body skeleton, the human body skeleton sequence data to be recognized are obtained firstly, and then the obtained human body skeleton sequence data are input into the trained skeleton behavior recognition network, so that the behavior recognition result is obtained correspondingly.
The method includes acquiring an RGB image sequence with a camera and extracting a skeleton sequence from the RGB image sequence with a skeleton posture estimation tool, thereby obtaining the human skeleton sequence data to be recognized. Each image corresponds to a set of human skeleton data, which includes a certain number of skeleton points (e.g., 25) in the image, and each skeleton point is represented by the three-dimensional coordinate data of the position where the skeleton point is located.
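For illustration only, a minimal sketch of how such skeleton sequence data could be organized in memory (the 25-joint layout, the frame count and the array shapes below are assumptions chosen for the example, not requirements of this disclosure):

```python
import numpy as np

# Hypothetical skeleton sequence: M frames, N = 25 skeleton points, 3 coordinates each.
M, N = 64, 25
skeleton_sequence = np.zeros((M, N, 3), dtype=np.float32)

# Entry [m, n] holds the (x, y, z) camera-space coordinates of the n-th skeleton
# point estimated from the m-th RGB frame by the pose-estimation tool.
skeleton_sequence[0, 0] = [0.12, -0.34, 2.05]  # example joint position (made-up values)
```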
It should be noted that, in the embodiment of the present invention, the generation countermeasure network is trained first, yielding a trained generation sub-network and a trained discrimination sub-network. The discrimination sub-network functions only during the training of the generation countermeasure network, where it helps to obtain a better trained generation sub-network; when the skeleton behavior recognition network is subsequently trained, the trained generation sub-network is trained again, so that a better trained skeleton behavior recognition network can be obtained.
According to the visual angle normalization-based skeleton behavior recognition method provided by the embodiment of the invention, a pre-constructed generation countermeasure network composed of a generation sub-network and a discrimination sub-network is trained, the trained generation sub-network and a pre-constructed behavior classification sub-network are then trained to obtain a trained skeleton behavior recognition network, and the acquired human skeleton sequence data are input into the trained skeleton behavior recognition network to obtain a behavior recognition result, so that the problem of intra-class differences in skeleton representation caused by viewpoint diversity can be solved and the accuracy of behavior recognition improved.
In another preferred embodiment, the generation subnetwork comprises first through sixth graph convolutional layers, a first global average pooling layer, a first fully-connected layer, and a skeleton rotation layer; the discrimination sub-network includes seventh to twelfth graph convolutional layers, a second global average pooling layer, and a second fully-connected layer.
Specifically, in combination with the above embodiment, the pre-constructed generation sub-network is a 9-layer network whose structure is, in order: first graph convolution layer → second graph convolution layer → third graph convolution layer → fourth graph convolution layer → fifth graph convolution layer → sixth graph convolution layer → first global average pooling layer → first fully-connected layer → skeleton rotation layer. The pre-constructed discrimination sub-network is an 8-layer network whose structure is, in order: seventh graph convolution layer → eighth graph convolution layer → ninth graph convolution layer → tenth graph convolution layer → eleventh graph convolution layer → twelfth graph convolution layer → second global average pooling layer → second fully-connected layer.
The parameters of each layer of the network can be set as follows: the spatial convolution kernels of the 12 graph convolution layers are all of size 1 × 1 with convolution stride 1; the temporal convolution kernels are all of size 9 × 1, with convolution strides of 1, 2, 1, 2 and 1 in sequence; the number of spatial convolution kernels and temporal convolution kernels in each layer is kept the same and is set to 16, 32, 64, 16, 32, 64 and 64 in sequence. The number of output neurons of the first fully-connected layer is set to 3, i.e., the three-dimensional angle by which the skeleton data need to be rotated; the number of output neurons of the second fully-connected layer is set to 1, i.e., the probability that the input of the discrimination sub-network is a true sample.
It should be noted that skeleton sequence data are essentially graph-structured data, and graph convolution layers can model the skeleton sequence well for feature extraction; the embodiment of the present invention therefore uses graph convolution layers to build the corresponding network models.
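As a rough illustration, a simplified spatio-temporal graph convolution block of the kind described above could look like the following PyTorch sketch; the 1 × 1 spatial kernel and 9 × 1 temporal kernel come from the layer settings listed earlier, while the adjacency handling, activation and normalization details are our own assumptions:

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """Simplified spatio-temporal graph convolution: spatial aggregation over the
    skeleton graph with a 1x1 convolution, followed by a 9x1 temporal convolution."""
    def __init__(self, in_channels, out_channels, adjacency, temporal_stride=1):
        super().__init__()
        # adjacency: (N, N) normalized adjacency matrix of the skeleton graph
        self.register_buffer("A", adjacency)
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1)
        self.temporal = nn.Conv2d(out_channels, out_channels, kernel_size=(9, 1),
                                  stride=(temporal_stride, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = torch.einsum("bcmn,nk->bcmk", x, self.A)  # aggregate neighbouring joints
        x = self.relu(self.spatial(x))                # 1x1 spatial convolution
        x = self.relu(self.temporal(x))               # 9x1 temporal convolution
        return x
```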
As an improvement of the above scheme, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis, and $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis.
Specifically, in combination with the above embodiments, the last layer of the generation sub-network is the skeleton rotation layer, which performs view angle normalization on the skeleton data. Let the three-dimensional rotation angle be $(\theta_x, \theta_y, \theta_z)$, where $\theta_x$, $\theta_y$ and $\theta_z$ are the rotation angles of the skeleton data about the x-axis, y-axis and z-axis, respectively, and let the skeleton sequence data be $V = \{v_{m,n} \mid 1 \leq m \leq M, 1 \leq n \leq N\}$, where $M$ is the total number of frames in the image sequence corresponding to the skeleton sequence and $N$ is the number of skeleton points in each frame. According to $\theta_x$, $\theta_y$ and $\theta_z$, the rotation matrices corresponding to rotations of the skeleton data about the x-axis, y-axis and z-axis are defined as $R_x$, $R_y$ and $R_z$, respectively, in the specific form

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The skeleton rotation layer then rotates each skeleton point $v_{m,n}$ in the skeleton sequence data $V$ according to the rotation formula $v'_{m,n} = R_x R_y R_z v_{m,n}$, where $v_{m,n}$ is the three-dimensional coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation and $v'_{m,n}$ is the three-dimensional coordinate data after rotation. After the rotation operation has been applied to every skeleton point in $V$, the output of the skeleton rotation layer is $V' = \{v'_{m,n} \mid 1 \leq m \leq M, 1 \leq n \leq N\}$.
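A minimal sketch of the rotation performed by such a skeleton rotation layer is given below; NumPy is used purely for illustration, and in the network itself the three angles would be produced by the preceding first fully-connected layer rather than passed in by hand:

```python
import numpy as np

def rotate_skeleton(V, theta_x, theta_y, theta_z):
    """Apply v'_{m,n} = Rx Ry Rz v_{m,n} to every skeleton point.
    V: array of shape (M, N, 3) holding the 3-D joint coordinates."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rx @ Ry @ Rz
    return V @ R.T  # rotates every v_{m,n} at once

# usage: V_normalized = rotate_skeleton(skeleton_sequence, 0.1, -0.2, 0.05)
```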
In another preferred embodiment, the training of the pre-constructed generated countermeasure network specifically includes:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Specifically, with reference to the above embodiment, when training a pre-constructed generated countermeasure network, a first training data set is first obtained from a preset data set, where the first training data set includes an input sample set and a real sample set, the input sample set includes a plurality of (e.g., 10000) side view angle skeleton sample data, and the real sample set includes a plurality of (e.g., 10000) front view angle skeleton sample data; inputting the side-view angle skeleton sample data in the obtained input sample set into a pre-constructed generation sub-network, outputting a false sample set by a skeleton rotating layer, inputting the data in the obtained false sample set and the front-view angle skeleton sample data in the real sample set into a pre-constructed discrimination sub-network to discriminate whether the sample is true or false, and correspondingly obtaining a sample discrimination result; and finally, updating parameters of a pre-constructed generation sub-network and a pre-constructed discrimination sub-network according to the obtained sample discrimination result and a preset first loss function by adopting a gradient descent method, assigning the updated parameter values to corresponding network parameters of each layer, and correspondingly obtaining a trained generation countermeasure network, wherein the trained generation countermeasure network comprises the trained generation sub-network and the trained discrimination sub-network.
As an improvement of the above solution, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Specifically, with reference to the above embodiment, the functional expression of the first loss function is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

where $G$ denotes the generation sub-network, $D$ denotes the discrimination sub-network, $x$ denotes a real sample, $z$ denotes a side-view skeleton sample, $P_{norm}(x)$ denotes the probability distribution of front-view skeleton samples, $P_{side}(z)$ denotes the probability distribution of side-view skeleton samples, $G(z)$ denotes the output of the generation sub-network $G$ for the side-view skeleton sample $z$, i.e., the false sample obtained after the side-view skeleton sample is input into the generation sub-network, $D(x)$ denotes the output of the discrimination sub-network $D$ for the real sample $x$, and $D(G(z))$ denotes the output of the discrimination sub-network $D$ for the false sample $G(z)$.
It should be noted that the specific steps for updating the network parameters of the generation countermeasure network by the gradient descent method are as follows (a minimal code sketch of this training loop is given after step (5)):
(1) The learning rate is set in advance to α = 0.0002, the iteration count threshold is set to 30000, and the number of skeleton samples selected from the training set in each iteration is set to 32;
(2) Let the parameter dimension of the generation sub-network be $N_1$ and the parameter dimension of the discrimination sub-network be $N_2$. The output $D(x)$ of the discrimination sub-network $D$ for the real sample $x$ and the output $D(G(z))$ of the discrimination sub-network $D$ for the false sample $G(z)$ are obtained, and the loss value $J$ is calculated in combination with the first loss function; the gradient of the generation sub-network is then expressed as $\nabla_{\theta_g} J$ and the gradient of the discrimination sub-network as $\nabla_{\theta_d} J$;
(3) The network parameters of the generation sub-network are updated according to the gradient vector of the generation sub-network, the update formula being $\theta'_g = \theta_g - \alpha \nabla_{\theta_g} J$, where $\theta_g$ denotes the $N_1$-dimensional generation sub-network parameters before the update and $\theta'_g$ denotes the $N_1$-dimensional generation sub-network parameters after the update;
(4) The network parameters of the discrimination sub-network are updated according to the gradient vector of the discrimination sub-network, the update formula being $\theta'_d = \theta_d - \alpha \nabla_{\theta_d} J$, where $\theta_d$ denotes the $N_2$-dimensional discrimination sub-network parameters before the update and $\theta'_d$ denotes the $N_2$-dimensional discrimination sub-network parameters after the update;
(5) After each update of the generation sub-network and the discrimination sub-network, it is judged whether the current iteration count has reached the preset iteration count threshold of 30000; if so, the parameter update stops and the trained generation countermeasure network is obtained; otherwise, steps (2) to (4) are repeated until the iteration count reaches 30000 and the trained generation countermeasure network is obtained.
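The sketch below illustrates steps (1) to (5) under the hyper-parameters given above; the use of SGD optimizers, data loaders delivering batches of 32 skeleton samples, and the binary cross-entropy form of the adversarial loss (a common practical substitute for the min-max expression) are assumptions of the example, not requirements of the method:

```python
import torch
import torch.nn.functional as F

def train_gan(G, D, side_loader, front_loader, lr=0.0002, max_iters=30000):
    """Alternating updates of the generation sub-network G and discrimination sub-network D."""
    opt_g = torch.optim.SGD(G.parameters(), lr=lr)
    opt_d = torch.optim.SGD(D.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for z, x in zip(side_loader, front_loader):  # side-view and front-view batches
            fake = G(z)                              # false (view-normalized) samples
            # update the discrimination sub-network: real samples -> 1, false samples -> 0
            d_real = D(x)
            d_fake = D(fake.detach())
            d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
                      F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # update the generation sub-network: try to make D classify G(z) as real
            d_fake_for_g = D(fake)
            g_loss = F.binary_cross_entropy(d_fake_for_g, torch.ones_like(d_fake_for_g))
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
            it += 1
            if it >= max_iters:
                break
    return G, D
```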
In yet another preferred embodiment, the behavior classification subnetwork includes a thirteenth through twenty-second graph convolutional layer, a third global average pooling layer, and a third fully-connected layer.
Specifically, in combination with the above embodiment, the pre-constructed behavior classification sub-network is a 12-layer network whose structure is, in order: thirteenth graph convolution layer → fourteenth graph convolution layer → fifteenth graph convolution layer → sixteenth graph convolution layer → seventeenth graph convolution layer → eighteenth graph convolution layer → nineteenth graph convolution layer → twentieth graph convolution layer → twenty-first graph convolution layer → twenty-second graph convolution layer → third global average pooling layer → third fully-connected layer.
The parameters of each layer of the network can be set as follows: the spatial convolution kernels of the 10 graph convolution layers are all of size 1 × 1 with convolution stride 1; the temporal convolution kernels are all of size 9 × 1, with convolution strides of 1, 2, 1 in sequence; the number of spatial convolution kernels and temporal convolution kernels in each layer is kept the same and is set to 32, 64, 128 in sequence; the number of output neurons of the third fully-connected layer is set to 60.
The network layers of the generation sub-network and the behavior classification sub-network are connected in sequence (i.e., the last layer of the generation sub-network, namely the skeleton rotation layer, is connected to the first layer of the behavior classification sub-network, namely the thirteenth graph convolution layer), thus forming the skeleton behavior recognition network.
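A minimal sketch of how the trained generation sub-network and the behavior classification sub-network could be chained into the skeleton behavior recognition network; the class and variable names are placeholders chosen for the example:

```python
import torch.nn as nn

class SkeletonBehaviorRecognitionNet(nn.Module):
    """Generation sub-network (view angle normalization) followed by the behavior
    classification sub-network, connected end to end."""
    def __init__(self, generation_subnet, classification_subnet):
        super().__init__()
        self.normalize = generation_subnet     # ends with the skeleton rotation layer
        self.classify = classification_subnet  # ends with the 60-way fully-connected layer

    def forward(self, skeleton_sequence):
        normalized = self.normalize(skeleton_sequence)  # view-angle-normalized skeletons
        return self.classify(normalized)                # behavior class scores

# usage: result = SkeletonBehaviorRecognitionNet(G, classifier)(x).argmax(dim=1)
```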
In another preferred embodiment, the training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically includes:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
Specifically, with reference to the above embodiment, when training a pre-constructed skeleton behavior recognition network, first obtaining a second training data set from a preset data set, for example, selecting 30000 skeleton samples (having no requirement on view angle) from the preset data set to form the second training data set, where the second training data set includes 30000 skeleton sample data, and the skeleton sample data includes coordinate data of skeleton points and behavior tags of skeleton samples; inputting the skeleton sample data in the obtained second training data set into a trained generation sub-network obtained after training of the generated countermeasure network, correspondingly obtaining a visual angle normalized skeleton sample set, inputting the data in the obtained visual angle normalized skeleton sample set into a pre-constructed behavior classification sub-network, and correspondingly obtaining a sample classification result; and finally, training the trained generation sub-network and the pre-constructed behavior classification sub-network by adopting a gradient descent method according to the obtained sample classification result and a preset second loss function, and correspondingly obtaining the trained skeleton behavior recognition network.
The second loss function may be a cross-entropy loss function, whose expression is:

$$L = -\sum_i p(x_i) \log q(x_i)$$

where $x_i$ denotes data in the second training data set, $p(x_i)$ denotes the true probability distribution, and $q(x_i)$ denotes the predicted probability distribution.
It should be noted that, when the network parameters of the skeleton behavior recognition network are updated by the gradient descent method, the learning rate α may be set in advance to 0.1, the iteration count threshold to 60000, and the number of skeleton samples selected from the training set in each iteration to 64; the subsequent steps are similar to the steps for updating the network parameters of the generation countermeasure network in the foregoing embodiment and are not repeated here.
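An illustrative sketch of this second training stage under the hyper-parameters just listed; the SGD optimizer and the data loader delivering batches of 64 labelled skeleton samples are assumptions of the example:

```python
import torch
import torch.nn.functional as F

def train_recognition_net(recognition_net, loader, lr=0.1, max_iters=60000):
    """Jointly train the (already trained) generation sub-network and the behavior
    classification sub-network with the cross-entropy loss."""
    opt = torch.optim.SGD(recognition_net.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for skeletons, labels in loader:         # batches of 64 skeleton samples
            scores = recognition_net(skeletons)  # view normalization + classification
            loss = F.cross_entropy(scores, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= max_iters:
                break
    return recognition_net
```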
In addition, simulation experiments show that when 10000 skeleton samples are selected from the preset data set to form a test data set and each skeleton sample in the test data set is input into the trained skeleton behavior recognition network to obtain its behavior classification result, the number of skeleton samples whose behavior classification result matches the behavior label, i.e., the number of correctly classified skeleton samples, is 9235; the accuracy of behavior recognition is therefore A = (number of correctly classified skeleton samples) / (total number of skeleton samples) = 9235/10000 = 92.35%.
An embodiment of the present invention further provides a skeleton behavior recognition apparatus based on view angle normalization. Fig. 2 is a block diagram of a preferred embodiment of the skeleton behavior recognition apparatus based on view angle normalization provided by the present invention; as shown in Fig. 2, the apparatus includes:
a generated confrontation network training module 11, configured to train a pre-constructed generated confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
a behavior recognition network training module 12, configured to train the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
a skeleton sequence data acquisition module 13, configured to acquire human skeleton sequence data to be identified;
and the human body skeleton behavior recognition module 14 is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Preferably, the generation subnetwork includes a first to sixth graph convolutional layers, a first global average pooling layer, a first fully-connected layer, and a skeleton rotation layer; the discrimination sub-network includes seventh to twelfth graph convolutional layers, a second global average pooling layer, and a second fully-connected layer.
Preferably, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
Preferably, the generation confrontation network training module 11 specifically includes:
a first training data set acquisition unit for acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
the false sample set acquisition unit is used for inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
the sample discrimination result acquisition unit is used for inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
a generation countermeasure network training unit, configured to train the pre-established generation sub-network and the pre-established discrimination sub-network according to the sample discrimination result and a preset first loss function by using a gradient descent method, so as to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Preferably, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Preferably, the behavior classification sub-network includes thirteenth through twenty-second graph convolutional layers, a third global average pooling layer, and a third fully-connected layer.
Preferably, the behavior recognition network training module 12 specifically includes:
a second training data set acquisition unit for acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
a normalized skeleton sample set acquisition unit, configured to input data in the second training data set into the trained generation subnetwork, so as to obtain a view-angle normalized skeleton sample set;
the sample classification result acquisition unit is used for inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and the behavior recognition network training unit is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
It should be noted that, the skeleton behavior identification device based on view angle normalization provided in the embodiment of the present invention can implement all the processes of the skeleton behavior identification method based on view angle normalization described in any one of the embodiments, and the functions and implemented technical effects of each module and unit in the device are respectively the same as those of the skeleton behavior identification method based on view angle normalization described in the embodiment, and are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; when running, the computer program controls a device where the computer-readable storage medium is located to execute the skeleton behavior identification method based on perspective normalization according to any one of the embodiments.
An embodiment of the present invention further provides an electronic device. Fig. 3 is a block diagram of a preferred embodiment of the electronic device provided by the present invention; as shown in Fig. 3, the electronic device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when executing the computer program, the processor 10 implements the skeleton behavior identification method based on perspective normalization according to any one of the above embodiments.
Preferably, the computer program can be divided into one or more modules/units (e.g. computer program 1, computer program 2,) which are stored in the memory 20 and executed by the processor 10 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device.
The Processor 10 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, etc., the general purpose Processor may be a microprocessor, or the Processor 10 may be any conventional Processor, the Processor 10 is a control center of the electronic device, and various interfaces and lines are used to connect various parts of the electronic device.
The memory 20 mainly includes a program storage area that may store an operating system, an application program required for at least one function, and the like, and a data storage area that may store related data and the like. In addition, the memory 20 may be a high speed random access memory, may also be a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, or the memory 20 may also be other volatile solid state memory devices.
It should be noted that the above-mentioned electronic device may include, but is not limited to, a processor and a memory, and those skilled in the art will understand that the structural block diagram in fig. 3 is only an example of the above-mentioned electronic device, and does not constitute a limitation of the electronic device, and may include more or less components than those shown in the drawings, or may combine some components, or different components.
To sum up, the method and the device for identifying the skeleton behavior based on the view angle normalization, the computer-readable storage medium and the electronic device provided by the embodiment of the invention have the following beneficial effects:
(1) the embodiment of the invention constructs a generation countermeasure network and uses the countermeasure loss function to transform input skeleton sample data toward the distribution of front-view skeleton sample data, which overcomes the defect that existing skeleton data lack coordinate-transformation correspondences between view angles and realizes view angle normalization;
(2) the embodiment of the invention constructs a view angle normalization skeleton behavior recognition network based on the generation countermeasure network, which solves the problem of large intra-class differences among skeletons caused by view angle diversity in real scenes, overcomes the defects of existing methods that the dynamic characteristics of human behaviors are damaged and that explicit view angle constraints and physical meaning are lacking, and improves the accuracy of behavior recognition.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A skeleton behavior identification method based on visual angle normalization is characterized by comprising the following steps:
training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
acquiring human skeleton sequence data to be identified;
and inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
2. The visual angle normalization-based skeleton behavior recognition method according to claim 1, wherein the generation sub-network comprises first to sixth graph convolution layers, a first global average pooling layer, a first fully-connected layer and a skeleton rotation layer; the discrimination sub-network comprises seventh to twelfth graph convolution layers, a second global average pooling layer and a second fully-connected layer.
3. The method for identifying skeletal behaviors based on perspective normalization of claim 2, wherein the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
4. The visual angle normalization-based skeleton behavior recognition method according to any one of claims 1 to 3, wherein the training of the pre-constructed generative confrontation network specifically comprises:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
5. The method of claim 4, wherein the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
6. The visual-angle-normalization-based skeletal behavior recognition method of claim 1, wherein the behavior classification sub-network comprises a thirteenth graph convolutional layer to a twenty-second graph convolutional layer, a third global-average pooling layer, and a third fully-connected layer.
7. The visual angle normalization-based skeleton behavior recognition method according to claim 1 or 6, wherein the training of the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically comprises:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
8. The utility model provides a skeleton behavior recognition device based on visual angle normalization which characterized in that includes:
the generation confrontation network training module is used for training a pre-constructed generation confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
the behavior recognition network training module is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
the framework sequence data acquisition module is used for acquiring human framework sequence data to be identified;
and the human body skeleton behavior recognition module is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; the computer program controls a device where the computer readable storage medium is located to execute the visual angle normalization-based skeleton behavior identification method according to any one of claims 1 to 7 when running.
10. An electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the perspective normalization-based skeletal behavior recognition method according to any one of claims 1 to 7.
CN202110538744.9A 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment Active CN113239819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110538744.9A CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110538744.9A CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Publications (2)

Publication Number Publication Date
CN113239819A true CN113239819A (en) 2021-08-10
CN113239819B CN113239819B (en) 2022-05-03

Family

ID=77134984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110538744.9A Active CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN113239819B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216896A (en) * 2008-01-14 2008-07-09 浙江大学 An identification method for movement by human bodies irrelevant with the viewpoint based on stencil matching
CN101853388A (en) * 2009-04-01 2010-10-06 中国科学院自动化研究所 Unchanged view angle behavior identification method based on geometric invariable
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN105631420A (en) * 2015-12-23 2016-06-01 武汉工程大学 Multi-angle indoor human action recognition method based on 3D skeleton
CN106909938A (en) * 2017-02-16 2017-06-30 青岛科技大学 Viewing angle independence Activity recognition method based on deep learning network
CN108764107A (en) * 2018-05-23 2018-11-06 中国科学院自动化研究所 Behavior based on human skeleton sequence and identity combination recognition methods and device
CN109871750A (en) * 2019-01-02 2019-06-11 东南大学 A kind of gait recognition method based on skeleton drawing sequence variation joint repair
CN111160164A (en) * 2019-12-18 2020-05-15 上海交通大学 Action recognition method based on human body skeleton and image fusion
CN111832516A (en) * 2020-07-22 2020-10-27 西安电子科技大学 Video behavior identification method based on unsupervised video representation learning
CN112633377A (en) * 2020-12-24 2021-04-09 电子科技大学 Human behavior prediction method and system based on generation of confrontation network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANILO AVOLA et al.: "2-D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs", IEEE Transactions on Multimedia *
FANJIA LI et al.: "Multi-Stream and Enhanced Spatial-Temporal Graph Convolution Network for Skeleton-Based Action Recognition", IEEE Access *
DONG AN et al.: "Skeleton-Based Action Recognition with Graph Convolution", Modern Computer (现代计算机) *
QIAN YICHEN: "Frontal Face Generation Based on Generative Adversarial Learning", China Masters' Theses Full-Text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑) *

Also Published As

Publication number Publication date
CN113239819B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN106503671B (en) The method and apparatus for determining human face posture
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN109359526B (en) Human face posture estimation method, device and equipment
JP5352738B2 (en) Object recognition using 3D model
CN107045631B (en) Method, device and equipment for detecting human face characteristic points
Beach et al. Quantum image processing (quip)
CN113223091B (en) Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN110991513B (en) Image target recognition system and method with continuous learning ability of human-like
CN112639846A (en) Method and device for training deep learning model
WO2005111936A1 (en) Parameter estimation method, parameter estimation device, and correlation method
CN108960189A (en) Image recognition methods, device and electronic equipment again
CN112560967B (en) Multi-source remote sensing image classification method, storage medium and computing device
CN111127631A (en) Single image-based three-dimensional shape and texture reconstruction method, system and storage medium
CN113011401B (en) Face image posture estimation and correction method, system, medium and electronic equipment
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN111968165A (en) Dynamic human body three-dimensional model completion method, device, equipment and medium
Bui et al. When regression meets manifold learning for object recognition and pose estimation
CN111126254A (en) Image recognition method, device, equipment and storage medium
CN115331259A (en) Three-dimensional human body posture estimation method, system and storage medium
CN113239819B (en) Visual angle normalization-based skeleton behavior identification method, device and equipment
Skočaj et al. Incremental and robust learning of subspace representations
CN112699784A (en) Face orientation estimation method and device, electronic equipment and storage medium
CN114120436A (en) Motion recognition model training method, motion recognition method and related device
Khalifa et al. An automatic facial age progression estimation system
Luo et al. Robot artist performs cartoon style facial portrait painting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant