CN113239819A - Visual angle normalization-based skeleton behavior identification method, device and equipment - Google Patents

Visual angle normalization-based skeleton behavior identification method, device and equipment

Info

Publication number
CN113239819A
CN113239819A (application CN202110538744.9A)
Authority
CN
China
Prior art keywords
network
skeleton
sub
data
trained
Prior art date
Legal status
Granted
Application number
CN202110538744.9A
Other languages
Chinese (zh)
Other versions
CN113239819B (en)
Inventor
谢雪梅
赵至夫
潘庆哲
李佳楠
曹玉晗
石光明
Current Assignee
Guangzhou Institute of Technology of Xidian University
Original Assignee
Guangzhou Institute of Technology of Xidian University
Priority date
Filing date
Publication date
Application filed by Guangzhou Institute of Technology of Xidian University filed Critical Guangzhou Institute of Technology of Xidian University
Priority to CN202110538744.9A
Publication of CN113239819A
Application granted
Publication of CN113239819B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a skeleton behavior identification method based on visual angle normalization, which comprises the following steps: training a pre-constructed generation countermeasure network, wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network; training the trained generation sub-network together with a pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network; acquiring human skeleton sequence data to be identified; and inputting the human skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result. Correspondingly, the invention also discloses a visual angle normalization-based skeleton behavior identification device, a computer-readable storage medium and an electronic device. By adopting the technical scheme of the invention, the problem of intra-class differences in skeleton representation caused by visual angle diversity can be solved, and the accuracy of behavior identification can be improved.

Description

Visual angle normalization-based skeleton behavior identification method, device and equipment
Technical Field
The invention relates to the technical field of computer vision, in particular to a skeleton behavior identification method and device based on visual angle normalization, a computer readable storage medium and electronic equipment.
Background
Behavior recognition is an important and challenging task in computer vision tasks, and has wide application in the fields of security monitoring, intelligent video analysis, human-computer interaction and the like. With the development of human posture estimation technology, behavior recognition based on human skeleton is more and more popular.
The behavior recognition scheme based on the human skeleton disclosed in the prior art specifically comprises the following steps: firstly, skeleton data to be recognized are obtained; then the skeleton data are normalized to unify the data distribution; finally, behavior recognition is performed according to the normalized skeleton data and a preset network model. When the skeleton data are normalized, the coordinates of a certain skeleton point are generally selected as a reference. For example, the coordinates of the spine joint are selected as the reference and are subtracted from the coordinates of each remaining skeleton point, so that the skeleton data are converted from the camera coordinate system into a human body coordinate system. A skeleton rotation operation is then performed so that the three-dimensional vector from the right shoulder joint to the left shoulder joint is parallel to the X axis of the coordinate space, and the three-dimensional vector from the spine base joint to the spine joint is parallel to the Y axis of the coordinate space.
Although the technical scheme can reduce the intra-class difference of the skeleton data, the method also reduces the inter-class difference of the skeleton data and destroys the dynamic characteristics of actions related to the right shoulder joint, the left shoulder joint, the spine base joint and the spine joint, so that the visual angle normalization is incomplete, the skeleton expression difference cannot be well reflected, and the accuracy of behavior identification is low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a skeleton behavior identification method and apparatus based on view angle normalization, a computer-readable storage medium, and an electronic device, so as to solve the problem of intra-class differences in skeleton representation caused by view angle diversity and to improve the accuracy of behavior identification.
In order to solve the above technical problem, an embodiment of the present invention provides a skeleton behavior identification method based on view angle normalization, including:
training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
acquiring human skeleton sequence data to be identified;
and inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Further, the generation sub-network comprises first to sixth graph convolution layers, a first global average pooling layer, a first fully-connected layer and a skeleton rotation layer; the discrimination sub-network comprises seventh to twelfth graph convolution layers, a second global average pooling layer and a second fully-connected layer.
Further, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
Further, the training of the pre-constructed generated countermeasure network specifically includes:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Further, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Further, the behavior classification sub-network includes thirteenth through twenty-second graph convolutional layers, a third global average pooling layer, and a third fully-connected layer.
Further, the training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically includes:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
In order to solve the above technical problem, an embodiment of the present invention further provides a skeleton behavior recognition apparatus based on view angle normalization, including:
the generation confrontation network training module is used for training a pre-constructed generation confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
the behavior recognition network training module is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
the framework sequence data acquisition module is used for acquiring human framework sequence data to be identified;
and the human body skeleton behavior recognition module is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; when the computer program runs, the device where the computer-readable storage medium is located is controlled by the computer program to execute any one of the above skeleton behavior identification methods based on perspective normalization.
An embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor, when executing the computer program, implements any one of the above skeleton behavior identification methods based on perspective normalization.
Compared with the prior art, the embodiments of the present invention provide a visual angle normalization-based skeleton behavior recognition method and device, a computer-readable storage medium and an electronic device, which train a pre-constructed generation countermeasure network composed of a generation sub-network and a discrimination sub-network, train the trained generation sub-network together with a pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network, and input the acquired human skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result, so that the problem of intra-class differences in skeleton representation caused by visual angle diversity can be solved and the accuracy of behavior recognition improved.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for identifying skeleton behavior based on view normalization according to the present invention;
fig. 2 is a block diagram of a framework behavior recognition apparatus based on perspective normalization according to a preferred embodiment of the present invention;
fig. 3 is a block diagram of an electronic device according to a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
The embodiment of the present invention provides a skeleton behavior identification method based on perspective normalization. Fig. 1 is a flowchart of a preferred embodiment of the skeleton behavior identification method based on perspective normalization provided by the present invention; as shown in Fig. 1, the method includes steps S11 to S14:
step S11, training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
step S12, training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
step S13, obtaining human body skeleton sequence data to be identified;
and step S14, inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Specifically, the embodiment of the invention builds a generation sub-network, a judgment sub-network and a behavior classification sub-network in advance, the generation sub-network and the judgment sub-network jointly form a generation countermeasure network, and the generation sub-network and the behavior classification sub-network jointly form a skeleton behavior identification network; in order to obtain a trained framework behavior recognition network, training a pre-constructed generation countermeasure network to correspondingly obtain the trained generation countermeasure network (including a trained generation sub-network and a trained discrimination sub-network), and then training the trained generation sub-network and a pre-constructed behavior classification sub-network to correspondingly obtain the trained framework behavior recognition network; when the actual behavior recognition is carried out according to the human body skeleton, the human body skeleton sequence data to be recognized are obtained firstly, and then the obtained human body skeleton sequence data are input into the trained skeleton behavior recognition network, so that the behavior recognition result is obtained correspondingly.
The method includes acquiring an RGB image sequence with a camera and extracting a skeleton sequence from the RGB image sequence with a skeleton posture estimation tool, thereby obtaining the human skeleton sequence data to be recognized. Each image corresponds to a set of human skeleton data, which includes a certain number of skeleton points (e.g., 25) in the image, and each skeleton point is represented by the three-dimensional coordinate data of the position where the skeleton point is located.
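For illustration only, a minimal sketch of how such skeleton sequence data could be organized in memory (the 25-joint layout, the frame count and the array shapes below are assumptions chosen for the example, not requirements of this disclosure):

```python
import numpy as np

# Hypothetical skeleton sequence: M frames, N = 25 skeleton points, 3 coordinates each.
M, N = 64, 25
skeleton_sequence = np.zeros((M, N, 3), dtype=np.float32)

# Entry [m, n] holds the (x, y, z) camera-space coordinates of the n-th skeleton
# point estimated from the m-th RGB frame by the pose-estimation tool.
skeleton_sequence[0, 0] = [0.12, -0.34, 2.05]  # example joint position (made-up values)
```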
It should be noted that, in the embodiment of the present invention, the generation countermeasure network is trained first, yielding a trained generation sub-network and a trained discrimination sub-network. The discrimination sub-network functions only during the training of the generation countermeasure network, where it helps to obtain a better trained generation sub-network; when the skeleton behavior recognition network is subsequently trained, the trained generation sub-network is trained again, so that a better trained skeleton behavior recognition network can be obtained.
According to the visual angle normalization-based skeleton behavior recognition method provided by the embodiment of the invention, a pre-constructed generation countermeasure network composed of a generation sub-network and a discrimination sub-network is trained, the trained generation sub-network and a pre-constructed behavior classification sub-network are then trained to obtain a trained skeleton behavior recognition network, and the acquired human skeleton sequence data are input into the trained skeleton behavior recognition network to obtain a behavior recognition result, so that the problem of intra-class differences in skeleton representation caused by viewpoint diversity can be solved and the accuracy of behavior recognition improved.
In another preferred embodiment, the generation subnetwork comprises first through sixth graph convolutional layers, a first global average pooling layer, a first fully-connected layer, and a skeleton rotation layer; the discrimination sub-network includes seventh to twelfth graph convolutional layers, a second global average pooling layer, and a second fully-connected layer.
Specifically, in combination with the above embodiment, the pre-constructed generation sub-network is a 9-layer network whose structure is, in order: first graph convolution layer → second graph convolution layer → third graph convolution layer → fourth graph convolution layer → fifth graph convolution layer → sixth graph convolution layer → first global average pooling layer → first fully-connected layer → skeleton rotation layer. The pre-constructed discrimination sub-network is an 8-layer network whose structure is, in order: seventh graph convolution layer → eighth graph convolution layer → ninth graph convolution layer → tenth graph convolution layer → eleventh graph convolution layer → twelfth graph convolution layer → second global average pooling layer → second fully-connected layer.
The parameters of each layer of the network can be set as follows: the spatial convolution kernels of the 12 graph convolution layers are all of size 1 × 1 with convolution stride 1; the temporal convolution kernels are all of size 9 × 1, with convolution strides of 1, 2, 1, 2 and 1 in sequence; the number of spatial convolution kernels and temporal convolution kernels in each layer is kept the same and is set to 16, 32, 64, 16, 32, 64 and 64 in sequence. The number of output neurons of the first fully-connected layer is set to 3, i.e., the three-dimensional angle by which the skeleton data need to be rotated; the number of output neurons of the second fully-connected layer is set to 1, i.e., the probability that the input of the discrimination sub-network is a true sample.
It should be noted that skeleton sequence data are essentially graph-structured data, and graph convolution layers can model the skeleton sequence well for feature extraction; the embodiment of the present invention therefore uses graph convolution layers to build the corresponding network models.
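As a rough illustration, a simplified spatio-temporal graph convolution block of the kind described above could look like the following PyTorch sketch; the 1 × 1 spatial kernel and 9 × 1 temporal kernel come from the layer settings listed earlier, while the adjacency handling, activation and normalization details are our own assumptions:

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """Simplified spatio-temporal graph convolution: spatial aggregation over the
    skeleton graph with a 1x1 convolution, followed by a 9x1 temporal convolution."""
    def __init__(self, in_channels, out_channels, adjacency, temporal_stride=1):
        super().__init__()
        # adjacency: (N, N) normalized adjacency matrix of the skeleton graph
        self.register_buffer("A", adjacency)
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1)
        self.temporal = nn.Conv2d(out_channels, out_channels, kernel_size=(9, 1),
                                  stride=(temporal_stride, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = torch.einsum("bcmn,nk->bcmk", x, self.A)  # aggregate neighbouring joints
        x = self.relu(self.spatial(x))                # 1x1 spatial convolution
        x = self.relu(self.temporal(x))               # 9x1 temporal convolution
        return x
```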
As an improvement of the above scheme, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis, and $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis.
Specifically, in combination with the above embodiments, the last layer of the generation sub-network is the skeleton rotation layer, which performs view angle normalization on the skeleton data. Let the three-dimensional rotation angle be $(\theta_x, \theta_y, \theta_z)$, where $\theta_x$, $\theta_y$ and $\theta_z$ are the rotation angles of the skeleton data about the x-axis, y-axis and z-axis, respectively, and let the skeleton sequence data be $V = \{v_{m,n} \mid 1 \leq m \leq M, 1 \leq n \leq N\}$, where $M$ is the total number of frames in the image sequence corresponding to the skeleton sequence and $N$ is the number of skeleton points in each frame. According to $\theta_x$, $\theta_y$ and $\theta_z$, the rotation matrices corresponding to rotations of the skeleton data about the x-axis, y-axis and z-axis are defined as $R_x$, $R_y$ and $R_z$, respectively, in the specific form

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The skeleton rotation layer then rotates each skeleton point $v_{m,n}$ in the skeleton sequence data $V$ according to the rotation formula $v'_{m,n} = R_x R_y R_z v_{m,n}$, where $v_{m,n}$ is the three-dimensional coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation and $v'_{m,n}$ is the three-dimensional coordinate data after rotation. After the rotation operation has been applied to every skeleton point in $V$, the output of the skeleton rotation layer is $V' = \{v'_{m,n} \mid 1 \leq m \leq M, 1 \leq n \leq N\}$.
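A minimal sketch of the rotation performed by such a skeleton rotation layer is given below; NumPy is used purely for illustration, and in the network itself the three angles would be produced by the preceding first fully-connected layer rather than passed in by hand:

```python
import numpy as np

def rotate_skeleton(V, theta_x, theta_y, theta_z):
    """Apply v'_{m,n} = Rx Ry Rz v_{m,n} to every skeleton point.
    V: array of shape (M, N, 3) holding the 3-D joint coordinates."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rx @ Ry @ Rz
    return V @ R.T  # rotates every v_{m,n} at once

# usage: V_normalized = rotate_skeleton(skeleton_sequence, 0.1, -0.2, 0.05)
```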
In another preferred embodiment, the training of the pre-constructed generated countermeasure network specifically includes:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Specifically, with reference to the above embodiment, when training a pre-constructed generated countermeasure network, a first training data set is first obtained from a preset data set, where the first training data set includes an input sample set and a real sample set, the input sample set includes a plurality of (e.g., 10000) side view angle skeleton sample data, and the real sample set includes a plurality of (e.g., 10000) front view angle skeleton sample data; inputting the side-view angle skeleton sample data in the obtained input sample set into a pre-constructed generation sub-network, outputting a false sample set by a skeleton rotating layer, inputting the data in the obtained false sample set and the front-view angle skeleton sample data in the real sample set into a pre-constructed discrimination sub-network to discriminate whether the sample is true or false, and correspondingly obtaining a sample discrimination result; and finally, updating parameters of a pre-constructed generation sub-network and a pre-constructed discrimination sub-network according to the obtained sample discrimination result and a preset first loss function by adopting a gradient descent method, assigning the updated parameter values to corresponding network parameters of each layer, and correspondingly obtaining a trained generation countermeasure network, wherein the trained generation countermeasure network comprises the trained generation sub-network and the trained discrimination sub-network.
As an improvement of the above solution, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Specifically, with reference to the above embodiment, the functional expression of the first loss function is:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

where $G$ denotes the generation sub-network, $D$ denotes the discrimination sub-network, $x$ denotes a real sample, $z$ denotes a side-view skeleton sample, $P_{norm}(x)$ denotes the probability distribution of front-view skeleton samples, $P_{side}(z)$ denotes the probability distribution of side-view skeleton samples, $G(z)$ denotes the output of the generation sub-network $G$ for the side-view skeleton sample $z$, i.e., the false sample obtained after the side-view skeleton sample is input into the generation sub-network, $D(x)$ denotes the output of the discrimination sub-network $D$ for the real sample $x$, and $D(G(z))$ denotes the output of the discrimination sub-network $D$ for the false sample $G(z)$.
It should be noted that the specific steps for updating the network parameters of the generation countermeasure network by the gradient descent method are as follows (a minimal code sketch of this training loop is given after step (5)):
(1) The learning rate is set in advance to α = 0.0002, the iteration count threshold is set to 30000, and the number of skeleton samples selected from the training set in each iteration is set to 32;
(2) Let the parameter dimension of the generation sub-network be $N_1$ and the parameter dimension of the discrimination sub-network be $N_2$. The output $D(x)$ of the discrimination sub-network $D$ for the real sample $x$ and the output $D(G(z))$ of the discrimination sub-network $D$ for the false sample $G(z)$ are obtained, and the loss value $J$ is calculated in combination with the first loss function; the gradient of the generation sub-network is then expressed as $\nabla_{\theta_g} J$ and the gradient of the discrimination sub-network as $\nabla_{\theta_d} J$;
(3) The network parameters of the generation sub-network are updated according to the gradient vector of the generation sub-network, the update formula being $\theta'_g = \theta_g - \alpha \nabla_{\theta_g} J$, where $\theta_g$ denotes the $N_1$-dimensional generation sub-network parameters before the update and $\theta'_g$ denotes the $N_1$-dimensional generation sub-network parameters after the update;
(4) The network parameters of the discrimination sub-network are updated according to the gradient vector of the discrimination sub-network, the update formula being $\theta'_d = \theta_d - \alpha \nabla_{\theta_d} J$, where $\theta_d$ denotes the $N_2$-dimensional discrimination sub-network parameters before the update and $\theta'_d$ denotes the $N_2$-dimensional discrimination sub-network parameters after the update;
(5) After each update of the generation sub-network and the discrimination sub-network, it is judged whether the current iteration count has reached the preset iteration count threshold of 30000; if so, the parameter update stops and the trained generation countermeasure network is obtained; otherwise, steps (2) to (4) are repeated until the iteration count reaches 30000 and the trained generation countermeasure network is obtained.
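The sketch below illustrates steps (1) to (5) under the hyper-parameters given above; the use of SGD optimizers, data loaders delivering batches of 32 skeleton samples, and the binary cross-entropy form of the adversarial loss (a common practical substitute for the min-max expression) are assumptions of the example, not requirements of the method:

```python
import torch
import torch.nn.functional as F

def train_gan(G, D, side_loader, front_loader, lr=0.0002, max_iters=30000):
    """Alternating updates of the generation sub-network G and discrimination sub-network D."""
    opt_g = torch.optim.SGD(G.parameters(), lr=lr)
    opt_d = torch.optim.SGD(D.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for z, x in zip(side_loader, front_loader):  # side-view and front-view batches
            fake = G(z)                              # false (view-normalized) samples
            # update the discrimination sub-network: real samples -> 1, false samples -> 0
            d_real = D(x)
            d_fake = D(fake.detach())
            d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
                      F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # update the generation sub-network: try to make D classify G(z) as real
            d_fake_for_g = D(fake)
            g_loss = F.binary_cross_entropy(d_fake_for_g, torch.ones_like(d_fake_for_g))
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
            it += 1
            if it >= max_iters:
                break
    return G, D
```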
In yet another preferred embodiment, the behavior classification subnetwork includes a thirteenth through twenty-second graph convolutional layer, a third global average pooling layer, and a third fully-connected layer.
Specifically, in combination with the above embodiment, the pre-constructed behavior classification sub-network is a 12-layer network whose structure is, in order: thirteenth graph convolution layer → fourteenth graph convolution layer → fifteenth graph convolution layer → sixteenth graph convolution layer → seventeenth graph convolution layer → eighteenth graph convolution layer → nineteenth graph convolution layer → twentieth graph convolution layer → twenty-first graph convolution layer → twenty-second graph convolution layer → third global average pooling layer → third fully-connected layer.
The parameters of each layer of the network can be set as follows: the spatial convolution kernels of the 10 graph convolution layers are all of size 1 × 1 with convolution stride 1; the temporal convolution kernels are all of size 9 × 1, with convolution strides of 1, 2, 1 in sequence; the number of spatial convolution kernels and temporal convolution kernels in each layer is kept the same and is set to 32, 64, 128 in sequence; the number of output neurons of the third fully-connected layer is set to 60.
The network layers of the generation sub-network and the behavior classification sub-network are connected in sequence (i.e., the last layer of the generation sub-network, namely the skeleton rotation layer, is connected to the first layer of the behavior classification sub-network, namely the thirteenth graph convolution layer), thus forming the skeleton behavior recognition network.
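A minimal sketch of how the trained generation sub-network and the behavior classification sub-network could be chained into the skeleton behavior recognition network; the class and variable names are placeholders chosen for the example:

```python
import torch.nn as nn

class SkeletonBehaviorRecognitionNet(nn.Module):
    """Generation sub-network (view angle normalization) followed by the behavior
    classification sub-network, connected end to end."""
    def __init__(self, generation_subnet, classification_subnet):
        super().__init__()
        self.normalize = generation_subnet     # ends with the skeleton rotation layer
        self.classify = classification_subnet  # ends with the 60-way fully-connected layer

    def forward(self, skeleton_sequence):
        normalized = self.normalize(skeleton_sequence)  # view-angle-normalized skeletons
        return self.classify(normalized)                # behavior class scores

# usage: result = SkeletonBehaviorRecognitionNet(G, classifier)(x).argmax(dim=1)
```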
In another preferred embodiment, the training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically includes:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
Specifically, with reference to the above embodiment, when training a pre-constructed skeleton behavior recognition network, first obtaining a second training data set from a preset data set, for example, selecting 30000 skeleton samples (having no requirement on view angle) from the preset data set to form the second training data set, where the second training data set includes 30000 skeleton sample data, and the skeleton sample data includes coordinate data of skeleton points and behavior tags of skeleton samples; inputting the skeleton sample data in the obtained second training data set into a trained generation sub-network obtained after training of the generated countermeasure network, correspondingly obtaining a visual angle normalized skeleton sample set, inputting the data in the obtained visual angle normalized skeleton sample set into a pre-constructed behavior classification sub-network, and correspondingly obtaining a sample classification result; and finally, training the trained generation sub-network and the pre-constructed behavior classification sub-network by adopting a gradient descent method according to the obtained sample classification result and a preset second loss function, and correspondingly obtaining the trained skeleton behavior recognition network.
The second loss function may be a cross-entropy loss function, whose expression is:

$$L = -\sum_i p(x_i) \log q(x_i)$$

where $x_i$ denotes data in the second training data set, $p(x_i)$ denotes the true probability distribution, and $q(x_i)$ denotes the predicted probability distribution.
It should be noted that, when the network parameters of the skeleton behavior recognition network are updated by the gradient descent method, the learning rate α may be set in advance to 0.1, the iteration count threshold to 60000, and the number of skeleton samples selected from the training set in each iteration to 64; the subsequent steps are similar to the steps for updating the network parameters of the generation countermeasure network in the foregoing embodiment and are not repeated here.
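An illustrative sketch of this second training stage under the hyper-parameters just listed; the SGD optimizer and the data loader delivering batches of 64 labelled skeleton samples are assumptions of the example:

```python
import torch
import torch.nn.functional as F

def train_recognition_net(recognition_net, loader, lr=0.1, max_iters=60000):
    """Jointly train the (already trained) generation sub-network and the behavior
    classification sub-network with the cross-entropy loss."""
    opt = torch.optim.SGD(recognition_net.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for skeletons, labels in loader:         # batches of 64 skeleton samples
            scores = recognition_net(skeletons)  # view normalization + classification
            loss = F.cross_entropy(scores, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= max_iters:
                break
    return recognition_net
```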
In addition, simulation experiments show that when 10000 skeleton samples are selected from the preset data set to form a test data set and each skeleton sample in the test data set is input into the trained skeleton behavior recognition network to obtain its behavior classification result, the number of skeleton samples whose behavior classification result matches the behavior label, i.e., the number of correctly classified skeleton samples, is 9235; the accuracy of behavior recognition is therefore A = (number of correctly classified skeleton samples) / (total number of skeleton samples) = 9235/10000 = 92.35%.
An embodiment of the present invention further provides a skeleton behavior recognition apparatus based on view angle normalization. Fig. 2 is a block diagram of a preferred embodiment of the skeleton behavior recognition apparatus based on view angle normalization provided by the present invention; as shown in Fig. 2, the apparatus includes:
a generated confrontation network training module 11, configured to train a pre-constructed generated confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
a behavior recognition network training module 12, configured to train the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
a skeleton sequence data acquisition module 13, configured to acquire human skeleton sequence data to be identified;
and the human body skeleton behavior recognition module 14 is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
Preferably, the generation subnetwork includes a first to sixth graph convolutional layers, a first global average pooling layer, a first fully-connected layer, and a skeleton rotation layer; the discrimination sub-network includes seventh to twelfth graph convolutional layers, a second global average pooling layer, and a second fully-connected layer.
Preferably, the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
Preferably, the generation confrontation network training module 11 specifically includes:
a first training data set acquisition unit for acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
the false sample set acquisition unit is used for inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
the sample discrimination result acquisition unit is used for inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
a generation countermeasure network training unit, configured to train the pre-established generation sub-network and the pre-established discrimination sub-network according to the sample discrimination result and a preset first loss function by using a gradient descent method, so as to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
Preferably, the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
Preferably, the behavior classification sub-network includes thirteenth through twenty-second graph convolutional layers, a third global average pooling layer, and a third fully-connected layer.
Preferably, the behavior recognition network training module 12 specifically includes:
a second training data set acquisition unit for acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
a normalized skeleton sample set acquisition unit, configured to input data in the second training data set into the trained generation subnetwork, so as to obtain a view-angle normalized skeleton sample set;
the sample classification result acquisition unit is used for inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and the behavior recognition network training unit is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
It should be noted that, the skeleton behavior identification device based on view angle normalization provided in the embodiment of the present invention can implement all the processes of the skeleton behavior identification method based on view angle normalization described in any one of the embodiments, and the functions and implemented technical effects of each module and unit in the device are respectively the same as those of the skeleton behavior identification method based on view angle normalization described in the embodiment, and are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program; when running, the computer program controls a device where the computer-readable storage medium is located to execute the skeleton behavior identification method based on perspective normalization according to any one of the embodiments.
An embodiment of the present invention further provides an electronic device. Fig. 3 is a block diagram of a preferred embodiment of the electronic device provided by the present invention; as shown in Fig. 3, the electronic device includes a processor 10, a memory 20, and a computer program stored in the memory 20 and configured to be executed by the processor 10, and when executing the computer program, the processor 10 implements the skeleton behavior identification method based on perspective normalization according to any one of the above embodiments.
Preferably, the computer program can be divided into one or more modules/units (e.g. computer program 1, computer program 2,) which are stored in the memory 20 and executed by the processor 10 to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device.
The Processor 10 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, etc., the general purpose Processor may be a microprocessor, or the Processor 10 may be any conventional Processor, the Processor 10 is a control center of the electronic device, and various interfaces and lines are used to connect various parts of the electronic device.
The memory 20 mainly includes a program storage area that may store an operating system, an application program required for at least one function, and the like, and a data storage area that may store related data and the like. In addition, the memory 20 may be a high speed random access memory, may also be a non-volatile memory, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, or the memory 20 may also be other volatile solid state memory devices.
It should be noted that the above-mentioned electronic device may include, but is not limited to, a processor and a memory, and those skilled in the art will understand that the structural block diagram in fig. 3 is only an example of the above-mentioned electronic device, and does not constitute a limitation of the electronic device, and may include more or less components than those shown in the drawings, or may combine some components, or different components.
To sum up, the method and the device for identifying the skeleton behavior based on the view angle normalization, the computer-readable storage medium and the electronic device provided by the embodiment of the invention have the following beneficial effects:
(1) the embodiment of the invention constructs a generation countermeasure network and uses the countermeasure loss function to transform input skeleton sample data toward the distribution of front-view skeleton sample data, which overcomes the defect that existing skeleton data lack coordinate-transformation correspondences between view angles and realizes view angle normalization;
(2) the embodiment of the invention constructs a view angle normalization skeleton behavior recognition network based on the generation countermeasure network, which solves the problem of large intra-class differences among skeletons caused by view angle diversity in real scenes, overcomes the defects of existing methods that the dynamic characteristics of human behaviors are damaged and that explicit view angle constraints and physical meaning are lacking, and improves the accuracy of behavior recognition.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A skeleton behavior identification method based on visual angle normalization is characterized by comprising the following steps:
training a pre-constructed generated countermeasure network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
acquiring human skeleton sequence data to be identified;
and inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
2. The visual angle normalization-based skeleton behavior recognition method according to claim 1, wherein the generation sub-network comprises first to sixth graph convolution layers, a first global average pooling layer, a first fully-connected layer and a skeleton rotation layer; the discrimination sub-network comprises seventh to twelfth graph convolution layers, a second global average pooling layer and a second fully-connected layer.
3. The method for identifying skeletal behaviors based on perspective normalization of claim 2, wherein the function expression of the skeleton rotation layer is: $v'_{m,n} = R_x R_y R_z v_{m,n}$, wherein $v_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image before rotation, $v'_{m,n}$ represents the coordinate data of the $n$-th skeleton point in the $m$-th frame image after rotation, $m \geq 1$ and $n \geq 1$; $R_x$, $R_y$ and $R_z$ respectively represent the rotation matrices about the x-axis, y-axis and z-axis directions:

$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix},\quad R_y = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix},\quad R_z = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and $\theta_x$, $\theta_y$ and $\theta_z$ respectively denote the rotation angles about the x-axis, y-axis and z-axis directions.
4. The visual angle normalization-based skeleton behavior recognition method according to any one of claims 1 to 3, wherein the training of the pre-constructed generative confrontation network specifically comprises:
acquiring a first training data set; the first training data set comprises an input sample set and a real sample set, the input sample set comprises a plurality of side-view angle skeleton sample data, and the real sample set comprises a plurality of front-view angle skeleton sample data;
inputting the data in the input sample set into a pre-constructed generation sub-network to obtain a false sample set;
inputting the data in the false sample set and the data in the real sample set into a pre-constructed discrimination sub-network to obtain a sample discrimination result;
training the pre-constructed generation sub-network and the pre-constructed discrimination sub-network according to the sample discrimination result and a preset first loss function by adopting a gradient descent method to obtain a trained generation countermeasure network; wherein the trained generative confrontation network comprises the trained generative subnetwork and the trained discriminative subnetwork.
5. The method of claim 4, wherein the first loss function is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim P_{norm}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_{side}(z)}[\log(1 - D(G(z)))]$$

wherein $G$ represents the generation sub-network, $D$ represents the discrimination sub-network, $x$ represents a real sample, $z$ represents a side-view skeleton sample, $G(z)$ represents a false sample, $P_{norm}(x)$ represents the probability distribution of front-view skeleton samples, and $P_{side}(z)$ represents the probability distribution of side-view skeleton samples.
6. The visual-angle-normalization-based skeletal behavior recognition method of claim 1, wherein the behavior classification sub-network comprises a thirteenth graph convolutional layer to a twenty-second graph convolutional layer, a third global-average pooling layer, and a third fully-connected layer.
7. The visual angle normalization-based skeleton behavior recognition method according to claim 1 or 6, wherein the training of the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain the trained skeleton behavior recognition network specifically comprises:
acquiring a second training data set; the second training data set comprises a plurality of skeleton sample data, and the skeleton sample data comprises coordinate data of skeleton points and behavior labels of skeleton samples;
inputting the data in the second training data set into the trained generation sub-network to obtain a visual angle normalized skeleton sample set;
inputting the data in the visual angle normalization skeleton sample set into a pre-constructed behavior classification sub-network to obtain a sample classification result;
and training the trained generation sub-network and the pre-constructed behavior classification sub-network according to the sample classification result and a preset second loss function by adopting a gradient descent method to obtain the trained skeleton behavior recognition network.
8. The utility model provides a skeleton behavior recognition device based on visual angle normalization which characterized in that includes:
the generation confrontation network training module is used for training a pre-constructed generation confrontation network; wherein the pre-constructed generation countermeasure network is composed of a generation sub-network and a discrimination sub-network;
the behavior recognition network training module is used for training the trained generation sub-network and the pre-constructed behavior classification sub-network to obtain a trained skeleton behavior recognition network;
the framework sequence data acquisition module is used for acquiring human framework sequence data to be identified;
and the human body skeleton behavior recognition module is used for inputting the human body skeleton sequence data into the trained skeleton behavior recognition network to obtain a behavior recognition result.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; the computer program controls a device where the computer readable storage medium is located to execute the visual angle normalization-based skeleton behavior identification method according to any one of claims 1 to 7 when running.
10. An electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the perspective normalization-based skeletal behavior recognition method according to any one of claims 1 to 7.
CN202110538744.9A 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment Active CN113239819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110538744.9A CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110538744.9A CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Publications (2)

Publication Number Publication Date
CN113239819A true CN113239819A (en) 2021-08-10
CN113239819B CN113239819B (en) 2022-05-03

Family

ID=77134984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110538744.9A Active CN113239819B (en) 2021-05-18 2021-05-18 Visual angle normalization-based skeleton behavior identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN113239819B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216896A (en) * 2008-01-14 2008-07-09 浙江大学 An identification method for movement by human bodies irrelevant with the viewpoint based on stencil matching
CN101853388A (en) * 2009-04-01 2010-10-06 中国科学院自动化研究所 Unchanged view angle behavior identification method based on geometric invariable
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognizing method based on RGB-D video
CN105631420A (en) * 2015-12-23 2016-06-01 武汉工程大学 Multi-angle indoor human action recognition method based on 3D skeleton
CN106909938A (en) * 2017-02-16 2017-06-30 青岛科技大学 Viewing angle independence Activity recognition method based on deep learning network
CN108764107A (en) * 2018-05-23 2018-11-06 中国科学院自动化研究所 Behavior based on human skeleton sequence and identity combination recognition methods and device
CN109871750A (en) * 2019-01-02 2019-06-11 东南大学 A kind of gait recognition method based on skeleton drawing sequence variation joint repair
CN111160164A (en) * 2019-12-18 2020-05-15 上海交通大学 Action recognition method based on human body skeleton and image fusion
CN111832516A (en) * 2020-07-22 2020-10-27 西安电子科技大学 Video behavior identification method based on unsupervised video representation learning
CN112633377A (en) * 2020-12-24 2021-04-09 电子科技大学 Human behavior prediction method and system based on generation of confrontation network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DANILO AVOLA et al.: "2-D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs", IEEE Transactions on Multimedia *
FANJIA LI et al.: "Multi-Stream and Enhanced Spatial-Temporal Graph Convolution Network for Skeleton-Based Action Recognition", IEEE Access *
DONG AN et al.: "Skeleton-Based Action Recognition with Graph Convolution", Modern Computer (现代计算机) *
QIAN YICHEN: "Frontal Face Generation Based on Generative Adversarial Learning", China Masters' Theses Full-Text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑) *

Also Published As

Publication number Publication date
CN113239819B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN106503671B (en) The method and apparatus for determining human face posture
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN109359526B (en) Human face posture estimation method, device and equipment
JP5352738B2 (en) Object recognition using 3D model
CN107045631B (en) Method, device and equipment for detecting human face characteristic points
Beach et al. Quantum image processing (quip)
CN113223091B (en) Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
CN110991513B (en) Image target recognition system and method with continuous learning ability of human-like
CN112639846A (en) Method and device for training deep learning model
WO2005111936A1 (en) Parameter estimation method, parameter estimation device, and correlation method
CN108960189A (en) Image recognition methods, device and electronic equipment again
CN112560967B (en) Multi-source remote sensing image classification method, storage medium and computing device
CN111127631A (en) Single image-based three-dimensional shape and texture reconstruction method, system and storage medium
CN113011401B (en) Face image posture estimation and correction method, system, medium and electronic equipment
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN111968165A (en) Dynamic human body three-dimensional model completion method, device, equipment and medium
Bui et al. When regression meets manifold learning for object recognition and pose estimation
CN111126254A (en) Image recognition method, device, equipment and storage medium
CN115331259A (en) Three-dimensional human body posture estimation method, system and storage medium
CN113239819B (en) Visual angle normalization-based skeleton behavior identification method, device and equipment
Skočaj et al. Incremental and robust learning of subspace representations
CN112699784A (en) Face orientation estimation method and device, electronic equipment and storage medium
CN114120436A (en) Motion recognition model training method, motion recognition method and related device
Khalifa et al. An automatic facial age progression estimation system
Luo et al. Robot artist performs cartoon style facial portrait painting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant