CN111582220B

CN111582220B - Skeleton point behavior recognition system based on a shift graph convolutional neural network, and recognition method thereof

Info

Publication number: CN111582220B
Application number: CN202010419839.4A
Authority: CN
Inventors: 张一帆; 程科; 程健
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2020-05-18
Filing date: 2020-05-18
Publication date: 2023-05-26
Anticipated expiration: 2040-05-18
Also published as: CN111582220A

Abstract

The invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network. The system comprises an image acquisition module, an image processing module, an extraction module and a behavior recognition module. The image acquisition module acquires behavior images; the image processing module processes the behavior images acquired by the image acquisition module; the extraction module extracts skeleton points from the processed images; and the behavior recognition module recognizes and extracts the behavior features of the skeleton points extracted by the extraction module. The behavior recognition module uses a novel graph convolution, the shift graph convolution, to recognize skeleton point behavior while reducing the computational cost of graph convolution. Unlike conventional graph convolution, shift graph convolution does not enlarge the receptive field by enlarging the convolution kernel; instead, a novel shift operation shifts and splices the graph features. It thereby reaches the same or even higher recognition accuracy with a markedly lower computational cost and a higher computation speed, and avoids the growth in computation that conventional graph convolution incurs as the convolution kernel grows.

Description

Skeleton point behavior recognition system based on a shift graph convolutional neural network, and recognition method thereof

Technical Field

The invention relates to a skeleton point behavior recognition system based on a shift graph convolutional neural network. It belongs to the field of G06T (general image data processing or generation), and in particular to G06T7/20 (motion analysis).

Background

In behavior recognition tasks, limited data volume and algorithmic constraints mean that models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so they generalize poorly and are not robust in practical applications. Behavior recognition based on skeleton point data addresses this problem better.

In skeleton point data, the human body is represented by the coordinates of a number of predefined key nodes in the camera coordinate system. Such data can be conveniently obtained with a depth camera or with various pose estimation algorithms.

Skeleton point data are commonly modeled with graph convolution. In the conventional graph convolution method, however, the convolution kernel only covers the immediate neighborhood of a point, whereas in skeleton point behavior recognition some behaviors (e.g., clapping hands) require modeling the positional relationship of points that are physically far apart (e.g., the two hands). This requires a larger convolution kernel in the graph convolution model, and because the computational cost of graph convolution grows with the size of the convolution kernel, the cost of conventional graph convolution becomes large.

Disclosure of Invention

Object of the invention: to provide a skeleton point behavior recognition system based on a shift graph convolutional neural network that solves the above problems in the prior art.

Technical scheme: a skeleton point behavior recognition system based on a shift graph convolutional neural network, comprising:

an image acquisition module for acquiring a behavior image;

an image processing module for processing the behavior image acquired by the image acquisition module;

an extraction module for extracting skeleton points from the image processed by the image processing module;

and a behavior recognition module for recognizing and extracting the behavior features of the skeleton points extracted by the extraction module.

In a further embodiment, the image acquisition module is based on an image acquisition device. The device comprises cameras arranged in an equilateral triangle and a rotating device mounted at the tail of each camera; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the rotating shaft.

In a further embodiment, the image acquisition module captures images of human behavior through three groups of cameras arranged in an equilateral triangle. The behavior images acquired by the three groups of cameras, mounted at the front, the rear and the side, are each displayed on a computer terminal, and the image processing module then compares and processes the images.

In a further embodiment, the image processing module is mainly configured to process the human behavior image acquired by the image acquisition module into a human body edge map using the Kirsch edge detection operator. When detecting image edges, a 3*3 convolution template traverses the pixels of the image; the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted gray sum of three adjacent pixels and the weighted gray sum of the other five pixels is calculated. The convolution templates are as follows:

[Eight 3*3 Kirsch convolution templates, numbered 1 to 8; shown as images in the original publication]

All pixels of the original image are processed with the eight convolution templates in turn to obtain the edge strength of each pixel; the pixels are then tested against a threshold, the final edge points are extracted, and edge detection is complete;

the Kirsch operator detects image edges through the following steps:

step 1, obtaining the data area pointer of the original image;

step 2, establishing two buffers of the same size as the original image, used to store copies of the original image; both buffers are initialized to a copy of the original image and are marked image 1 and image 2 respectively;

step 3, setting one Kirsch template for the convolution operation in each buffer, traversing the pixels of the copied images in the two buffers respectively and convolving them one by one; the two results are compared, the larger value is stored in image 1, and image 1 is copied into buffer image 2;

step 4, repeating step 3, setting each of the remaining six templates in turn and performing the calculation; the larger gray value of image 1 and image 2 is finally stored in buffer image 1;

step 5, copying the processed image 1 back into the original image data, which completes the edge processing of the image in the program.
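The edge detection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the weights 5 and -3 are the standard Kirsch values and are an assumption here, because the eight templates survive only as images in the original; the names `kirsch_kernels` and `kirsch_edges` are hypothetical; and the two-buffer pairwise comparison is replaced by a direct maximum over the eight responses, which gives the same per-pixel result.

```python
# Positions of the eight outer cells of a 3x3 template, in clockwise order.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def kirsch_kernels():
    """Build the eight compass templates by rotating the weights around the ring.

    Assumption: standard Kirsch weights (three adjacent cells 5, five cells -3).
    """
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for shift in range(8):
        k = [[0] * 3 for _ in range(3)]  # center weight stays 0
        for idx, (r, c) in enumerate(RING):
            k[r][c] = base[(idx - shift) % 8]
        kernels.append(k)
    return kernels


def kirsch_edges(image, threshold):
    """Return a binary edge map (1 = edge) for a 2-D list of gray values.

    Border pixels are left at 0 because their 3x3 neighborhood is incomplete.
    """
    h, w = len(image), len(image[0])
    kernels = kirsch_kernels()
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Edge strength = largest response among the eight templates.
            strength = max(
                sum(k[r][c] * image[y + r - 1][x + c - 1]
                    for r in range(3) for c in range(3))
                for k in kernels
            )
            edges[y][x] = 1 if strength >= threshold else 0
    return edges
```

Because each template's weights sum to zero, a uniformly gray region gives a response of zero, so only pixels near a gray-level change can pass the threshold.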

In a further embodiment, the extraction module extracts skeleton points from the image processed by the image processing module. Once the image processing module has finished processing the image acquired by the image acquisition module, the pre-recorded skeleton point positions of the body type closest to that of the person in the acquired image are matched onto the human body edge map, and the matched skeleton points are then displayed on the human body edge map.

In a further embodiment, the extraction module further includes a correction module. When the image acquisition module acquires human behavior images, people of different body types have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ even when they perform the same group of actions; the skeleton sizes therefore need to be normalized to the same size;

first, the skeleton of one person is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and the vectors from the root node to all points directly connected to it are calculated. Each vector is divided by its modulus to obtain its direction vector (of modulus 1). The length of the corresponding vector in the reference skeleton is multiplied by the direction vector, and the coordinates of the root node are added to the result to obtain the corrected coordinates of the point directly connected to the root node. These coordinates are recorded as the normalized coordinates of the corresponding skeleton point. The root node is then updated in turn in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected. The algorithm is as follows:

Input: the limb lengths in the reference skeleton, $\{L_1, L_2, \dots, L_n\}$; the skeleton point coordinate values to be normalized;

Step 1: define $P_{root}$ as the root node coordinates;

Step 2: give the corrected root node $\hat{P}_{root}$ the initial value $P_{root}$;

Step 3: for every limb $i$ ($i = 1, \dots, n$), execute the following steps in the order of the breadth-first search strategy;

Step 4: calculate $\vec{v}_i = P_i^{end} - P_i^{start}$;

Step 5: calculate $\vec{d}_i = \vec{v}_i / |\vec{v}_i|$;

Step 6: $\hat{P}_i^{end} = \hat{P}_i^{start} + L_i \vec{d}_i$; save the value of $\hat{P}_i^{end}$ to set A;

Step 7: return to Step 3 until all limbs in the skeleton have been traversed;

Output: the skeleton point coordinates stored in set A are the corrected coordinates;

where $i$ denotes the $i$-th limb, $L_i$ denotes the length of the $i$-th limb in the reference skeleton, and $P_i^{start}$ and $P_i^{end}$ respectively denote the coordinate values of the start node and the end node of the $i$-th limb, with $\hat{P}_i^{start}$ and $\hat{P}_i^{end}$ their corrected values. Calculating all $\hat{P}_i^{end}$ yields all corrected skeleton point coordinates; the correction scales the skeleton size while keeping the included angles between limbs unchanged;
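The size normalization described above can be sketched as a breadth-first traversal in Python. This is an illustrative sketch under stated assumptions: the function name `normalize_skeleton`, the dictionary-based skeleton layout and the joint names in the usage note are hypothetical, and every limb is assumed to have non-zero length in the input frame.

```python
import math
from collections import deque


def normalize_skeleton(joints, children, root, ref_lengths):
    """Rescale every limb to the reference length while keeping its direction.

    joints:      {name: (x, y, z)} coordinates of the frame to normalize
    children:    {parent: [child, ...]} tree structure of the skeleton
    ref_lengths: {(parent, child): limb length in the reference skeleton}
    """
    corrected = {root: tuple(joints[root])}   # the root keeps its position
    queue = deque([root])
    while queue:                              # breadth-first traversal
        parent = queue.popleft()
        for child in children.get(parent, []):
            # Limb vector measured on the original (uncorrected) skeleton.
            v = [c - p for c, p in zip(joints[child], joints[parent])]
            norm = math.sqrt(sum(x * x for x in v))  # assumed non-zero
            d = [x / norm for x in v]                # unit direction vector
            length = ref_lengths[(parent, child)]
            # Corrected end point = corrected start point + L_i * direction.
            corrected[child] = tuple(
                p + length * x for p, x in zip(corrected[parent], d)
            )
            queue.append(child)
    return corrected
```

For example, with a hypothetical three-joint chain `hip -> spine -> head` whose limbs measure 2 and 4 in the input and 1 and 1 in the reference skeleton, the directions of both limbs are preserved and only their lengths change.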

when the included angles between the limbs change, the skeleton points are described by the included angles between vectors, which avoids skeleton point deviation in that case;

the joint vector included angle of the human body is solved as follows:

to solve the angle at a given joint point, the three joint points used for the calculation are first obtained and their three-dimensional coordinate values are captured with Kinect; structure vectors are constructed between the three joint points, and the included angle of the joint vectors is then solved with the inverse cosine theorem;

take the angle $\theta$ at the first joint point as an example;

the two other joint points connected to the first joint are selected and their three-dimensional coordinate values captured by Kinect are acquired; the two other joint points are denoted $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$, and the first joint point is denoted $P_0(x_0, y_0, z_0)$;

the inter-joint structure vectors are constructed: the vector from the first joint point to $P_1$ is $\vec{a} = (x_1 - x_0,\ y_1 - y_0,\ z_1 - z_0)$, the vector from the first joint point to $P_2$ is $\vec{b} = (x_2 - x_0,\ y_2 - y_0,\ z_2 - z_0)$, and the vector from $P_1$ to $P_2$ is $\vec{c} = (x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1)$;

the included angle $\theta$ between vector $\vec{a}$ and vector $\vec{b}$ is calculated as:

$\theta = \arccos \dfrac{|\vec{a}|^2 + |\vec{b}|^2 - |\vec{c}|^2}{2\,|\vec{a}|\,|\vec{b}|}$

where $|\vec{a}|$, $|\vec{b}|$ and $|\vec{c}|$ are the lengths of the three vectors;

in order to make the joint vector angle representation more accurate, representative joint angles are selected according to the importance ranking of the joint angles during the behavior, and the skeleton point positions are corrected through size normalization and angle correction.
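The joint angle computation can be illustrated with a short Python sketch. It assumes the angle is obtained from the lengths of the three structure vectors through the law of cosines (one reading of the "inverse cosine theorem" named in the text); the function name `joint_angle` and the choice to return degrees are assumptions made for illustration.

```python
import math


def joint_angle(joint, p1, p2):
    """Angle in degrees at `joint` between the limbs joint->p1 and joint->p2."""
    a = [u - v for u, v in zip(p1, joint)]   # vector from the joint to p1
    b = [u - v for u, v in zip(p2, joint)]   # vector from the joint to p2
    c = [u - v for u, v in zip(p2, p1)]      # vector from p1 to p2
    la, lb, lc = (math.sqrt(sum(x * x for x in v)) for v in (a, b, c))
    # Law of cosines: |c|^2 = |a|^2 + |b|^2 - 2|a||b|cos(theta).
    cos_theta = (la * la + lb * lb - lc * lc) / (2 * la * lb)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))
```

The same angle could be obtained from the dot product of the two limb vectors; the law-of-cosines form is used here only because the text constructs all three vectors.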

In a further embodiment, the behavior recognition module is mainly configured to recognize and extract skeleton point behavior features: neighboring features are shifted and spliced according to the adjacency relationship of the graph, and a single 1*1 convolution after splicing is enough to obtain the computed behavior features. For a graph of $N$ nodes, let the feature dimension be $C$, so that the feature size is $N \times C$. Node $v$ has $n$ adjacent nodes, and the set of adjacent nodes is $B_v = \{B_v^1, B_v^2, \dots, B_v^n\}$. For node $v$, the shift graph module equally divides the features of the node into $n+1$ parts; the first part retains the node's own features, and the following $n$ parts are shifted from the features of its neighbor nodes, expressed mathematically as follows:

$\tilde{F}_v = F_{(v,\,:c)} \,\|\, F_{(B_v^1,\,c:2c)} \,\|\, F_{(B_v^2,\,2c:3c)} \,\|\, \cdots \,\|\, F_{(B_v^n,\,nc:)}$

where $F \in \mathbb{R}^{N \times C}$ is the feature of the graph, $c = \lfloor C/(n+1) \rfloor$, the subscripts of $F$ use Python slice notation, and the double vertical bar $\|$ denotes splicing (concatenation) along the feature dimension.
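The shift-and-splice operation, followed by the single 1*1 convolution, can be sketched in plain Python as below. This is a minimal sketch rather than the patent's implementation: the function names, the list-of-lists feature layout and the zero-based node numbering are assumptions made for illustration, and the last partition is taken to run to the end of the feature vector so that no channel is dropped when the dimension is not divisible by the number of parts.

```python
def shift_node_features(features, neighbors, v):
    """Build the shifted feature of node v.

    features:  list of N rows, each a list of C channel values
    neighbors: neighbors[v] is the ordered list of nodes adjacent to v
    """
    C = len(features[v])
    nbrs = neighbors[v]
    c = C // (len(nbrs) + 1)          # partition width, floor(C / (n + 1))
    shifted = features[v][:c]         # first part: the node's own channels
    for k, u in enumerate(nbrs, start=1):
        # k-th neighbor supplies channels k*c .. (k+1)*c; the last runs to C.
        end = (k + 1) * c if k < len(nbrs) else C
        shifted += features[u][k * c:end]
    return shifted


def shift_graph(features, neighbors):
    """Apply the shift to every node of the graph."""
    return [shift_node_features(features, neighbors, v)
            for v in range(len(features))]


def pointwise_conv(features, weight):
    """1*1 convolution: the same linear map over channels at every node.

    weight: list of C_out rows, each holding C_in coefficients.
    """
    return [[sum(w * x for w, x in zip(row, f)) for row in weight]
            for f in features]
```

Passing the complete list of other nodes as `neighbors[v]` gives a non-local variant, while passing only the physically connected joints gives a local one. The shift itself involves no multiplications; all arithmetic happens in the single pointwise convolution.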

A recognition method of the skeleton point behavior recognition system based on a shift graph convolutional neural network comprises the following steps:

step 1, the image acquisition module first controls the cameras to rotate in order to acquire human behavior feature images; the rotating motor drives the rotating shaft, the rotating shaft drives the camera, and the position of the camera is thereby adjusted;

step 2, the image acquisition module captures images of human behavior through three groups of cameras arranged in an equilateral triangle; the behavior images acquired by the three groups of cameras, mounted at the front, the rear and the side, are each displayed on a computer terminal so that the image processing module can compare and process the images;

step 3, the image processing module processes the human behavior image acquired by the image acquisition module into a human body edge map using the Kirsch edge detection operator; when detecting image edges, a 3*3 convolution template traverses the pixels of the image, the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted gray sum of three adjacent pixels and the weighted gray sum of the other five pixels is calculated;

the Kirsch operator detects image edges through the following steps:

step 1, acquiring a data area pointer of an original image;

step 5, copying the processed image 1 into original image data, and programming to realize the edge processing of the image;

step 4, when the processing of the human behavior feature image is finished, the extraction module extracts skeleton points from the image processed by the image processing module: the pre-recorded skeleton point positions of the body type closest to that of the person in the acquired image are matched onto the human body edge map, and the matched skeleton points are then displayed on the human body edge map;

step 5, when the skeleton point extraction is finished, the correction module corrects the positions of the skeleton points; because people of different body types have skeletons of different sizes, the three-dimensional coordinates of their skeleton points differ even when they perform the same group of actions, so the skeleton sizes are normalized to the same size; first, the skeleton of one person is selected as the reference skeleton; for a given frame of skeleton data, the body center point is selected as the root node, and the vectors from the root node to all points directly connected to it are calculated; each vector is divided by its modulus to obtain its direction vector (of modulus 1); the length of the corresponding vector in the reference skeleton is multiplied by the direction vector, and the coordinates of the root node are added to the result to obtain the corrected coordinates of the point directly connected to the root node; these coordinates are recorded as the normalized coordinates of the corresponding skeleton point; the root node is then updated in turn in the order of a breadth-first search, and the steps are repeated until the values of all skeleton points have been corrected; this correction scales the skeleton size while keeping the included angles between limbs unchanged;

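the joint vector included angle of the human body is solved as follows:

take the angle $\theta$ at the first joint point as an example;

the two other joint points connected to the first joint are denoted $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$, and the first joint point is denoted $P_0(x_0, y_0, z_0)$;

the vector from the first joint point to $P_1$ is $\vec{a} = (x_1 - x_0,\ y_1 - y_0,\ z_1 - z_0)$, the vector from the first joint point to $P_2$ is $\vec{b} = (x_2 - x_0,\ y_2 - y_0,\ z_2 - z_0)$, and the vector from $P_1$ to $P_2$ is $\vec{c} = (x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1)$;

the included angle $\theta$ between vector $\vec{a}$ and vector $\vec{b}$ is calculated as:

$\theta = \arccos \dfrac{|\vec{a}|^2 + |\vec{b}|^2 - |\vec{c}|^2}{2\,|\vec{a}|\,|\vec{b}|}$

where $|\vec{a}|$, $|\vec{b}|$ and $|\vec{c}|$ are the lengths of the three vectors;

in order to make the joint vector angle representation more accurate, representative joint angles are selected according to the importance ranking of the joint angles during the behavior, and the skeleton point positions are corrected through size normalization and angle correction;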

step 6, after the correction of the skeleton points is finished, the behavior recognition module recognizes the behaviors of the skeleton points: neighboring features are shifted and spliced according to the adjacency relationship of the graph, and a single 1*1 convolution after splicing is enough to obtain the computed behavior features. For a graph of $N$ nodes, let the feature dimension be $C$, so that the feature size is $N \times C$. Node $v$ has $n$ adjacent nodes, and the set of adjacent nodes is $B_v = \{B_v^1, B_v^2, \dots, B_v^n\}$. The shift graph module equally divides the features of the node into $n+1$ parts; the first part retains the node's own features, and the following $n$ parts are shifted from the features of its neighbor nodes:

$\tilde{F}_v = F_{(v,\,:c)} \,\|\, F_{(B_v^1,\,c:2c)} \,\|\, F_{(B_v^2,\,2c:3c)} \,\|\, \cdots \,\|\, F_{(B_v^n,\,nc:)}$

where $F \in \mathbb{R}^{N \times C}$ is the feature of the graph, $c = \lfloor C/(n+1) \rfloor$, the subscripts of $F$ use Python slice notation, and the double vertical bar $\|$ denotes splicing (concatenation) along the feature dimension; the skeleton point behavior features are thereby recognized.

Beneficial effects: the invention discloses a skeleton point behavior recognition system based on a shift graph convolutional neural network, in which a behavior recognition module is designed to recognize skeleton point behavior with a markedly lower graph convolution cost. Unlike conventional graph convolution, shift graph convolution does not enlarge the receptive field by enlarging the convolution kernel; instead, a novel shift operation shifts and splices the graph features. The same or even higher recognition accuracy is thereby reached with a markedly lower computational cost and a higher computation speed, which avoids the large cost that conventional graph convolution incurs as its convolution kernel grows.

Drawings

FIG. 1 is a schematic diagram of the shift graph convolution for skeleton point behavior recognition of the present invention.

FIG. 2 is a schematic diagram of the local design of the present invention.

FIG. 3 is a schematic diagram of the non-local design of the present invention.

FIG. 4 is a schematic diagram of conventional graph convolution for skeleton point behavior recognition.

FIG. 5 is a table comparing the accuracy and computational complexity of shift graph convolution with conventional graph convolution methods.

Detailed Description

The problem (the large computational cost of conventional graph convolution) arises because, in the conventional graph convolution method, the convolution kernel only covers the immediate neighborhood of a point. In skeleton point behavior recognition, however, some behaviors (e.g., clapping hands) require modeling the positional relationship of points that are physically far apart (e.g., the two hands). This requires a larger convolution kernel in the graph convolution model, and the computational cost of graph convolution grows with the size of the convolution kernel. The behavior recognition module of the invention is therefore designed to recognize skeleton point behavior with a markedly lower graph convolution cost.

A skeleton point behavior recognition system based on a shift graph convolutional neural network comprises: an image acquisition module for acquiring a behavior image; an image processing module for processing the behavior image acquired by the image acquisition module; an extraction module for extracting skeleton points from the image processed by the image processing module; and a behavior recognition module for recognizing and extracting the behavior features of the skeleton points extracted by the extraction module;

the present invention does not prescribe a method of skeleton point extraction. There are many methods for extracting human skeleton points, for example: capturing images with a camera and then obtaining the skeleton points with an algorithm; obtaining them directly from a Kinect camera; or having the person wear acceleration sensors so that the bone positions are obtained directly. The invention concerns how behavior recognition is performed once the skeleton points have been acquired, and any skeleton point extraction method may be used. In this embodiment, however, a correction module is provided to correct the recognized skeleton points, and the image acquisition device is adapted accordingly to capture images from multiple angles.

The image acquisition module is based on an image acquisition device. The device comprises cameras arranged in an equilateral triangle and a rotating device mounted at the tail of each camera; the rotating device comprises a rotating shaft fixedly connected to the camera and a rotating motor sleeved on the rotating shaft.

The image acquisition module captures images of human behavior through three groups of cameras arranged in an equilateral triangle. The behavior images acquired by the three groups of cameras, mounted at the front, the rear and the side, are each displayed on the computer terminal so that the image processing module can compare and process the images.

The image processing module processes the human behavior image acquired by the image acquisition module into a human body edge map using the Kirsch edge detection operator. When detecting image edges, a 3*3 convolution template traverses the pixels of the image; the gray values of the neighborhood around each pixel are examined one by one, and the difference between the weighted gray sum of three adjacent pixels and the weighted gray sum of the other five pixels is calculated. The convolution templates are as follows:

[Eight 3*3 Kirsch convolution templates, numbered 1 to 8; shown as images in the original publication]

All pixels of the original image are processed with the eight convolution templates in turn to obtain the edge strength of each pixel; the pixels are then tested against a threshold, the final edge points are extracted, and edge detection is complete. The Kirsch operator detects image edges through the following steps: step 1, obtaining the data area pointer of the original image;

The extraction module extracts skeleton points from the image processed by the image processing module. Once the image processing module has finished processing the image acquired by the image acquisition module, the pre-recorded skeleton point positions of the body type closest to that of the person in the acquired image are matched onto the human body edge map, and the matched skeleton points are then displayed on the human body edge map.

The extraction module further comprises a correction module. When the image acquisition module acquires human behavior images, people of different body types have skeletons of different sizes, so the three-dimensional coordinates of their skeleton points differ even when they perform the same group of actions; the skeleton sizes therefore need to be normalized to the same size;

Input: the limb lengths in the reference skeleton, $\{L_1, L_2, \dots, L_n\}$; the skeleton point coordinate values to be normalized;

Step 1: define $P_{root}$ as the root node coordinates;

Step 2: give the corrected root node $\hat{P}_{root}$ the initial value $P_{root}$;

Step 3: for every limb $i$ ($i = 1, \dots, n$), execute the following steps in the order of the breadth-first search strategy;

Step 4: calculate $\vec{v}_i = P_i^{end} - P_i^{start}$;

Step 5: calculate $\vec{d}_i = \vec{v}_i / |\vec{v}_i|$;

Step 6: $\hat{P}_i^{end} = \hat{P}_i^{start} + L_i \vec{d}_i$; save the value of $\hat{P}_i^{end}$ to set A;

where $i$ denotes the $i$-th limb, $L_i$ denotes the length of the $i$-th limb in the reference skeleton, and $P_i^{start}$ and $P_i^{end}$ respectively denote the coordinate values of the start node and the end node of the $i$-th limb, with $\hat{P}_i^{start}$ and $\hat{P}_i^{end}$ their corrected values;

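the joint vector included angle of the human body is solved as follows:

take the angle $\theta$ at the first joint point as an example;

the two other joint points connected to the first joint are denoted $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$, and the first joint point is denoted $P_0(x_0, y_0, z_0)$;

the vector from the first joint point to $P_1$ is $\vec{a} = (x_1 - x_0,\ y_1 - y_0,\ z_1 - z_0)$, the vector from the first joint point to $P_2$ is $\vec{b} = (x_2 - x_0,\ y_2 - y_0,\ z_2 - z_0)$, and the vector from $P_1$ to $P_2$ is $\vec{c} = (x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1)$;

the included angle $\theta$ between vector $\vec{a}$ and vector $\vec{b}$ is calculated as:

$\theta = \arccos \dfrac{|\vec{a}|^2 + |\vec{b}|^2 - |\vec{c}|^2}{2\,|\vec{a}|\,|\vec{b}|}$

where $|\vec{a}|$, $|\vec{b}|$ and $|\vec{c}|$ are the lengths of the three vectors.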

The behavior recognition module recognizes and extracts skeleton point behavior features: neighboring features are shifted and spliced according to the adjacency relationship of the graph, and a single 1*1 convolution after splicing is enough to obtain the computed behavior features. For a graph of $N$ nodes, let the feature dimension be $C$, so that the feature size is $N \times C$. Node $v$ has $n$ adjacent nodes, and the set of adjacent nodes is $B_v = \{B_v^1, B_v^2, \dots, B_v^n\}$. The shift graph module equally divides the features of the node into $n+1$ parts; the first part retains the node's own features, and the following $n$ parts are shifted from the features of its neighbor nodes:

$\tilde{F}_v = F_{(v,\,:c)} \,\|\, F_{(B_v^1,\,c:2c)} \,\|\, F_{(B_v^2,\,2c:3c)} \,\|\, \cdots \,\|\, F_{(B_v^n,\,nc:)}$

where $F \in \mathbb{R}^{N \times C}$ is the feature of the graph, $c = \lfloor C/(n+1) \rfloor$, the subscripts of $F$ use Python slice notation, and the double vertical bar $\|$ denotes splicing (concatenation) along the feature dimension. To make the formula intuitive, we take a graph of 7 nodes with 20-dimensional features as an example, as shown in FIG. 2 and FIG. 3. We discuss two cases:

1. the neighborhood of each point contains only physically adjacent positions; we call this the local design, shown in FIG. 2;

2. the neighborhood of each point contains the entire human skeleton graph; we call this the non-local design, shown in FIG. 3;

for both designs we take node 1 and node 2 as examples, as explained in detail below.

In FIG. 2, node 1 has 1 adjacent node (node 2), so we divide its features equally into 1+1=2 parts: the first part retains its own features (the part labeled 1) and the second part is shifted from node 2 (the part labeled 2). Node 2 has 3 adjacent nodes (node 1, node 3 and node 4), so we divide its features equally into 3+1=4 parts: the first part retains its own features (the part labeled 2) and the next 3 parts are shifted from nodes 1, 3 and 4 respectively (the parts labeled 1, 3 and 4 respectively).
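As a worked instance of the partition arithmetic in the local design, take the feature dimension $C = 20$ of this example and assume the neighbors of node 2 are taken in the order 1, 3, 4. The ordering is an assumption made for illustration. In the notation used here, $\tilde{F}_v$ is the shifted feature of node $v$, $c = \lfloor C/(n+1) \rfloor$ is the partition width for a node with $n$ neighbors, subscripts are Python-style slices, and $\|$ is concatenation along the feature dimension.

```latex
% Node 1: n = 1 neighbor, so c = \lfloor 20 / 2 \rfloor = 10
\tilde{F}_1 = F_{(1,\,:10)} \,\|\, F_{(2,\,10:)}

% Node 2: n = 3 neighbors, so c = \lfloor 20 / 4 \rfloor = 5
\tilde{F}_2 = F_{(2,\,:5)} \,\|\, F_{(1,\,5:10)} \,\|\, F_{(3,\,10:15)} \,\|\, F_{(4,\,15:)}
```

Node 1 therefore keeps 10 of its own channels and takes 10 from node 2, while node 2 keeps 5 of its own channels and takes 5 from each of nodes 1, 3 and 4.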

In FIG. 3, for any node, all other nodes are adjacent to it, so we shift the features of all other nodes to the current node. Examples for nodes 1 and 2 are shown in FIG. 3. After shifting, the resulting features look like a spiral, which reflects a thorough mixing of the features of different nodes. Experiments show that, of the two shift graph convolution designs, the non-local design is more accurate on the behavior recognition task, because it fuses the features of different nodes better and enables effective feature fusion even between nodes that are far apart.

It should be noted that, at the same recognition accuracy, the computational cost of the proposed shift graph convolution is more than 3 times smaller than that of conventional graph convolution, which is very important for fast recognition. The method is faster for two reasons: on the one hand, it saves convolution computations (compare FIG. 1 and FIG. 4); on the other hand, the shift operation can be implemented with pointers in C++ or CUDA and can therefore be deployed very efficiently on a CPU or GPU.

Our main experiments are shown in FIG. 5. ST-GCN, Adaptive-GCN and Adaptive-NL GCN are three typical conventional GCN methods. Our Shift GCN includes two designs, Local Shift GCN and Non-Local Shift GCN. As the table shows, the FLOPs (number of floating-point operations, a measure of computational complexity) of our method are more than 3 times lower than those of conventional graph convolution, which is very important for fast recognition. Our accuracy is also higher than that of the conventional graph convolution methods.

In addition, we compare against conventional graph convolution with a reduced number of adjacency matrices, i.e. the models with the suffix "one A". Their computational cost is comparable to ours, but their accuracy drops significantly. This means that accuracy falls markedly when the computation of conventional graph convolution is reduced, whereas our Shift GCN exceeds the accuracy of all the previous algorithms at a small computational cost.

Description of working principle: firstly, controlling the camera to rotate through the image acquisition module, and further acquiring the human behavior characteristic image; the rotation motor rotates to drive the rotation shaft to rotate, so that the rotation shaft drives the camera to rotate, and the position of the camera is adjusted; the image acquisition module is used for shooting human body behaviors through three groups of cameras which are arranged in an equilateral triangle shape, and further, the behavior images acquired by the three groups of cameras are respectively displayed on the computer terminal before, after and at the side parts of the installation, so that the image processing module is used for comparing and processing the images; the image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human edge image; through a Krisch edge detection operator, when detecting the edge of an image, using a convolution 3*3 template to traverse pixel points in the image, examining pixel gray values of adjacent areas around each pixel point one by one, and calculating the gray weighted sum difference value of the gray weights of three adjacent pixels and the gray weights of the other five pixels; using eight convolution templates to sequentially process all pixels in an original image, calculating to obtain the edge strength of the pixels, detecting the pixels through a threshold value, extracting the final edge point, and finishing edge detection; the Krisch operator detection image edge implementation steps are as follows:

Step 1, acquiring a data area pointer of an original image;

When processing of the human behavior image is finished, the extraction module extracts skeleton points from the image processed by the image processing module. Once the image processing module has finished processing the image acquired by the image acquisition module, pre-recorded skeleton points are matched according to the body type closest to that of the person in the acquired image, positioned on the human body edge map, and then displayed on the human body edge map.

When the skeleton points have been extracted, the correction module corrects their positions. Because people with different body types have skeletons of different sizes, the three-dimensional coordinates of their skeleton points differ even when they perform the same group of actions, so the skeleton sizes are normalized to the same size. First, the skeleton of one person is selected as the reference skeleton. For a given frame of skeleton data, the body center point is selected as the root node, and the vector from the root node to each point directly connected to it is calculated. Each vector is divided by its modulus to obtain its direction vector (of modulus 1); the direction vector is multiplied by the length of the corresponding vector in the reference skeleton, and the result is added to the coordinates of the root node to give the corrected coordinates of the point directly connected to the root node. These coordinates are recorded as the normalized coordinate values of the corresponding bone point. The root node is then updated in turn in the order of a breadth-first search, and the above steps are repeated until the values of all bone points have been corrected. This correction scales the estimated skeleton size while keeping the included angles between limbs unchanged.

When the included angles between limbs do change, the included angles between vectors are used to describe the bone points, which avoids bone point deviation. The included angle of a human joint vector is solved as follows: to find the angle at a given joint point, the three joint points used for the calculation are first obtained and their three-dimensional coordinate values are captured with Kinect; the structure vectors among the three joint points are constructed, and the included angle of the joint vectors is then solved with the inverse cosine (law of cosines). To make the joint vector angle representation more accurate, representative joint angles are selected according to the importance ranking of the joint angles in the course of the behavior, and the bone point positions are then corrected through size normalization and angle correction.

After the correction of the bone points is completed, the behavior recognition module recognizes the behavior of the bone points: the features of adjacent nodes are shifted and spliced according to the adjacency relation of the graph, and after splicing only a single 1×1 convolution is needed to obtain the computed behavior features.
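The joint-angle step described above can be illustrated with a short Python sketch. It assumes that the "inverse cosine theorem" refers to the law of cosines applied to the three inter-joint distances; the function name `joint_angle` and the sample coordinates are hypothetical and stand in for values that would be captured by Kinect.

```python
import math

def joint_angle(e, s, w):
    """Angle (radians) at joint e between the vectors e->s and e->w.

    Assumption: the angle is obtained from the law of cosines using the
    three side lengths |ES|, |EW| and |WS|.
    """
    es = math.dist(e, s)
    ew = math.dist(e, w)
    ws = math.dist(w, s)
    if es == 0 or ew == 0:
        raise ValueError("joint coincides with a neighbouring joint; angle undefined")
    cos_theta = (es ** 2 + ew ** 2 - ws ** 2) / (2 * es * ew)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

# Hypothetical coordinates for the first joint E and its two neighbours S and W.
E = (0.0, 0.0, 0.0)
S = (1.0, 0.0, 0.0)
W = (0.0, 2.0, 0.0)
theta = joint_angle(E, S, W)  # ES and EW are perpendicular here, so theta is close to pi / 2
```

Because the result depends only on distances between joints, it is unchanged by the size normalization described earlier, which is why the two corrections can be combined.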

The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solutions of the present invention within the scope of the technical concept of the present invention, and these equivalent changes all fall within the scope of the present invention.

Claims

1. A shift graph convolution neural network-based skeletal point behavior recognition system, comprising:

the behavior recognition module is used for recognizing and extracting the behavior characteristics of the bone points extracted by the extraction module;

the behavior recognition module is mainly used for recognizing and extracting bone point behavior features: adjacent behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing the computed behavior features are obtained with only a single 1×1 convolution, wherein, for a graph of N nodes with feature dimension C, the feature size is [N, C]; node v has n adjacent nodes, and the set of its adjacent nodes is $B_v = \{B_v^1, B_v^2, \ldots, B_v^n\}$;

for the v-th node, the shift graph module evenly divides its features into n+1 shares; the first share keeps the node's own features, and the following n shares are shifted from the features of its neighbor nodes, expressed mathematically as follows:

$$\tilde{F}_v = F_{(v,\,:c)} \,\|\, F_{(B_v^1,\,c:2c)} \,\|\, F_{(B_v^2,\,2c:3c)} \,\|\, \cdots \,\|\, F_{(B_v^n,\,nc:)}$$

wherein $c = \lfloor C/(n+1) \rfloor$, the subscripts of $F$ are in Python notation, and the double vertical lines $\|$ denote splicing along the feature dimension;

the system also comprises an image acquisition module for acquiring the behavior image;

the image acquisition module is based on an image acquisition device; the image acquisition device comprises cameras arranged in an equilateral triangle and a rotating device arranged at the tail of the camera; the rotating device comprises a rotating shaft fixedly connected with the camera and a rotating motor sleeved on the rotating shaft;

the system also comprises an image processing module for processing the behavior image acquired by the image acquisition module;

the image processing module is mainly used for processing the human behavior image acquired by the image acquisition module into a human body edge map; when the image edges are detected with the Kirsch edge detection operator, the pixel points in the image are traversed with 3×3 convolution templates, the gray values of the pixels in the neighborhood of each pixel point are examined one by one, and the difference between the weighted gray sum of three adjacent pixels and the weighted gray sum of the other five pixels is calculated;

the extraction module is used for extracting skeleton points from the image processed by the image processing module; when the image processing module has finished processing the image acquired by the image acquisition module, pre-recorded skeleton points are matched according to the body type closest to that of the person in the acquired image, positioned on the human body edge map, and then displayed on the human body edge map.

2. The shift graph convolution neural network-based skeletal point behavior recognition system of claim 1, wherein: the image acquisition module photographs human behavior through three groups of cameras arranged in an equilateral triangle, the three groups being installed at the front, the rear and the side respectively, and the behavior images acquired by the three groups of cameras are displayed on a computer terminal so that the image processing module can compare and process them.

3. The shift graph convolution neural network-based skeletal point behavior recognition system of claim 1, wherein: the convolution templates are as follows:

the eight convolution templates are used to process all pixels in the original image in turn, the edge strength of each pixel is calculated, the pixels are tested against a threshold value, and the final edge points are extracted to complete the edge detection;

the steps for detecting image edges with the Kirsch operator are as follows:

step 1, acquiring a data area pointer of an original image;
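The edge-strength computation recited in this claim can be sketched as follows. The claim's own templates are not reproduced in the text, so the sketch assumes the standard Kirsch compass masks (three adjacent border weights of 5, the remaining five of -3, center 0) and assumes that the edge strength is the maximum response over the eight masks; the names `kirsch_templates` and `kirsch_edges` are hypothetical.

```python
# Border positions of a 3x3 window, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def kirsch_templates():
    """Build the eight standard Kirsch masks by rotating the block of three 5s."""
    templates = []
    for k in range(8):
        t = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
        for idx, (r, c) in enumerate(RING):
            t[r][c] = 5 if (idx - k) % 8 < 3 else -3
        templates.append(t)
    return templates

def kirsch_edges(image, threshold):
    """Return a binary edge map for a grayscale image given as a list of rows.

    Border pixels have no full 3x3 neighbourhood and are left at 0.
    """
    h, w = len(image), len(image[0])
    templates = kirsch_templates()
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Edge strength: largest response among the eight directions.
            strength = max(
                sum(t[i][j] * image[y - 1 + i][x - 1 + j]
                    for i in range(3) for j in range(3))
                for t in templates
            )
            edges[y][x] = 1 if strength >= threshold else 0
    return edges

# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
step_image = [[0, 0, 10, 10] for _ in range(4)]
edge_map = kirsch_edges(step_image, threshold=100)
```

Each mask sums to zero, so a region of constant gray value gives zero response in every direction and is never marked as an edge for a positive threshold.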

4. The shift graph convolution neural network-based skeletal point behavior recognition system of claim 1, wherein: the extraction module further comprises a correction module; the correction module first selects the skeleton of one person as the reference skeleton and, for a given frame of skeleton data, selects the body center point as the root node; it calculates the vector from the root node to each point directly connected to the root node, divides each vector by its modulus to obtain its direction vector, whose modulus is 1, multiplies the direction vector by the length of the corresponding vector in the reference skeleton, and adds the result to the coordinates of the root node to obtain the corrected coordinates of the point directly connected to the root node; these coordinates are recorded as the normalized coordinate values of the corresponding bone point; the root node is updated in turn in the order of a breadth-first search, and the above steps are repeated until the values of all bone points have been corrected; the algorithm is as follows:

Input: the limb lengths $R_i$ in the reference skeleton, and the bone point coordinate values to be normalized;

the first step: define $P_{root}$ as the root node coordinates;

the second step: give $\hat{P}_{root}$ the initial value $P_{root}$;

the third step: for all limbs $I_i$, execute the following steps in sequence according to a breadth-first search strategy;

the fourth step: calculate the direction vector $d_i = (P_i^{e} - P_i^{s}) / \lVert P_i^{e} - P_i^{s} \rVert$;

the fifth step: calculate $\hat{P}_i^{e} = \hat{P}_i^{s} + R_i \, d_i$;

the sixth step: save the value of $\hat{P}_i^{e}$ to set A;

wherein $I_i$ represents the i-th limb, $R_i$ represents the length of the i-th limb in the reference skeleton, and $P_i^{s}$ and $P_i^{e}$ respectively represent the coordinate values of the start node and the end node of the i-th limb in the estimate, so that all bone point coordinate values are corrected;

the step of solving the human joint vector included angle is as follows:

taking the angle $\theta_1$ of the first joint as an example;

selecting the other two joint points connected with the first joint point, and acquiring the three-dimensional coordinate values of the joint points captured by Kinect, wherein the other two joint points are denoted $S(s_x, s_y, s_z)$ and $W(w_x, w_y, w_z)$, and the first joint point is denoted $E(e_x, e_y, e_z)$;

constructing the inter-joint structure vectors: the vector from the first joint point to point $S$ is $\vec{ES} = (s_x - e_x,\ s_y - e_y,\ s_z - e_z)$, the vector from the first joint point to point $W$ is $\vec{EW} = (w_x - e_x,\ w_y - e_y,\ w_z - e_z)$, and the vector from point $W$ to point $S$ is $\vec{WS} = (s_x - w_x,\ s_y - w_y,\ s_z - w_z)$;

calculating the included angle $\theta_1$ between vector $\vec{ES}$ and vector $\vec{EW}$:

$$\theta_1 = \arccos\frac{|\vec{ES}|^2 + |\vec{EW}|^2 - |\vec{WS}|^2}{2\,|\vec{ES}|\,|\vec{EW}|}$$

wherein $\theta_1$ is the included angle at the first joint point; in order to make the joint vector angle representation more accurate, representative joint angles are selected according to the importance ranking of the joint angles in the course of the behavior, and the bone point positions are corrected through size normalization and angle correction.
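The size normalization recited in this claim can be sketched in Python as a breadth-first traversal of the skeleton tree. This is a minimal illustration under stated assumptions: the data layout (dictionaries keyed by joint name), the function name `normalize_skeleton` and the three-joint example are hypothetical and are not taken from the claim.

```python
import math
from collections import deque

def normalize_skeleton(joints, children, root, ref_lengths):
    """Rescale every limb to its reference length while keeping its direction.

    joints:      {name: (x, y, z)} coordinates to be normalized
    children:    {name: [names of joints directly connected below it]}
    ref_lengths: {(parent, child): limb length in the reference skeleton}
    """
    corrected = {root: tuple(joints[root])}   # the root keeps its position
    queue = deque([root])
    while queue:                              # breadth-first order
        parent = queue.popleft()
        for child in children.get(parent, []):
            # Direction vector of the limb in the original pose.
            vec = [c - p for p, c in zip(joints[parent], joints[child])]
            norm = math.sqrt(sum(v * v for v in vec))
            if norm == 0:
                raise ValueError(f"zero-length limb {parent}->{child}")
            unit = [v / norm for v in vec]
            # Corrected end point = corrected start point + reference length * direction.
            length = ref_lengths[(parent, child)]
            corrected[child] = tuple(
                p + length * u for p, u in zip(corrected[parent], unit)
            )
            queue.append(child)               # the child becomes a root in turn
    return corrected

# Hypothetical three-joint skeleton: center -> spine -> head.
joints = {"center": (0.0, 0.0, 0.0), "spine": (0.0, 2.0, 0.0), "head": (3.0, 2.0, 0.0)}
children = {"center": ["spine"], "spine": ["head"]}
ref_lengths = {("center", "spine"): 1.0, ("spine", "head"): 0.5}
normalized = normalize_skeleton(joints, children, "center", ref_lengths)
```

Since only the limb lengths change and every limb keeps its original direction, the included angles between limbs are preserved, which matches the stated goal of scaling without altering those angles.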

5. A recognition method of the shift graph convolution neural network-based skeletal point behavior recognition system according to claim 1, characterized by comprising the following steps:

step 3, the image processing module processes the human behavior image acquired by the image acquisition module into a human body edge map; when the image edges are detected with the Kirsch edge detection operator, the pixel points in the image are traversed with 3×3 convolution templates, the gray values of the pixels in the neighborhood of each pixel point are examined one by one, and the difference between the weighted gray sum of three adjacent pixels and the weighted gray sum of the other five pixels is calculated;

the steps for detecting image edges with the Kirsch operator are as follows:

step (1), acquiring a data area pointer of an original image;

step (2), establishing two buffers of the same size as the original image, used for storing the original image and its copy; both buffers are initialized with a copy of the original image, and the copies are denoted image 1 and image 2 respectively;

step (3), setting one Kirsch template for the convolution operation in each buffer, traversing the pixels of the copy images in the two buffers respectively and performing the convolution operation pixel by pixel, comparing the calculated results, storing the larger value in image 1, and copying image 1 into buffer image 2;

step (4), repeating step (3), setting the remaining six templates in turn and performing the calculation, and finally storing the larger gray value of image 1 and image 2 in buffer image 1;

copying the processed image 1 into the original image data, thereby implementing the edge processing of the image;

step 5, when the skeleton point extraction is finished, the correction module corrects the skeleton point positions: for the human behavior image acquired by the image acquisition module, the skeleton of one person is first selected as the reference skeleton and, for a given frame of skeleton data, the body center point is selected as the root node; the vector from the root node to each point directly connected to the root node is calculated, each vector is divided by its modulus to obtain its direction vector, whose modulus is 1, the direction vector is multiplied by the length of the corresponding vector in the reference skeleton, and the result is added to the coordinates of the root node to obtain the corrected coordinates of the point directly connected to the root node; these coordinates are recorded as the normalized coordinate values of the corresponding skeleton point; the root node is updated in turn in the order of a breadth-first search, and the above steps are repeated until the values of all skeleton points have been corrected; this correction scales the estimated skeleton size while keeping the included angles between limbs unchanged;

The step of solving the human joint vector included angle is as follows:

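taking the angle $\theta_1$ of the first joint as an example;

the vector from the first joint point $E(e_x, e_y, e_z)$ to point $W(w_x, w_y, w_z)$ is $\vec{EW} = (w_x - e_x,\ w_y - e_y,\ w_z - e_z)$;

the vector from point $W(w_x, w_y, w_z)$ to point $S(s_x, s_y, s_z)$ is $\vec{WS} = (s_x - w_x,\ s_y - w_y,\ s_z - w_z)$;

calculating the included angle $\theta_1$ between the vector $\vec{ES}$ from the first joint point to point $S$ and the vector $\vec{EW}$:

$$\theta_1 = \arccos\frac{|\vec{ES}|^2 + |\vec{EW}|^2 - |\vec{WS}|^2}{2\,|\vec{ES}|\,|\vec{EW}|}$$

wherein $\theta_1$ is the included angle at the first joint point; in order to make the joint vector angle representation more accurate, representative joint angles are selected according to the importance ranking of the joint angles in the course of the behavior, and the bone point positions are corrected through size normalization and angle correction;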

step 6, after the correction of the bone points is completed, the behavior recognition module performs behavior recognition on the bone points: adjacent behavior features are shifted and spliced according to the adjacency relation of the graph, and after splicing the computed behavior features are obtained with only a single 1×1 convolution; for a graph of N nodes with feature dimension C, the feature size is [N, C]; node v has n adjacent nodes, and the set of its adjacent nodes is $B_v = \{B_v^1, B_v^2, \ldots, B_v^n\}$; the shifted feature of the v-th node is

$$\tilde{F}_v = F_{(v,\,:c)} \,\|\, F_{(B_v^1,\,c:2c)} \,\|\, F_{(B_v^2,\,2c:3c)} \,\|\, \cdots \,\|\, F_{(B_v^n,\,nc:)}$$

wherein $c = \lfloor C/(n+1) \rfloor$, the subscripts of $F$ are in Python notation, and the double vertical lines $\|$ denote splicing along the feature dimension, whereby the skeletal point behavior features are recognized.
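The shift-and-splice operation recited above, followed by a single 1×1 convolution, can be sketched in plain Python. The sketch assumes that each share has size C // (n + 1) and that the last share takes any remaining channels, as Python slice notation would give; the function names `shift_features` and `pointwise_conv`, the three-node graph and the weights are hypothetical illustrations, not the claimed implementation.

```python
def shift_features(features, neighbors):
    """Shift and splice node features along the graph.

    features:  list of N feature vectors, each of length C (size [N, C])
    neighbors: neighbors[v] is the list of nodes adjacent to node v
    """
    shifted = []
    for v, own in enumerate(features):
        n = len(neighbors[v])
        c = len(own) // (n + 1)              # size of each share
        out = list(own[:c])                  # first share: the node's own features
        for k, b in enumerate(neighbors[v], start=1):
            # k-th share comes from the k-th neighbour; the last share runs to the end.
            end = (k + 1) * c if k < n else len(own)
            out.extend(features[b][k * c:end])
        shifted.append(out)
    return shifted

def pointwise_conv(features, weight):
    """1x1 convolution: one C_out x C_in weight matrix applied to every node."""
    return [[sum(w * x for w, x in zip(row, f)) for row in weight] for f in features]

# Hypothetical path graph 0 - 1 - 2 with C = 4 channels per node.
features = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
neighbors = [[1], [0, 2], [1]]
shifted = shift_features(features, neighbors)
output = pointwise_conv(shifted, [[1, 1, 1, 1]])   # hypothetical single-output weights
```

The shift itself involves no multiplications; all learned computation sits in the pointwise convolution, whose cost does not grow with the number of neighbors, which is the saving over enlarging a graph convolution kernel that the description points to.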